STRUCTURE OF THE MATERIAL
The Corpus of Serbian Language (CSL) consists of:
a. Annotated text, in which each word is tagged for its grammatical status, number of phonemes and phonological structure.
b. Frequency dictionaries: For each sample, a series of frequency dictionaries has been compiled, or is being compiled, at all relevant levels, from the level of a book to the level of a sample (e.g. contemporary language). The frequency dictionaries contain probabilities of word entries and of grammatical forms for a given entry, as well as the number of graphemes, number of syllables and phonological structure for each word.
c. Probability matrices: The CSL will contain probability matrices for all grammatical forms of the Serbian language, as well as for phonemes and phonemic co-occurrences and for syllables and syllabic co-occurrences. Matrices will be given at all levels of potential analysis, from the level of a book to the level of a sample.
The material will be offered in a format that is easy to transfer into any standard statistical package. At present the following is available: grammatically annotated text, frequency dictionaries of the contemporary Serbian language compiled from daily press and poetry, and more than 200 individual frequency dictionaries of poetical works.
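
To make the ideas of a frequency dictionary and a co-occurrence probability matrix concrete, here is a minimal Python sketch. It is not the CSL's own format or tooling: the token layout, the tag labels, and the function names `frequency_dictionary` and `cooccurrence_matrix` are all assumptions made for illustration, and letters stand in for phonemes as a simplification.

```python
from collections import Counter, defaultdict

# Hypothetical annotated tokens: (word form, entry, grammatical tag).
# The tag labels are invented for illustration; they are not the CSL tag set.
TOKENS = [
    ("more", "more", "nom.sg"),
    ("mora", "more", "gen.sg"),
    ("more", "more", "nom.sg"),
    ("reka", "reka", "nom.sg"),
]


def frequency_dictionary(tokens):
    """Map each entry to its probability and to P(grammatical form | entry)."""
    total = len(tokens)
    entry_counts = Counter(entry for _, entry, _ in tokens)
    form_counts = defaultdict(Counter)
    for _, entry, tag in tokens:
        form_counts[entry][tag] += 1
    return {
        entry: {
            "p": n / total,
            "forms": {tag: c / n for tag, c in form_counts[entry].items()},
        }
        for entry, n in entry_counts.items()
    }


def cooccurrence_matrix(words):
    """Row-normalised bigram probabilities P(next symbol | symbol).

    Each letter is treated as one symbol, a simplification of true phonemes.
    """
    counts = defaultdict(Counter)
    for word in words:
        for first, second in zip(word, word[1:]):
            counts[first][second] += 1
    matrix = {}
    for first, row in counts.items():
        row_total = sum(row.values())
        matrix[first] = {second: c / row_total for second, c in row.items()}
    return matrix


freq = frequency_dictionary(TOKENS)
matrix = cooccurrence_matrix(form for form, _, _ in TOKENS)
```

Nested dictionaries of this kind flatten easily into rows of (entry, form, probability), which is one plausible way to move such material into a statistical package.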