STRUCTURE OF THE MATERIAL                      CSL

Seven-volume frequency dictionary of the contemporary Serbian language

The Corpus of Serbian Language (CSL) consists of:
a. Annotated text: each word is tagged for its grammatical status, its number of phonemes and its phonological structure.
b. Frequency dictionaries: for each sample, a series of frequency dictionaries has been compiled, or is being compiled, at all relevant levels, from the level of a single book up to the level of a whole sample (e.g. contemporary language). The frequency dictionaries give the probability of each word entry, the probabilities of the grammatical forms of a given entry, and, for each word, its number of graphemes, its number of syllables and its phonological structure.
c. Probability matrices: the CSL will contain probability matrices for all grammatical forms of the Serbian language, as well as for phonemes and phonemic co-occurrences, and for syllables and syllabic co-occurrences. Matrices will be given at every level of analysis, from the level of a book to the level of a sample. The material will be offered in a format that can easily be imported into any standard statistical package.
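The kinds of data described in items b and c can be illustrated with a minimal Python sketch. It is not the CSL's own tooling or file format: the token list, the lemmas, the syllabification and the function names (`entry_probabilities`, `form_probabilities`, `cooccurrence_matrix`, `matrix_to_csv`) are hypothetical. Probabilities are estimated as plain relative frequencies, word forms stand in for grammatical forms, and "co-occurrence" is assumed to mean the conditional probability of one unit directly following another within a word; the CSL may define these quantities differently. CSV is used only as one example of a format that statistical packages import readily.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical annotated sample of (word form, lemma) pairs. The real CSL
# annotation also records grammatical tags and phonological structure.
TOKENS = [
    ("kuća", "kuća"), ("kuće", "kuća"), ("kuća", "kuća"),
    ("pas", "pas"), ("psa", "pas"), ("je", "biti"),
]


def entry_probabilities(tokens):
    """Relative frequency of each entry (lemma) in the sample."""
    counts = Counter(lemma for _, lemma in tokens)
    total = sum(counts.values())
    return {lemma: n / total for lemma, n in counts.items()}


def form_probabilities(tokens):
    """For each entry, the relative frequency of each of its word forms."""
    forms = defaultdict(Counter)
    for form, lemma in tokens:
        forms[lemma][form] += 1
    return {
        lemma: {f: n / sum(c.values()) for f, n in c.items()}
        for lemma, c in forms.items()
    }


def cooccurrence_matrix(units_per_word):
    """Assumed definition: P(next unit | current unit) for adjacent units in a word."""
    pairs = Counter()
    for units in units_per_word:
        for a, b in zip(units, units[1:]):
            pairs[(a, b)] += 1
    row_totals = Counter()
    for (a, _), n in pairs.items():
        row_totals[a] += n
    labels = sorted({u for word in units_per_word for u in word})
    # Rows with no observed successor (e.g. word-final units) are all zeros.
    matrix = {
        (a, b): pairs[(a, b)] / row_totals[a] if row_totals[a] else 0.0
        for a in labels
        for b in labels
    }
    return labels, matrix


def matrix_to_csv(labels, matrix):
    """Serialize the matrix as CSV with row and column labels."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([""] + labels)
    for a in labels:
        writer.writerow([a] + [f"{matrix[(a, b)]:.3f}" for b in labels])
    return buf.getvalue()


if __name__ == "__main__":
    print(entry_probabilities(TOKENS))
    print(form_probabilities(TOKENS))
    # Hypothetical syllabification of three words.
    syllables = [["ku", "ća"], ["ku", "će"], ["pa", "ti"]]
    labels, m = cooccurrence_matrix(syllables)
    print(matrix_to_csv(labels, m))
```

Applying the same functions to the tokens of a single book, or to all tokens of a sample, would yield dictionaries and matrices at the corresponding level; the same co-occurrence function works for phonemes if each word is given as a list of phonemes instead of syllables.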
At present, the following are available: grammatically annotated text, frequency dictionaries of the contemporary Serbian language compiled from the daily press and from poetry, and more than 200 individual frequency dictionaries of poetic works.