STRUCTURE OF THE MATERIAL
The Corpus of Serbian Language (CSL) is organized as follows:
a. Annotated text in which each word is tagged for its grammatical status, number of phonemes and phonological structure.
b. Frequency dictionaries: For each sample, a series of frequency dictionaries has been compiled, or is being compiled, at all relevant levels, from the level of a book to the level of a sample (e.g. contemporary language). Frequency dictionaries contain the probability of each word entry and of the grammatical forms for a given entry, along with the number of graphemes, the number of syllables and the phonological structure for that entry.
c. Probability matrices: The CSL will contain probability matrices for all grammatical forms of the Serbian language, as well as for phonemes and phonemic co-occurrences and for syllables and syllabic co-occurrences. Matrices will be given at all levels of potential analysis, from the level of a book to the level of a sample. The material will be offered in a format that is easy to transfer into any standard statistical package.
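As an illustration only, the sketch below shows one way a frequency dictionary and a co-occurrence probability matrix of the kind described above could be derived from annotated text. The token layout, the tag labels, the function names, and the reading of "co-occurrence" as the probability of one unit directly following another are all assumptions made for this example. None of them is taken from the CSL itself.

```python
from collections import Counter, defaultdict

# Hypothetical annotated sample: (word form, dictionary entry, grammatical tag).
# The tag labels are invented for this sketch, not the CSL's own tag set.
tokens = [
    ("grad", "grad", "N.nom.sg"),
    ("grada", "grad", "N.gen.sg"),
    ("grad", "grad", "N.nom.sg"),
    ("vidi", "videti", "V.pres.3sg"),
]

def frequency_dictionary(tokens):
    """Relative frequency of each entry, and of each grammatical form within it."""
    entry_counts = Counter(entry for _, entry, _ in tokens)
    form_counts = defaultdict(Counter)
    for _, entry, tag in tokens:
        form_counts[entry][tag] += 1
    total = len(tokens)
    return {
        entry: {
            "p_entry": n / total,
            "p_forms": {tag: c / n for tag, c in form_counts[entry].items()},
        }
        for entry, n in entry_counts.items()
    }

def cooccurrence_matrix(symbols):
    """Row-normalised matrix P(next | current) over adjacent pairs (assumed definition)."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(symbols, symbols[1:]):
        pair_counts[current][following] += 1
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in pair_counts.items()
    }

freq = frequency_dictionary(tokens)
tag_matrix = cooccurrence_matrix([tag for _, _, tag in tokens])
```

The same `cooccurrence_matrix` function could be applied to a sequence of phonemes or syllables instead of grammatical tags. The nested dictionaries it returns can be written out as a table for use in a statistical package.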
At present the following is available: grammatically annotated text, frequency dictionaries of the contemporary Serbian language compiled from daily press and poetry, and more than 200 individual frequency dictionaries of poetical works.