Katarina Ivanović: Self-portrait (1841)
The Corpus of Serbian Language (CSL) consists of: a. grammatically annotated text, b. a series of frequency dictionaries, and c. a set of probability tables for grammatical forms, phonemes and phonemic combinations.
a. Grammatically annotated text: The text is given in its original form, with each word tagged for its grammatical status, number of phonemes, phonological structure and onomastic status. In addition, punctuation and the beginning and end of each sentence and paragraph are annotated.
b. Frequency dictionaries: For each subsample a series of frequency dictionaries has been compiled. For the contemporary Serbian language, for example, frequency dictionaries are available at the level of a book, an author, a subsample (e.g. poetry or daily press), and a subsample as a whole (e.g. the contemporary language, or the language between the XII and XVII centuries). In addition to the probability of an entry, the frequency dictionaries give the probabilities of the grammatical forms of that entry, and the number of graphemes and the phonological structure of each word.
c. Probability matrices: The CSL contains probability matrices of all grammatical forms of the Serbian language, as well as probability matrices of phonemes and phonemic combinations. These matrices are given at all levels of potential analysis, from the level of a book to the level of a subsample (e.g. the contemporary language).
The material is given in a format that is easily transferable into any standard statistical package. At present the following are available: grammatically annotated text for all subsamples; frequency dictionaries at all levels for the contemporary Serbian language; and probability matrices for grammatical forms and phonological structure at all levels for the contemporary Serbian language. Work on frequency dictionaries and probability matrices for the other subsamples is in progress.
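To illustrate how entry probabilities, grammatical-form probabilities and phoneme-combination probabilities can be derived from annotated text, here is a minimal Python sketch. It is not the CSL's own procedure or file format: the token layout, the tag names (such as `noun.nom.sg`), the toy data and all function names are assumptions made only for this example.

```python
from collections import Counter, defaultdict

# Hypothetical annotated tokens: (word form, entry, grammatical tag, phonemes).
# The tuple layout and tag names are illustrative assumptions,
# not the actual CSL annotation scheme.
TOKENS = [
    ("kuća", "kuća", "noun.nom.sg", ("k", "u", "ć", "a")),
    ("kuće", "kuća", "noun.gen.sg", ("k", "u", "ć", "e")),
    ("kuća", "kuća", "noun.nom.sg", ("k", "u", "ć", "a")),
    ("grad", "grad", "noun.nom.sg", ("g", "r", "a", "d")),
]


def normalize(counter):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def entry_probabilities(tokens):
    """Probability of each dictionary entry in the sample."""
    return normalize(Counter(entry for _, entry, _, _ in tokens))


def form_probabilities(tokens):
    """For each entry, the probability of each of its grammatical forms."""
    by_entry = defaultdict(Counter)
    for _, entry, tag, _ in tokens:
        by_entry[entry][tag] += 1
    return {entry: normalize(tags) for entry, tags in by_entry.items()}


def phoneme_pair_probabilities(tokens):
    """Probability of each pair of adjacent phonemes within a word."""
    pairs = Counter()
    for _, _, _, phonemes in tokens:
        pairs.update(zip(phonemes, phonemes[1:]))
    return normalize(pairs)


if __name__ == "__main__":
    print(entry_probabilities(TOKENS))
    print(form_probabilities(TOKENS))
    print(phoneme_pair_probabilities(TOKENS))
```

The same functions could be applied to the tokens of a single book, a single author or a whole subsample to obtain values at each of those levels.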