HOW IT WAS MADE
Text manually annotated for the grammatical status of each word
(detail from the original A4 sheet divided into 16 frames)
Due to technological limitations in the late fifties, most of the work on the ŃSL was executed manually. The final goal was to compile a number of frequency dictionaries that would serve as a basis for automatic speech and text recognition and machine translation. Compiling a frequency dictionary consisted of 27 distinct operations. Here we outline the most important ones.
a. Within each book that was included in a sample, lines were tagged on each page.
b. An A4 sheet of paper was divided into 16 frames, and one word from the book was transcribed into each frame. The page number and line number where the word appeared were recorded in the frame as well.
c. Once the whole text was transcribed and lines and pages recorded, each word was specified for its grammatical status (see the picture).
d. Grammatical tagging was subsequently monitored by a group of linguistic experts who randomly sampled about 10% of a text. If the error rate exceeded 2%, the grammatical tagging was repeated until the required criterion was reached. Sometimes the procedure had to be executed 3 or 4 times to reach the required standards of reliability.
e. Once grammatical tagging was complete, the A4 sheet was cut into 16 frames – one frame for each word. Word frames were then sorted into alphabetical order.
f. Different grammatical forms (i.e. frames) of each word were sorted according to a specified order (e.g. for the word HOUSE, all nominatives singular were put together, then all genitives singular etc.).
g. Reliability for grammatical code sorting was monitored.
h. For each word, the frequency of each grammatical form was counted, as was the total number of occurrences of the word entry (e.g. HOUSE appeared 15 times in the nominative singular, 5 times in the genitive singular etc., while the word entry HOUSE, irrespective of its grammatical form, appeared 75 times).
i. Counting of grammatical forms and word entries was also controlled for reliability.
j. The obtained frequency counts were transcribed onto specialized forms and then typed on an A4 sheet. In its final form each frequency dictionary had two distinct versions: one with word entries, and the grammatical forms of each word, sorted in alphabetical order, and the other with word entries sorted by rank frequency.
k. By the time the project had to be suspended, more than 27 000 pages of various frequency dictionaries had been compiled.
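For readers who think in code, the counting and checking steps can be sketched in a few lines of Python. This is only an illustration of steps d, h and j, not a reconstruction of the original manual procedure: the sample tokens, the form labels (such as "nom.sg") and the function names are all hypothetical, and the tie-breaking rule in the rank ordering is an assumption, since the text does not say how ties were handled.

```python
from collections import Counter, defaultdict
import random

# Hypothetical tagged tokens: (word entry, grammatical form).
# Each tuple plays the role of one paper frame.
tokens = [
    ("house", "nom.sg"),
    ("house", "nom.sg"),
    ("house", "gen.sg"),
    ("tree", "nom.sg"),
    ("house", "nom.sg"),
    ("tree", "acc.pl"),
    ("apple", "nom.sg"),
]


def count_forms(tagged):
    """Step h: count each grammatical form per entry, plus entry totals."""
    forms = defaultdict(Counter)
    for entry, form in tagged:
        forms[entry][form] += 1
    totals = {entry: sum(counts.values()) for entry, counts in forms.items()}
    return forms, totals


def alphabetical(totals):
    """Step j, first version: word entries in alphabetical order."""
    return sorted(totals.items())


def by_rank(totals):
    """Step j, second version: entries by descending frequency.

    Ties are broken alphabetically (an assumption, not from the source).
    """
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def passes_check(expert, tagged, sample_share=0.10, max_error=0.02, seed=0):
    """Step d: compare a random sample of about 10% of the tokens with
    the experts' tags and accept if the error rate is at most 2%."""
    rng = random.Random(seed)
    size = max(1, round(len(tagged) * sample_share))
    sampled = rng.sample(range(len(tagged)), size)
    errors = sum(1 for i in sampled if tagged[i] != expert[i])
    return errors / size <= max_error


forms, totals = count_forms(tokens)
print(alphabetical(totals))
print(by_rank(totals))
```

In the manual workflow a failed check meant repeating the tagging, so in this sketch `passes_check` would simply be called again after each correction round until it returns `True`.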