Inside view of the speech synthesizer constructed
in 1957 at the Institute for Experimental Phonetics and Speech Pathology,
Belgrade
In the mid-1950s, a machine translation project was launched at the Institute
for Experimental Phonetics and Speech Pathology in Belgrade. The project
included the automation of several elements of speech communication: speech
recognition, automatic transcription of speech into text, a language
corpus, translation from one language into another, and speech synthesis.
Professor Kostić, who directed the project, was of the opinion that fully
automatic translation from one language into another was not possible
because language could not be totally formalized. Further, owing to contextual
and extralinguistic factors, the machine would have to draw on real-world
knowledge in order to interpret the context correctly. For these reasons,
automatic speech recognition was approached on a probabilistic basis, with
formalization applied as far as possible. With regard to automatic speech
recognition, in addition to knowledge about the distinctive properties of
phonemes, the computer memory would need to hold the probabilities of
phonemic co-occurrences. Automatic text recognition relied on the
probabilities of grammatical forms, on formalized morphology and syntax,
and on lexemes and their grammatical forms.
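The role of phonemic co-occurrence probabilities can be illustrated with a small modern sketch. It is not a reconstruction of the Belgrade machine, whose internal design is not documented here. It only shows, under stated assumptions, how the probability of one phoneme following another could help choose between acoustically similar candidates. The function names, the toy transcriptions and the acoustic scores are all invented for illustration.

```python
from collections import Counter, defaultdict


def bigram_probabilities(sequences):
    """Estimate P(next | previous) from phoneme sequences by relative frequency."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
        for prev, counts in pair_counts.items()
    }


def pick_phoneme(previous, acoustic_scores, bigrams, floor=1e-6):
    """Weight each candidate's acoustic score by its co-occurrence probability
    with the previous phoneme and return the best-scoring candidate.
    Unseen pairs receive a small floor probability (an assumption)."""
    def combined(candidate):
        prior = bigrams.get(previous, {}).get(candidate, floor)
        return acoustic_scores[candidate] * prior
    return max(acoustic_scores, key=combined)


# Toy, invented transcriptions (hypothetical; not data from the project).
toy = [list("mama"), list("tata"), list("mati"), list("tama")]
bigrams = bigram_probabilities(toy)

# The acoustic evidence alone slightly favours "i", but in the toy data
# "m" is always followed by "a", so the combined score selects "a".
choice = pick_phoneme("m", {"a": 0.45, "i": 0.55}, bigrams)
```

In this sketch the co-occurrence table acts as prior knowledge that corrects an uncertain acoustic decision, which is the general idea the paragraph above describes.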
All of these considerations led to the conclusion
that machine translation would be limited to a rough rendition,
with the prospect of improving the degree of automation through further
research and technological advancement.
A group of experts from different fields was
formed by contacting several institutions, such as the Faculty of Electrical
Engineering, the Federal Bureau of Statistics and the Institute for Serbian
Language at the Academy of Sciences, and in 1954 the project was inaugurated.
The acoustic structure of all phonemes in the Serbian language was described
in detail by spectrographic analysis. A machine for speech recognition was
constructed and connected to a phonetic typewriter. This system was capable
of recognizing and reproducing all Serbian vowels and a few consonants. If
the speech sounds were pronounced clearly and in isolation, the machine was
able to recognize them without error, while in everyday speech the error
rate was about 30%. A speech synthesizer that could produce all Serbian
vowels and a few consonants, and utter several sentences, was also developed.
Today, it is very difficult to reconstruct the
thinking behind the architecture of the machine translation system.
It is certain, however, that the problem was approached from two directions:
from linguistics and from engineering science. The technological and
engineering requirements were left to a team of engineers under the
management of Prof. Rajko Tomović who, at that time, was one of the world's
leading experts in the field of computer science. The linguistic work
proceeded through the creation of a grammatically annotated corpus of the
Serbian language that was to be the language basis for the machine. Besides
the probabilities of word entries, the corpus would allow estimation of the
probability of each grammatical form overall and of each grammatical form
for each individual word.
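The three kinds of probability named above (of word entries, of grammatical forms, and of grammatical forms for each word) can be estimated from an annotated corpus by simple relative frequencies. The following is a minimal sketch under that assumption; the original project's procedure is not described in this text. The function name, the tag labels and the eight-token corpus are hypothetical.

```python
from collections import Counter


def corpus_probabilities(tokens):
    """tokens: list of (lemma, grammatical_form) pairs from an annotated corpus.

    Returns relative-frequency estimates of:
      P(lemma), P(form), and P(form | lemma).
    """
    total = len(tokens)
    lemma_counts = Counter(lemma for lemma, _ in tokens)
    form_counts = Counter(form for _, form in tokens)
    pair_counts = Counter(tokens)

    p_lemma = {lemma: n / total for lemma, n in lemma_counts.items()}
    p_form = {form: n / total for form, n in form_counts.items()}
    p_form_given_lemma = {
        (lemma, form): n / lemma_counts[lemma]
        for (lemma, form), n in pair_counts.items()
    }
    return p_lemma, p_form, p_form_given_lemma


# Invented toy corpus with hypothetical tag labels (not project data).
toy_corpus = [
    ("kuća", "nom.sg"), ("kuća", "acc.sg"), ("kuća", "nom.sg"),
    ("grad", "nom.sg"), ("grad", "gen.sg"), ("kuća", "gen.sg"),
    ("grad", "nom.sg"), ("grad", "nom.sg"),
]
p_lemma, p_form, p_form_given_lemma = corpus_probabilities(toy_corpus)
```

With a realistically sized corpus, many word and form combinations would never be observed, so a practical system would also need some way of handling unseen cases; that is outside the scope of this sketch.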
Although the project was conceived as an interplay
of several components, each a problem in its own right,
the essence of the whole project was the corpus. Alongside syntax, which
was to be specified in the second phase of the project, the corpus represented
the machine's knowledge of the language. In parallel with the work on the
corpus of the Serbian language, texts in English, German and French were
grammatically annotated as well, with the aim of building pilot corpora that
would serve as material for evaluating the machine translation system as
a whole.
The project was divided into two chronological
phases. The first phase comprised the acoustic analysis of the phonemes of
the Serbian language and the specification of the probabilities of phonemes
and phoneme combinations, together with parallel work on the speech analyzer,
phonetic typewriter and speech synthesizer. The main part of the first
phase was the formation of the corpus and the specification of the
probabilities of words and grammatical forms. The second phase, which began
at the end of the 1950s, comprised a description of the syntax of the Serbian
language that would be partly formalized and partly expressed in terms of
probability.
At the beginning of the 1960s, the project was terminated
for reasons that were not scientific.