
Automatic speech recognition


Introduction. Speech understanding problems. Speech recognition problems. Fields of application. Conclusion.


Now that technology is developing so quickly, language research plays a very important part. Probably the single most challenging problem in computer science is to develop computers that can understand natural language. Imagine asking your computer "When is the next televised basketball game?" or being able to tell your PC "Please format my homework the way my English professor likes it". Despite very substantial investment in speech technology research over the last 40 years, speech synthesis and speech recognition technologies still have significant limitations. Commercial products can already do some of these things, and AI scientists expect many more in the next decade. One goal of AI work in natural language is to enable communication between people and computers without resorting to memorization of complex commands and procedures.
The aim of my work is to survey and present the main problems of speech understanding and speech recognition, as well as the main fields of application.

According to Thierry Dutoit, "speech can be described as the result of the coordinated action of a number of muscles" (Thierry Dutoit, "Speech synthesis", 2002, p.5). To produce a sound, the air flow has to pass through several speech organs, such as the trachea, the vocal cords and the velum. It is convenient to group speech sounds into broad phonetic classes related to their manner of articulation. Research into language and the way people speak has revealed effects called phoneme reduction, assimilation and coarticulation (Thierry Dutoit, "Speech synthesis", 2002, p.7). "Phonemes assimilation and reduction may result in the complete modification of a phonetic trait. Sounds that belong to one word can cause changes in sounds belonging to other words. Assimilation occurs when speech is rapid and casual." For example: big cat → bik kat; far away → far way (http://www.rachaelanne.co.uk/teaching/uev/uev4.doc). "Coarticulatory phenomena are due to the fact that each articulator moves continuously from the realization of one phoneme to the next" (Thierry Dutoit, "Speech synthesis", 2002, p.9). In addition, two phonemes, one from the end of one word and the other from the beginning of the next word, can be joined together, with the pause falling before or after the joined pair; this can occur in phrases such as "gas station" or "this ship". This makes it difficult for a computer to map the sound onto models and to distinguish words (http://web.mit.edu/~flemming/www/paper/scalar.pdf).
Phoneme sequences in turn correspond to sequences of words taken from the lexicon of the language concerned and listed in their full form in dictionaries. Although words are very numerous, many of them obviously share part of their spelling, as if they were formed from other, smaller words: for example image, images, imagine, imagination and so on. This kind of word formation is called morphology, the branch of philology dealing with forms ("The pocket Oxford dictionary of current English", fifth edition, p.521). Once the words have been formed, it is very important to build correct sentences. The list of permissible sentences is restricted by syntax. Although syntax drastically restricts the set of well-formed sentences, it does not constitute an exhaustive criterion for acceptability: the same sentence might be understood differently. "Most of words lose their individuality when dealt with by grammatical rules. However, traditional grammar is not particularly well-suited for a computer implementation, as they assume a prior knowledge and use of the language" (Thierry Dutoit, "Speech synthesis", 2002, p.11). There are two types of ambiguity: homophones and word boundary ambiguity. "The concept homophone refers to words that sound the same, but have different orthography". For example: the tail of a dog → the tale of the dog; the sail of a boat → the sale of a boat. "When a sequence of groups of phonemes is put into a sequence of words, we sometimes encounter word boundary ambiguity. Word boundary ambiguity occurs when there are multiple ways of grouping phones into words". For example: It's not easy to wreck a nice beach; it's not easy to recognize speech; it's not easy to wreck an ice beach (www.speech.kth.se). Ambiguities are easier to avoid if semantic features such as color, pattern and shape are attached.
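Word boundary ambiguity can be illustrated with a small sketch. The Python code below is only an illustration, not a method taken from the cited sources: the lexicon, its phone symbols (written in an ARPAbet-like style) and the function name segmentations are all assumptions made for this example. It lists every way of grouping one phone sequence into words from the toy lexicon.

```python
from typing import Dict, List, Tuple

# Toy lexicon (an assumption for illustration): each key is a hypothetical
# phone transcription, each value the word it spells.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("R", "EH", "K"): "wreck",
    ("AH",): "a",
    ("AH", "N"): "an",
    ("N", "AY", "S"): "nice",
    ("AY", "S"): "ice",
    ("B", "IY", "CH"): "beach",
}


def segmentations(phones: Tuple[str, ...]) -> List[List[str]]:
    """Return every way of splitting `phones` into words from LEXICON."""
    if not phones:
        return [[]]  # one valid segmentation of nothing: the empty one
    results: List[List[str]] = []
    for end in range(1, len(phones) + 1):
        word = LEXICON.get(phones[:end])
        if word is not None:
            # The prefix is a word; segment the remaining phones recursively.
            for rest in segmentations(phones[end:]):
                results.append([word] + rest)
    return results


# The same phone string supports two different groupings into words.
PHONES = ("R", "EH", "K", "AH", "N", "AY", "S", "B", "IY", "CH")
for words in segmentations(PHONES):
    print(" ".join(words))
```

With this toy lexicon the single phone string yields both "wreck a nice beach" and "wreck an ice beach", so the sound alone cannot decide between them.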
Homonyms such as to, too and two, or four, for and fore, are difficult to decipher, so the software has to have a set of built-in contextual rules for establishing which word was used. Computers do not yet have the advanced capability to add contextual information to what they are 'hearing'. Semantic grammars and parsers are still being studied by computational linguists, and only partial coverage has been achieved so far. (http://cslu.cse.ogi.edu/HLTsurvey/ch8node9.html) ...
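A contextual rule for homophones might look like the following sketch. It is a deliberately simple assumption, not the approach of any particular product: the sound labels, the table BIGRAM_SCORE with its made-up scores and the function pick_homophone are hypothetical. The candidate spelling that fits best with the neighbouring words is chosen.

```python
from typing import Dict, List, Tuple

# Hypothetical sound labels mapped to the spellings that share that sound.
HOMOPHONES: Dict[str, List[str]] = {
    "T UW": ["to", "too", "two"],
    "F AO R": ["for", "four", "fore"],
}

# Made-up scores for word pairs; a real system would estimate such values
# from a large text collection.
BIGRAM_SCORE: Dict[Tuple[str, str], int] = {
    ("want", "to"): 5,
    ("to", "go"): 5,
    ("have", "two"): 3,
    ("two", "cats"): 5,
    ("wait", "for"): 5,
    ("for", "me"): 5,
    ("four", "cats"): 5,
}


def pick_homophone(sound: str, prev_word: str, next_word: str) -> str:
    """Choose the spelling of `sound` that best fits its two neighbours."""

    def score(candidate: str) -> int:
        # Sum the scores of the pair on the left and the pair on the right.
        left = BIGRAM_SCORE.get((prev_word, candidate), 0)
        right = BIGRAM_SCORE.get((candidate, next_word), 0)
        return left + right

    # On a tie, max() keeps the first candidate in the list.
    return max(HOMOPHONES[sound], key=score)


print(pick_homophone("T UW", "want", "go"))    # context: "want ... go"
print(pick_homophone("T UW", "have", "cats"))  # context: "have ... cats"
```

Such a table covers only the contexts it has seen, which is one way to picture why only partial coverage has been achieved.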

Paper details
Subject: Informatics essay
Length: 8 pages
Language: English
Size: 18.66 KB
Educational institution: Kauno Technologijos Universitetas
Faculty: Faculty of Humanities
Year/course: 3
File name: Microsoft Word Automatic speech recognition [speros.lt].doc