HOW DOES EVE RECOGNIZE LANGUAGE?
Artificial intelligence does a lot of work for us today. Algorithms recognize language and faces, and they learn every day, all without eyes or ears.
HOW DOES THIS WORK?
EVE analyzes every single word. First, the algorithm cleans up the signal: it tries to filter out interfering noise and adjusts the volume of parts that are too quiet or too loud. After that, a waveform is generated, which represents the audio signal as an image.
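To make the clean-up step more concrete, here is a minimal sketch in Python. It is not EVE's actual implementation: the function names, the threshold and the target level are assumptions chosen for illustration, and a simple noise gate plus peak normalization stand in for whatever filtering EVE really uses.

```python
def noise_gate(samples, threshold=0.02):
    """Silence samples whose magnitude is below the threshold.

    A crude stand-in for noise filtering; the threshold is an assumed value.
    """
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def normalize_peak(samples, target_peak=0.9):
    """Scale the signal so that its loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Pure silence: nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(samples):
    """Filter first, then bring the volume to a consistent level."""
    return normalize_peak(noise_gate(samples))
```

Samples are assumed to be floating-point values between -1.0 and 1.0. A production system would more likely work on short overlapping frames and adjust the volume per frame rather than once for the whole recording.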
EVE compares this image with a huge database of audio signals and searches for similar entries. Each hit is evaluated, and in the end EVE chooses the most likely one. However, EVE does not add data to the database, for data protection reasons.
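The idea of evaluating every hit and keeping the most likely one can be sketched as follows. This is a toy illustration only: the feature vectors, the cosine similarity score and the names `cosine_similarity` and `best_match` are assumptions, not a description of how EVE scores its candidates.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, database):
    """Score every database entry against the query and return the best one.

    database maps a label (e.g. a word) to its reference feature vector.
    Returns a (label, score) tuple; raises ValueError if database is empty.
    """
    scored = [(label, cosine_similarity(query, vec))
              for label, vec in database.items()]
    return max(scored, key=lambda pair: pair[1])
```

With a hypothetical database such as `{"yes": [1.0, 0.0, 0.0], "no": [0.0, 1.0, 0.0]}`, the query `[0.9, 0.1, 0.0]` is matched to `"yes"` because that entry has the higher similarity score.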
The database is not the same size for every language. As there are more completed training hours for the English database than for the Italian one, EVE's recognition is stronger in some languages than in others. We measure this accuracy in extensive tests. We translate a test text into the desired language, then a professional speaker reads the text at normal speed while EVE listens. After that, we correct the result and compare it with the source text. Every spelling mistake, every wrong word and every wrong punctuation mark counts as an error. From this number of mistakes we calculate the error rate for the corresponding language. However, as EVE is constantly learning new things, we must repeat the tests often, and we are happy to share the current results on request.
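As an illustration of how such an error rate can be computed, the sketch below counts the minimum number of word-level changes needed to turn EVE's output into the source text and divides it by the number of words in the source text. The exact formula used in our tests is not spelled out above, so this word error rate and the names `edit_distance` and `error_rate` are assumptions. Because the text is split on whitespace, a missing or wrong punctuation mark makes the affected word count as an error.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists.

    Counts the minimum number of substitutions, insertions and deletions.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # token missing in hyp
                           cur[j - 1] + 1,       # extra token in hyp
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def error_rate(reference, hypothesis):
    """Word-level error rate of hypothesis measured against reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, comparing the source text "the cat sat on the mat." with the recognized text "the cat sit on the mat" gives two errors (one wrong word, one missing full stop) out of six words.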
In addition to the constantly growing database, EVE also has other ways to learn. There are three features per language: an acoustic, a language and a pronunciation model. However, not all of them are available in every language. A complete list can be found here.
Acoustic Model (Acoustic adaption)
EVE accepts MP3 and text files to better separate speech from background noise. This makes sense for event locations with difficult acoustics, but a test run in advance is necessary to feed EVE with realistic data. EVE then uses the two files to calculate the noise and recognizes speech in this setup much better.
Language Model (Language adaption)
This refers to a dictionary. It is sufficient to type in the desired technical terms or surnames. EVE trains these words at the touch of a button. This takes up to 30 minutes, after which EVE recognizes these words reliably.
Pronunciation Model
This allows EVE to learn new words by phonetic transcription. If, for example, EVE has to recognize “R2D2”, the word is entered once in its normal spelling and once in phonetic transcription, i.e. “ahr tuo di tuh”. In this way EVE recognizes words with “special” pronunciation and even dialects.