Thanks to new AI research, Microsoft has announced that its speech recognition system is now as accurate as humans at detecting sounds and words. The improved system could lead to better accessibility features for Cortana, Microsoft’s virtual assistant.
Microsoft’s Artificial Intelligence and Research team, which includes AI researchers as well as engineers, is behind the new speech recognition system. The team set a new record in speech recognition by reducing the word error rate from 6.3 percent, reported last month, to 5.9 percent. This level of accuracy is on par with the human capacity to recognize speech.
The word error rate is the metric used to gauge the quality of a speech recognition system. It measures how often a system, such as an artificial intelligence, mishears words in a conversation. According to the results reported by Microsoft, the AI system and humans were evenly matched.
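To make the metric concrete, here is a minimal sketch of how a word error rate is commonly computed: the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. This is the standard textbook definition, not Microsoft's own evaluation code, and the function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the system drops one word out of seven.
reference = "uh huh I see what you mean"
hypothesis = "uh I see what you mean"
print(f"{word_error_rate(reference, hypothesis):.1%}")  # 14.3%
```

By this measure, a score of 5.9 percent means roughly six word errors for every hundred words in the reference transcript.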
The researchers compared the AI system not with everyday people, but with professional transcribers. Both the AI and the transcribers were instructed to listen to the same two conversations and write down exactly what they heard. One was a two-way discussion and the other included open-ended conversations between friends and family.
For the first conversation, both the transcribers and Microsoft’s AI scored a word error rate of 5.9 percent. In the second conversation, the two again got the same score, but this time the word error rate was 11.3 percent. According to the article published by the Microsoft researchers, the AI and the humans made almost the same mistakes, with one exception. The AI was confused by the common ‘uh-huh’ sound a person makes when nodding and the ‘uh’ sound that denotes hesitation in speech. The AI had problems with these sounds because, although they sound almost the same, they have subtly different meanings, which the transcribers identified easily.
The capacity of AI to achieve parity with humans in speech recognition is an impressive, if not historic, achievement. However, the researchers admit that the AI system still requires more work if it is going to repeat the feat in real-life situations where there is a lot of background noise. The AI also has problems identifying multiple speakers with different accents and ages.
Image source: Wikipedia