There are several routes to Google’s multilingual voice recognition, including Google Now on your Android phone, a search from the browser, and Google Maps, to name a few. The environment you’re in at the time generally determines which you use, so the quality of the speech captured from you can vary wildly.
In the early days of speech recognition, I remember being seated in front of my PC, with a headset on, in a super-quiet room, carefully pronouncing paragraphs of text to allow some Dragon software to “learn” the intricacies of my pronunciation. But even after significant training in what would be a perfect environment, I ended up with much less than perfect results!
This morning, I’ve been standing in the middle of Euston train station – certainly not the quietest of locations, with the general bustle of the public, tannoy announcements, and beeping electric vehicles all contributing to a cacophony that means speech recognition has to work much harder to pick my voice out of the background. It then needs to determine the words, identify what I’m requesting, and return a result – and ideally all of that within a few seconds for it to be useful.
I tried a number of voice requests today, such as:
- OK Google, remind me to order a prescription when I get home
- OK Google, what’s the weather going to be like tomorrow
- OK Google, set an alarm for 8am tomorrow
Each request succeeded. Whilst this is hardly the most scientific of tests, voice search is becoming one of those technologies that “just works”. And when something “just works”, you start to build a dependency on it, which ultimately reinforces its value – not a bad thing in this case.
I continue to be impressed by the accuracy of the recognition, and Google now claims to process voice requests with 92% accuracy – an extremely good figure.
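As an aside, accuracy figures like that 92% are conventionally derived from the word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the recognised text into the reference transcript, divided by the length of the reference. The sketch below is purely illustrative – it is not Google’s evaluation pipeline, and the example phrases are my own:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER via word-level Levenshtein (edit) distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One word wrong out of six: a WER of ~17%, i.e. ~83% word accuracy.
wer = word_error_rate("set an alarm for 8am tomorrow",
                      "set an alarm for 8pm tomorrow")
print(f"WER: {wer:.0%}")
```

So a claimed 92% accuracy roughly corresponds to getting about one word in twelve wrong – noticeable in a long dictation, but rarely enough to derail a short voice command.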
Finally, some might say that these advancements come at a cost to personal privacy – our speech being captured, and the transcriptions stored for analysis. The arguments for and against really belong in a whole different article! But I did decide to dip into the My Account feature that Google provide to see how much of my voice content they capture, and why. I was presented with a simple list of my voice searches, a link to play back the captured audio, plus a transcription of what the analysis thought I’d said. There’s an option to delete individual searches, a whole day’s worth at a time, or even all of them.
On the Google Search Help pages they provide some insight into why they store this data:
To help you get better results using your voice, Google uses your Voice & Audio Activity to:
- Learn the sound of your voice
- Learn how you pronounce words and phrases
- Recognize when you say “Ok Google”
- Improve speech recognition across Google products that use your voice
To be honest, what I saw being held really doesn’t worry me that much – if it helps make my experience better, I can’t really complain about what is being stored, as long as I’m aware of it, and I have some control over it.
In terms of the future, I can only expect that 92% accuracy figure to climb over time. The question is, will it ever reach 100%? As humans, we misinterpret speech from time to time ourselves, so 100% is unlikely – but if it worked at human-equivalent levels, I’d be more than happy.
- A list of all the Google Now voice commands
- Another less than scientific Google Voice Search test
- Your Google voice and audio activity