Many of you will know 1-800-GOOG-411. The whole point of that operation was to gather enough phonemes (units of speech) to make this possible :)
I tested it briefly on a few queries with a couple of friends, and overall I think it did really well. I would definitely choose voice over typing from now on.
The queries "Bushwalking", "Dog", and other straightforward ones worked well. The term "bollocks" wasn't well received: Google returned results on politics... kinda related, but not what we were after. Geolocation didn't work when we asked for cinema times; Google thought we were in Milwaukee. We're in Norwich in the UK.
How does it work? Here is a high-level summary of what's being said over at Waxy:
The sound of your voice triggers a connection to the search engine, and then chunks of audio are sent through. It is believed that the voice is broken down into phonemes, or into a fingerprint of the file, so that just enough data gets sent. Some feature extraction has to happen somewhere, but we're not sure how it's done right now.
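Nobody outside Google knows how the trigger or the feature extraction actually works, but the general idea of "voice starts the connection, then audio goes out in chunks" can be illustrated with a very crude technique: split the audio into short frames and flag the frames whose energy rises above a threshold. This is a minimal sketch, not Google's method. The 8 kHz sample rate, 20 ms frame size and 40 dB threshold are all assumptions chosen for illustration, and per-frame log energy stands in for real speech features.

```python
import math

# Assumed parameters (not Google's): 8 kHz mono 16-bit audio in 20 ms frames.
SAMPLE_RATE = 8000
FRAME_MS = 20
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples per frame


def frames(samples, frame_len=FRAME_LEN):
    """Split PCM samples into fixed-size frames, dropping any partial tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def log_energy(frame):
    """Mean squared amplitude in dB: a crude per-frame feature."""
    mean_sq = sum(s * s for s in frame) / len(frame)
    return 10 * math.log10(mean_sq + 1e-10)  # epsilon avoids log(0) on silence


def speech_frames(samples, threshold_db=40.0):
    """Indices of frames louder than the (assumed) threshold.

    The first index is where a client could 'wake up' and start sending chunks.
    """
    return [i for i, f in enumerate(frames(samples))
            if log_energy(f) > threshold_db]


# Two frames of silence followed by two frames of a 440 Hz tone.
silence = [0] * (2 * FRAME_LEN)
tone = [int(1000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
        for n in range(2 * FRAME_LEN)]
print(speech_frames(silence + tone))  # frames 2 and 3 are flagged
```

A real recogniser would compute richer features (and adapt the threshold to background noise), but the shape is the same: only the frames that look like speech need to leave the phone.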
Google uses the open-source Speex codec because, amongst other things, it works really well with Internet applications. The codec (compressor-decompressor) encodes the voice signal into a compact form that Google's servers can decode. The teeny file gets sent as a POST request, and Google then sends back an even smaller file. Once Google has the voice signal, the page of results is triggered, along with a GET request for the voice-to-text string. The voice-to-text operation doesn't take place on the iPhone itself, since that would mean sending substantial data to the phone; it's more likely that it happens on Google's own servers. An array of search terms is then presented, ready for use.
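To get a feel for why compressing before the POST matters, here is some back-of-the-envelope arithmetic plus a sketch of the two requests described above. Everything specific here is an assumption: 8 kHz 16-bit narrowband audio, a Speex bitrate of roughly 15 kbit/s, the `audio/x-speex` content type, and the `voice.example.com` endpoints and `session` parameter, which are made up and are not Google's real API. The requests are only built, never sent.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed figures for illustration only.
SAMPLE_RATE = 8000         # narrowband speech, samples per second
PCM_BITS_PER_SAMPLE = 16   # uncompressed 16-bit PCM
SPEEX_BITRATE = 15_000     # bits per second; an assumed Speex setting


def raw_pcm_bytes(seconds, rate=SAMPLE_RATE, bits=PCM_BITS_PER_SAMPLE):
    """Size of uncompressed PCM audio in bytes."""
    return int(seconds * rate * bits / 8)


def speex_bytes(seconds, bitrate=SPEEX_BITRATE):
    """Approximate size of the same audio at the assumed Speex bitrate."""
    return int(seconds * bitrate / 8)


def build_upload(audio, url="https://voice.example.com/recognize"):
    """Build (not send) the POST carrying encoded audio. URL is hypothetical."""
    return urllib.request.Request(url, data=audio, method="POST",
                                  headers={"Content-Type": "audio/x-speex"})


def build_transcript_query(session_id, base="https://voice.example.com/text"):
    """Build (not send) the GET for the voice-to-text string. Hypothetical URL."""
    return urllib.request.Request(base + "?" + urlencode({"session": session_id}))


# A 3-second query: 48000 bytes raw versus 5625 bytes compressed.
print(raw_pcm_bytes(3), speex_bytes(3))
```

Under these assumptions the upload shrinks by a factor of about 8.5, which is a big deal over a 2008-era mobile connection.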
Check the Waxy site regularly for updates on this; for example, some secret settings have already been found.
You might also want to keep an eye on Nuance (who provided Google with OCR software in 2006), because Google enticed a fair few of Nuance's impressive engineers to join its ranks. Many of the others went to... Yahoo! (whose OneSearch technology is available).
Google also encouraged the formation of the Open Handset Alliance.
If you're interested, register for the Voice Search Conference in San Diego, held on 2-4 March 2009. But if you're over here in the UK, you can pop along to Interspeech in Brighton on 6-10 September; you might see me there (they're hosting the Loebner Prize there this year).