Apple may not be too far behind in the race for better contextual awareness. Their recent patent applications show a clear intent to make Siri more integrated into the living room via Apple TV, and into our photos via iPhoto. These kinds of connections could make Siri more useful and ubiquitous.
However, the above is not quite reality yet, and compared with Google, Siri currently has an Achilles heel. Siri's relatively weak backend fulfillment and accuracy often make her appear less like the savvy, smart woman you would want her to be and more like a three-year-old child who has only just mastered basic language skills and knows little about the world.
Microsoft has long been a voice pioneer, and they are still actively driving progress in the domain. In a stunning demonstration of assistance in context, Microsoft's Chief Research Officer Rick Rashid recently presented the company's latest voice breakthrough by having the technology translate his spoken English into Chinese in near real-time and in his own voice. Imagine how much international business would be transformed if this technology were deployed at scale and we could use it to speak effortlessly in any language. It's the nightmare of professional translators and proud polyglots, but the dream of many travellers.
While Microsoft was an early player in mobile voice, they seem to have been unable to capitalize effectively on their capabilities. They have extensive voice experience and technology, and through their Bing investments they can compete with Google's backend accuracy. What's missing in their voice services is a clear embodiment of the technology that makes this relatively foreign and abstract phenomenon understandable and approachable for ordinary people. If Microsoft had invested a small portion of their voice R&D spending in designing a delightful human interface for the service, they would now be a real contender for defining the dominant voice interface.
Personalization, context, and intent
The approaches that Apple, Google and Microsoft have taken in voice make the challenges of the interaction abundantly clear.
With voice, the potential for intuitive interactions grows rapidly as the system learns more about you. With more data, it can more effectively anticipate your context and intent. Looking ahead, we'll begin to see exciting new services that listen in the background and grant genie-like wishes, from finding a book you casually mentioned to surfacing the address of a restaurant in San Francisco that reminds you of the one you loved in New York, and then prompting you to place an order or make a reservation.
We can imagine services that enable personalized simultaneous translation in global negotiations, or voice intelligence that replaces pre-recorded messaging with real-time contextual information delivered in your voice: for example, your spouse hearing why you're unavailable at the moment.
The more permission we give these systems to "listen in," the better they will be at offering the responses we need.