Interface Design Will Decide the Battle Over Voice Interaction

Shelley Evenson/Olof Schybergson 2012-11-30

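Over the past five years, touch interaction has flourished, and the next frontier may well be the voice interface. With great design, people will no longer depend on touch-based interfaces and will instead talk directly with machines. Apple, Google, and even Microsoft are among the big companies now pushing hard to gain an edge in this field, and victory may hinge on who is first to design a truly human interface.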

Apple may not be too far behind in the race for better contextual awareness. Their recent patent applications show a clear intent to integrate Siri more deeply into the living room via Apple TV, and into our photos via iPhoto. These kinds of connections could make Siri more useful and ubiquitous.

However, none of this is quite reality yet, and compared to Google, Siri currently has an Achilles' heel. Siri's relatively weak backend fulfillment and accuracy often make her appear less like the savvy and smart woman you would want her to be, and more like a three-year-old child who has only just mastered basic language skills and knows little about the world.

Microsoft has long been a voice pioneer, and they are still actively driving progress in the domain. In a stunning demonstration of assistance in context, Microsoft's Chief Research Officer Rick Rashid recently presented the company's latest voice breakthrough by having the technology translate his spoken English into Chinese in near real time and in his own voice. Imagine how much international business would be transformed if this technology were deployed at scale and we could effortlessly speak in any language. It's the nightmare of professional translators and proud polyglots, but the dream of many travellers.

While Microsoft was an early player in mobile voice, they seem to have been unable to effectively capitalize on their capabilities. They have lots of voice experience and technology, and through their Bing investments they can compete with Google's backend accuracy. What's missing in their voice services is a clear embodiment of the technology that makes this relatively foreign and abstract phenomenon understandable and approachable for normal people. If Microsoft had invested a small portion of their voice R&D spend in designing a delightful human interface for the service, they would now be a real contender for defining the dominant voice interface.

Personalization, context, and intent

The approaches that Apple, Google and Microsoft have taken in voice make the challenges of the interaction abundantly clear.

With voice, the potential for intuitive interactions grows exponentially as the system knows more about you. With more data, it can effectively anticipate your context and intent. Looking ahead, we'll begin to see exciting new services that listen in the background and grant genie-like wishes for everything from a book you casually mentioned to the address of a restaurant in San Francisco that reminds you of the one you loved in New York, and then prompt you accordingly for orders or reservations.

We can imagine services that enable personalized simultaneous translation in global negotiations, or voice intelligence that replaces pre-recorded messaging with real-time contextual information delivered in your own voice: for example, your spouse hearing why you're unavailable at the moment.

The more permission we give these systems to "listen in," the better they will be at offering the responses we need.
