Interface design will decide the voice interaction technology war

Shelley Evenson/Olof Schybergson 2012-11-30
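Touch interaction has dominated the past five years, and the next frontier may well be the voice interface. With great design, people will no longer depend on touch controls and will instead talk directly to their machines. Apple, Google, and even Microsoft are among the big companies now pushing for the upper hand in this field, and victory may hinge on who is first to design a truly human interface.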
    

    Recently there's been a lot of comparison between Apple's Siri and Google's Voice Search. Microsoft's voice breakthroughs have also captured headlines. After decades of research and false starts, the competition in voice interfaces is now heating up thanks to their appearance on mobile devices, and the race is on to shape the definitive voice interface for the mass market.

    But if the ballads to Siri's limitations or sites dedicated to her often-hilarious interpretations of simple instructions are any indication, we still have a long way to go until a winner is declared. To succeed, the big players will need to conquer the human challenges to voice tech if they want to design a service that most people will happily incorporate into their daily routines.

    For any advanced voice service, in addition to great voice recognition and interpretation, you need a compelling and simple interface that feels personal, context awareness that adds depth, and a very clever and fast backend that continuously learns the user's intent. No one service is the ultimate answer – yet.

    If this type of voice assistant did exist, it would have the potential to make voice interaction go from niche to mainstream. Here's why.

The human component

    Most people like to talk. But when faced with talking to machines, most of us get intimidated. Hand the biggest extrovert a microphone and they tend to clam up. Or just observe someone trying out a voice service for the first time. It simply doesn't feel (or look) easy or natural.

    So why don't people like to talk to machines? Feedback (or the lack of it) is a big reason. When talking with another person, there are rich layers of feedback throughout the interaction – facial expressions, body language, tone of voice, and more. Constant real-time feedback is central in human communication, and both speaker and listener are active participants in the communication. With voice services, most of this feedback and interaction is stripped out.

    Another reason that technical voice services failed to catch on earlier, even though they were common in computer programs, is that there's simply less need to use voice on computers compared to mobile devices. When using a computer, your hands are already committed, the QWERTY text input is pretty efficient, and seeing text as you type it also confirms it's correct. Voice input or output adds little value there. Smartphones offer the turning point for voice. When you're on the move, chances are high that your hands could do some other useful things if you can use speech to interact with your mobile in order to find things or get stuff done.

    
