How? Much like Siri, Nuance's application, which is used by some 450,000 physicians across the country, extracts meaning from the words it recognizes, referencing a database of medical information and comparing that with the patient's history. It then uses statistical inference to establish connections between the pieces of information it discovers, even making suggestions about treatment. Petro says the technology is more than 90% accurate and improves over time. It has certainly worked for the bottom line, so much so that Nuance decided to raise its fourth-quarter revenue projections by about $10 million.
Researchers have even bigger hopes for the future. Skip Rizzo, associate director of the University of Southern California's Institute for Creative Technologies, is working on an interactive simulation technology designed to help military veterans seek counseling for post-traumatic stress disorder. Dubbed SimCoach, the program will eventually attempt to read the emotion behind spoken words. "It's a big, big challenge. Because what you're doing is having to capture vocal patterns, then you're having to analyze them like a brain does," says Rizzo. While humans may be able to tell when something is wrong with a close friend or family member because their speech pattern is slower or has less emphasis, a computer can have a hard time picking up these signals, Rizzo says.
Some research could bring results sooner rather than later. Last spring, Rizzo's research partner, MIT Professor Alex Pentland, experimented with a similar voice inference technology at a Bank of America (BAC) call center, analyzing how employee communication affected the success of the business. Pentland had employees wear small electronic badges around their necks for six weeks; the badges tracked their physical location as well as body language and voice. The data showed whom each person interacted with, how close they stood to one another and the tone of their conversations. "We found that the most productive people were the people that not only talked to lots of people but they talked to co-workers that similarly talked to a lot of people," Pentland says. Simply by changing employees' coffee break schedules so that they better coincide with one another, he says, the call center would be able to save $15 million a year.
The attention consumers are paying to Siri is likely to benefit such research and push adoption further. "Voice recognition is really the holy grail to technology," Rizzo says. "We're 90% there, but that last 10% is a lot further to handle. And when the tipping point is reached, it's going to be a giant market." It looks like Siri may very well be that tipping point.