
Fortune Interview with Baidu's Chief Scientist: Computers Won't Take Over the World

Derrick Harris, February 28, 2016

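From voice control to image search to self-driving cars, artificial intelligence is attracting a great deal of attention, and deservedly so. But perception and reality often diverge. In an interview with Fortune, Baidu chief scientist Andrew Ng explains why artificial intelligence has become so hot, how companies are using it to make money, and why fears of an AI apocalypse are unrealistic.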

Andrew Ng, Baidu's chief scientist, leads the company's artificial intelligence research and development.

如今，人工智能似乎无处不在，它被用于智能手机上的语音控制和图像搜索，并提供了无人驾驶汽车和机器人的大脑。在这一领域，最受关注的是美国谷歌、Facebook和微软公司，但中国的搜索巨头百度同样是一支重要的力量。

百度人工智能的研发工作由该公司首席科学家吴恩达领衔，他之前在斯坦福大学教授机器学习，参与创建在线课程平台Coursera和谷歌大脑项目。2012年，谷歌大脑项目证明，计算机可以通过自我学习识别出YouTube视频中的猫，由此激发了大众对深度学习这一人工智能领域的兴趣。吴恩达在加州森尼维尔创建了百度人工智能实验室。在此次采访中，他解释了为什么人工智能现在变得如此热门，公司如何利用人工智能赚钱，以及为什么人们对未来的人工智能大灾变的担忧是不切实际的。

以下是本次独家专访的主要内容：

《财富》杂志：你如何定义人工智能，至少就商业上可行的人工智能应用而言？

吴恩达：过去几年，我们看到，计算机在吸收数据进行预测方面做得越来越好。其中包括预测用户最有可能点击哪类广告，识别图片中的人，预测网页中哪些内容与你的搜索关键词最相关——这样的例子不胜枚举。这些应用带来了更好的用户体验，而且为一些公司带来了更多收入。

《财富》：是不是可以说，主流人工智能更多的是识别数据模式，而不是创建能像人一样思考的计算机？

吴恩达：尽管人工智能非常火爆，但我认为，它们的发展程度远远低于人们的想象。目前的人工智能所创造的价值均来自一类技术——监督学习。所谓监督学习是指根据系统已经看到的其他输入示例，预测结果或对事件进行分类，例如“给出一张图片，找出其中的人”。或者“给出一个网页，预测用户是否会点击这个网页。”或者“给出一封电子邮件，确定这是不是垃圾邮件。”

语音识别是另外一个例子，其中输入的是音频片段，输出的是说话内容的文本。

《财富》：苹果新发布的Siri功能，使语音识别技术成为媒体关注的焦点。未来可以采取哪些措施使助手类应用变得更加有用？

吴恩达：我们的期望是，努力使与计算机交谈变得像与真人交谈一样自然。这是一个遥远的目标，短期内不会实现，但一旦我们能够实现这个目标，就会有更多用户使用它。今天，使用语音功能的主要是科技发烧友。大多数人在与计算机互动的过程中并不会使用语音。

与一台机器交谈的感觉，和与真人交流的感觉仍有明显差异：你只能说某些事情，你不能打断计算机。有时候，等待机器做出回应需要较长的时间。有时候你说出某些内容，机器无法理解。举一个典型的例子：比如我对着一台电脑说：“请呼叫卡洛儿555-1000……不，等一下，是1005，”计算机能够准确理解这些话，并完成正确的操作吗？

《财富》：几年前，似乎很少有面向消费者的人工智能，但如今，语音识别和能够识别图片的算法等技术似乎变得非常普遍，这期间到底发生了哪些变化？

吴恩达：计算能力的提高和数据的增多，推动机器学习领域取得了很大的进步，尽管这种观点在学术界并不受欢迎。以造火箭来打个比方：你需要一台巨大的火箭引擎，你还要有足够的燃料。如果你的火箭引擎太小，却有大量的燃料，你的火箭可能无法起飞。如果你有一台巨大的火箭引擎但燃料较少，你可能无法让火箭进入轨道。

只有一台巨大的引擎和足够的燃料，才能让火箭到达有趣的地方。在这个比喻中，火箭引擎便是大型计算机——在百度，也就是我们正在建造的超级计算机——而火箭燃料便是我们拥有的大量数据。

过去十年间，数据的积累或者说火箭燃料的增加，超出了我们建造火箭引擎吸收这些燃料的能力。但现在，我们有能力增大我们的火箭引擎，甚至已经超越了提供火箭燃料的能力。你必须努力同步提高这两方面的能力。

《财富》：似乎每一次将深度学习应用到一项任务当中，都会产生最佳的结果。我们能否将其应用于公司的销售数据，从而比传统的企业软件或流行的“大数据”工具更快生成有意义的见解？

吴恩达：深度学习所面临的一个重要限制是，其创造的几乎所有价值都在输入-输出映射当中。如果在企业数据中，X代表亚马逊的一个用户账号，Y代表“他们是否曾进行购物？”而且你有大量X-Y配对的数据，那么你就可以采用深度学习。但我想说的是，在自行检索数据和发现价值方面，这类算法仍处在起步阶段。

这也是为什么我认为人工智能将催生杀手机器人和超级智能属于过分炒作。这种X-Y类映射是一种非常狭隘的学习方式。人类的学习方式要更加丰富。用术语来说，这种方式叫监督学习，我认为到目前为止，我们还没有找到其他学习类型的正确思路。

根据数据去探索世界的无监督学习便是其他学习类型之一。人类在这方面似乎很有天分。计算机虽然有令人不可思议的基础算法，可以进行一定程度的无监督学习，但远远达不到人脑的水平。

Artificial intelligence is seemingly everywhere today, doing everything from powering voice controls and image search on our smartphones to supplying the brains for autonomous cars and robots. And while research on the topic from Google, Facebook and Microsoft gets most of the attention in the United States, Chinese search giant Baidu is also a major player in the space.

Its efforts are led by Andrew Ng, the company’s chief scientist who previously taught machine learning at Stanford, co-founded Coursera and helped create the Google Brain project. In 2012, Google Brain kicked off mainstream interest in a field of AI known as deep learning by showing how computers can teach themselves to recognize cats in YouTube videos. In this interview, Ng, who works out of Baidu’s artificial intelligence lab in Sunnyvale, Calif., explains why artificial intelligence is so hot right now, how companies are using it to make real money, and why concerns over an impending AI apocalypse are probably overblown.

The following was edited for length and clarity:

Fortune: How would you define artificial intelligence, at least as it applies to commercially viable approaches?

Andrew Ng: What we’re seeing in the last several years is that computers are getting much better at soaking up data to make predictions. These include predicting what ad a user is most likely to click on, recognizing people in pictures, predicting which web page is most relevant to your search query, and hundreds of other examples like that. Many of these are making digital experiences much better for users and, in some cases, increasing the bottom line for companies.

Is it fair to say then that mainstream AI is more about recognizing patterns in data than it is about building computers that think like humans?

Despite all the hype, I think computers that think like humans are much further off than some people think. Almost all the economic value created by AI today is with one type of technology, called supervised learning. What that means is learning to predict outcomes or classify things based on other example inputs the system has already seen, such as “Given a picture, find the people in it.” Or, “Given a web page, predict whether the user is going to click on this web page.” Or, “Given an email, determine if this is a piece of spam or not.”

Speech recognition is another example of this, where the input is an audio clip and the output is a transcription of what was said.
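As a rough illustration of the input-output mapping described above, here is a minimal sketch of supervised learning in Python. The training examples, the labels and the word-counting approach are all assumptions made for illustration; this is a toy, not the deep learning systems discussed in the interview.

```python
from collections import Counter

# Hypothetical labelled examples: each pair is (input X, output Y).
TRAINING_PAIRS = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch with the team tomorrow", "not spam"),
]


def train(pairs):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in pairs:
        counts.setdefault(label, Counter()).update(text.split())
    return counts


def predict(counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.split()
    scores = {
        label: sum(word_counts[w] for w in words)
        for label, word_counts in counts.items()
    }
    return max(scores, key=scores.get)


model = train(TRAINING_PAIRS)
print(predict(model, "free prize inside"))            # spam
print(predict(model, "agenda for the team meeting"))  # not spam
```

The system never receives a rule for what spam is. It only sees example inputs paired with the desired outputs, then applies what it counted to an input it has not seen before.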

Speech recognition has been in the news lately because of some new Apple Siri features. What are the next steps to make assistant-type apps more useful?

The vision we’re pursuing is trying to make talking to a computer feel as natural as talking to a person. That’s a distant goal, we won’t get there anytime soon, but as we reach that point, more users will use it. Today, speech is largely used by technology enthusiasts. Most people around the world are not using speech for their interactions with computers.

Talking to a machine still feels very different from talking to a person: you’re only allowed to say certain things, and you can’t interrupt a computer. Sometimes it takes a little longer to respond. Sometimes you say something and it’s very confused. Here’s one specific example: If I’m talking to a computer and I say, “Please call Carol at 555-1000 … no, wait, 1005,” can a computer interpret that correctly and take the right action?

What happened in the past few years to take us from seemingly little consumer-facing AI to today, where things like speech recognition and algorithms that can understand photos seem commonplace?

A lot of the progress in machine learning—and this is an unpopular opinion in academia—is driven by an increase in both computing power and data. An analogy is to building a space rocket: You need a huge rocket engine, and you need a lot of fuel. If you have a smaller rocket engine and a lot of fuel, your rocket’s probably not going to get off the ground. If you have a huge rocket engine but a tiny amount of fuel, you probably won’t make it to orbit.

It’s only with a huge engine and a lot of fuel that you can go interesting places. The analogy is that the rocket engines are the large computers—in Baidu’s case, supercomputers we can now build—and the rocket fuel is the huge amounts of data we now have.

Over the last decade, the rise of data, or the rise of rocket fuel, got a little ahead of our ability to build rocket engines to absorb that fuel. But now our ability to scale up our rocket engines has been catching up, and in some cases surpassing our ability to supply the rocket fuel. You have to work hard to scale them at the same time.

It seems like every time deep learning is applied to a task, it produces the best-ever results for that task. Could we apply this to, say, corporate sales data and identify meaningful insights faster than with traditional enterprise software or popular “big data” tools?

The big limitation of deep learning is that almost all the value it’s creating is in these input-output mappings. If you have corporate data where X is maybe a user account in Amazon, and Y is “Did they buy something?” and you have a lot of data with X-Y pairs, then you could do it. But in terms of going through data and discovering things by itself, that type of algorithm is very much in its infancy, I would say.

This is also one of the reasons why the AI evil killer robots and super-intelligence hype is overblown. This X-Y type of mapping is such a narrow form of learning. Humans learn in so many more ways than this. The technical term for this is supervised learning, and I think we just haven’t figured out the right ideas yet for the other types of learning.

Unsupervised learning is one of those other types, where you just look at data and discover stuff about the world. Humans seem to be amazing at that. Computers have incredibly rudimentary algorithms that try to do some of that, but are clearly nowhere near what any human brain can do.

Andrew Ng explains X-Y pairs during the interview, which was conducted over Skype.

《财富》：谷歌和Facebook在美国获得了极大的关注，请告诉我们百度正在进行哪些由人工智能驱动的工作？

吴恩达：百度之前进行的一项工作是创建内部的深度学习平台。我们所作的是让全公司的工程师，包括非人工智能研究人员，以各种创造性的方式使用深度学习——许多方式是我和其他人工智能研究人员不可能想到的。以深度学习驱动的创造性产品有很多，不仅仅限于我们的网页搜索、图片搜索和广告等核心业务。

比如：我们的计算机安全产品使用深度学习来识别威胁。我本来不可能想到这一点，也不可能知道如何实现。我们使用深度学习来尝试提前预测硬盘会在哪一天出现故障，而这提高了我们的数据中心的可靠性，降低了成本。

《财富》：百度也创造出一项与谷歌眼镜类似的技术，一个数字助手，甚至还有一款智能自行车。这些产品有市场吗？或者目前只是有趣的实验品？

吴恩达：我认为它们目前仍处在研究探索阶段。不过根据社区反馈，这些产品肯定会有需求，比如智能自行车和可穿戴摄像机。

实际上，我们之前在中国演示了一款名叫Dulife的新产品，这款产品采用计算机视觉和自然语言处理，告诉盲人前方有什么。例如，在中国，多种不同面值的钞票尺寸相同，盲人必须通过触摸来确定它们的区别。但一张钞票在流通一段时间之后，触摸部分会被磨损，盲人便很难确定钞票的面值。在这个应用案例中，简单的计算机视觉可以告诉你，你手中的钞票是20元还是50元。这是盲人迫切需要的一项应用。

《财富》：作为百度的主要市场，在移动或可穿戴设备方面，中国市场与美国市场或其他市场有何区别？

吴恩达：中国市场迥然不同。差异之一是，中国目前最大、最热门的科技潮流是O2O或线上线下电子商务。

O2O的概念是指，利用移动设备连接到周围的实体服务，比如洗车、送餐、寻找当地的折扣电影、寻找美甲店或聘请一位厨师到家中为你烹饪美食等。美国也有这样的服务，但我想中国的人口密度已经推动O2O迅速崛起。

此外，许多中国用户的第一台计算设备是智能手机。如果你的第一台计算设备是手机，你就会直接学习使用手机最有效的方式，不需要完成从电脑到手机的过渡。

《财富》：我们什么时候才不会将人工智能视为一项新奇事物，转而将它看做一项理所当然的主流技术？

吴恩达：我感觉，在高德纳技术成熟度曲线上，我们或许正在达到峰值。对超级人工智能的极度担忧也许就是这个峰值，我们正在经历这个阶段。实际情况很难预测，我也可能是错的，但我希望未来对人工智能技术能少一些炒作，更多专注技术的进步。

即便在计算机视觉领域，虽然培训计算机识别物品、人脸和手写文字方面，我们已经取得了显著的进步，但我们如何在产品中整合这项技术，却落后于技术本身的发展。

《财富》：所以，当我们的应用或设备可以识别人类或周围的物品时，我们还无法做到泰然接受的地步？

吴恩达：我想在计算机视觉方面，还需要一段时间，因为目前尚没有太多计算机视觉产品出现。不过，我可以分享一种非常有市场前景的应用。

在百度的广告系统中，如果我们向用户展示一段文字，效果很不错。但有了深度学习技术，它可以帮助广告商选择在向用户提供文本的同时，展示一个小图片。这样一来，用户不需要阅读一大段关于在巴厘岛度假的文字，广告商会展示一张巴厘岛的图片，用户一瞬间就能理解广告的意图。用户可以用更快的速度理解广告在讲什么，这项技术也将对我们连接用户和广告商的能力产生显著影响。（财富中文网）

译者：刘进龙/汪皓

审校：任文科

Google and Facebook get a lot of attention in the United States, but tell us about some of what Baidu is working on that’s powered by AI.

One of the things that Baidu did well early on was to create an internal platform for deep learning. What that did was enable engineers all across the company, including people who were not AI researchers, to leverage deep learning in all sorts of creative ways, including applications that an AI researcher like me never would have thought of. There’s a very long tail of all sorts of creative products, beyond our core web search, image search and advertising businesses, that are powered by deep learning.

A couple of examples: Our computer security products use deep learning to recognize threats. I wouldn’t have thought of doing that, and I wouldn’t have known how to do it. We use deep learning to try to predict a day in advance when a hard disk is going to fail, and this increases the reliability and reduces the cost of our data centers.

Baidu has also created a Google Glass-like technology, a digital assistant and, I believe, even an intelligent bike. Is there a market for these products, or are they just interesting experiments right now?

I consider them research explorations for now. But based on the feedback we’ve gotten from the community, there is definitely demand for things like smart bikes and wearable cameras.

Actually in China, we demoed a new product called Dulife, which uses computer vision and natural language processing to try to tell blind people what is in front of them. It turns out, for example, that there are several different bills in China that are the same size and blind people have to tell by touch how they’re different. But after a bill has been in circulation for some time, the tactile portions get worn down and it becomes difficult to tell the bill’s denomination. That’s one use case where simple computer vision can tell you if this is a ¥20 bill or a ¥50 bill. That was something that blind individuals actively requested.

Is the Chinese market where Baidu primarily operates different from the United States or elsewhere with regard to these types of mobile or wearable devices?

The Chinese market is very different. One of the things that I believe is that the biggest, hottest tech trend in China right now is O2O, or online-to-offline.

This concept of O2O refers to using your mobile device to connect you to physical services around you, whether it’s a car wash, food delivery, finding a local discounted movie right around the corner, having someone do your manicure, or hiring a chef who’ll show up in your house to cook for you. There are all these things in the United States, as well, but I think China’s high population density has made it efficient for O2O to rise very quickly.

Also, a lot of users’ first computational device in China is a smartphone. When your first computational device is a cell phone, then you start learning the most efficient ways to use a cell phone right away without having to transition from a laptop.

When will we stop talking about AI as a novel thing, and just start taking it for granted like so many other technologies?

I feel like on the Gartner Hype Cycle, we might be just cresting. I think the peak of fears about AI super-intelligence was maybe the peak, and we’re just getting past that right now. It’s hard to tell, I might be wrong, but I’d love to get to a future where there’s less hype about AI and we’re more focused on making progress.

Even in the field of computer vision, where there has been remarkable progress in training computers to recognize things such as objects, faces and handwriting, I think the technology has got a bit ahead of our ideas for how to integrate it into products.

So we’re not at the point where we can be unimpressed when our apps or devices recognize us or the stuff around us?

I think it will be a while for computer vision, because there just aren’t that many computer vision products yet. But I’ll share one that monetizes well.

It turns out that on Baidu’s advertising system, if you show a user a piece of text, that works OK. But with our deep learning technology, it can help an advertiser select a small image to show the user along with the text. So rather than having you read a bunch of text about a holiday in Bali, maybe it can show a picture of Bali, and you understand it in a fraction of a second. This helps users figure out much more quickly what an ad is about, and has a marked impact on our ability to connect users and advertisers.
