Artificial intelligence is seemingly everywhere today, doing everything from powering voice controls and image search on our smartphones to supplying the brains for autonomous cars and robots. And while research on the topic from Google, Facebook and Microsoft gets most of the attention in the United States, Chinese search giant Baidu is also a major player in the space.
Its efforts are led by Andrew Ng, the company’s chief scientist who previously taught machine learning at Stanford, co-founded Coursera and helped create the Google Brain project. In 2012, Google Brain kicked off mainstream interest in a field of AI known as deep learning by showing how computers can teach themselves to recognize cats in YouTube videos. In this interview, Ng, who works out of Baidu’s artificial intelligence lab in Sunnyvale, Calif., explains why artificial intelligence is so hot right now, how companies are using it to make real money, and why concerns over an impending AI apocalypse are probably overblown.
The following was edited for length and clarity:
Fortune: How would you define artificial intelligence, at least as it applies to commercially viable approaches?
Andrew Ng: What we’re seeing in the last several years is that computers are getting much better at soaking up data to make predictions. These include predicting what ad a user is most likely to click on, recognizing people in pictures, predicting what’s the web page most relevant to your search query, and hundreds of other examples like that. Many of these are making digital experiences much better for users and, in some cases, increasing the bottom line for companies.
Is it fair to say then that mainstream AI is more about recognizing patterns in data than it is about building computers that think like humans?
Despite all the hype, I think they are much further off than some people think. Almost all the economic value created by AI today is with one type of technology, called supervised learning. What that means is learning to predict outcomes or classify things based on other example inputs the system has already seen, such as “Given a picture, find the people in it.” Or, “Given a web page, find whether the user is going to click on this web page.” Or, “Given an email, determine if this is a piece of spam or not.”
Speech recognition is another example of this, where the input is an audio clip and the output is a transcription of what was said.
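To make the idea of learning from labeled examples concrete, here is a minimal sketch of the spam case in Python. It is an illustration only, not a description of any production system: the four training emails, the function names `train` and `predict`, and the choice of a simple word-counting (naive Bayes) method are all assumptions made for this example.

```python
import math
from collections import Counter

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def predict(model, text):
    """Return the label whose training examples best match the words in text."""
    word_counts, label_counts = model
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from how common the label is, then add evidence word by word.
        score = math.log(label_counts[label] / total)
        n_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out a label.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled examples: the input is an email, the output is a label.
EXAMPLES = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting moved to tuesday", "not spam"),
    ("lunch on tuesday with the team", "not spam"),
]

model = train(EXAMPLES)
print(predict(model, "claim your free prize"))
print(predict(model, "team meeting on tuesday"))
```

The system never receives a rule such as "the word free means spam." It only sees inputs paired with the correct outputs, which is what makes the approach supervised.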
Speech recognition has been in the news lately because of some new Apple Siri features. What are the next steps to make assistant-type apps more useful?
The vision we’re pursuing is trying to make talking to a computer feel as natural as talking to a person. That’s a distant goal, we won’t get there anytime soon, but as we reach that point, more users will use it. Today, speech is largely used by technology enthusiasts. Most people around the world are not using speech for their interactions with computers.
Talking to a machine still feels very different from talking to a person: you’re allowed to say certain things, you can’t interrupt a computer. Sometimes it takes a little longer to respond. Sometimes you say something and it’s very confused. Here’s one specific example: If I’m talking to a computer and I say, “Please call Carol at 555-1000 … no, wait, 1005,” can a computer interpret that correctly and take the right action?
What happened in the past few years to take us from seemingly little consumer-facing AI to today, where things like speech recognition and algorithms that can understand photos seem commonplace?
A lot of the progress in machine learning—and this is an unpopular opinion in academia—is driven by an increase in both computing power and data. An analogy is to building a space rocket: You need a huge rocket engine, and you need a lot of fuel. If you have a smaller rocket engine and a lot of fuel, your rocket’s probably not going to get off the ground. If you have a huge rocket engine but a tiny amount of fuel, you probably won’t make it to orbit.
It’s only with a huge engine and a lot of fuel that you can go interesting places. The analogy is that the rocket engines are the large computers—in Baidu’s case, supercomputers we can now build—and the rocket fuel is the huge amounts of data we now have.
Over the last decade, the rise of data, or the rise of rocket fuel, got a little ahead of our ability to build rocket engines to absorb that fuel. But now our ability to scale up our rocket engines has been catching up, and in some cases surpassing our ability to supply the rocket fuel. You have to work hard to scale them at the same time.
It seems like every time deep learning is applied to a task, it produces the best-ever results for that task. Could we apply this to, say, corporate sales data and identify meaningful insights faster than with traditional enterprise software or popular “big data” tools?
The big limitation of deep learning is that almost all the value it’s creating is in these input-output mappings. If you have corporate data where X is maybe a user account in Amazon, and Y is “Did they buy something?” and you have a lot of data with X-Y pairs, then you could do it. But in terms of going through data and discovering things by itself, that type of algorithm is very much in its infancy, I would say.
This is also one of the reasons why the AI evil killer robots and super-intelligence hype is overblown. This X-Y type of mapping is such a narrow form of learning. Humans learn in so many more ways than this. The technical term for this is supervised learning, and I think we just haven’t figured out the right ideas yet for the other types of learning.
Unsupervised learning is one of those other types, where you just look at data and discover stuff about the world. Humans seem to be amazing at that. Computers have incredibly rudimentary algorithms that try to do some of that, but are clearly nowhere near what any human brain can do.
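For contrast with the labeled examples of supervised learning, the sketch below shows one of the rudimentary unsupervised algorithms, k-means clustering, in Python. It is a hedged illustration rather than anything described in the interview: the function name `kmeans_1d`, the sample numbers, and the simple way the starting centers are chosen are assumptions made for this example.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group unlabeled numbers into k clusters and return the sorted cluster centers."""
    ordered = sorted(values)
    # Assumed initialization: spread the starting centers evenly across the sorted data.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centers = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it in place if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)

# Hypothetical unlabeled measurements: no one has said which group each belongs to.
DATA = [1.0, 1.2, 0.8, 9.8, 10.1, 10.3]
centers = kmeans_1d(DATA, k=2)
print(centers)
```

The algorithm finds that the numbers fall into two groups without being given a single label, but it has no idea what either group means. That gap is a small-scale version of the distance between today's unsupervised methods and what a human does when looking at the world.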
Google and Facebook get a lot of attention in the United States, but tell us about some of what Baidu is working on that’s powered by AI.
One of the things that Baidu did well early on was to create an internal platform for deep learning. What that did was enable engineers all across the company, including people who were not AI researchers, to leverage deep learning in all sorts of creative ways: applications that an AI researcher like me never would have thought of. There’s a very long tail of all sorts of creative products (beyond our core web search, image search and advertising businesses) that are powered by deep learning.
A couple of examples: Our computer security products use deep learning to recognize threats. I wouldn’t have thought of doing that, and I wouldn’t have known how to do it. We use deep learning to try to predict a day in advance when a hard disk is going to fail, and this increases the reliability and reduces the cost of our data centers.
Baidu has also created a Google Glass-like technology, a digital assistant and, I believe, even an intelligent bike. Is there a market for these products, or are they just interesting experiments right now?
I consider them research explorations for now. But based on the feedback we’ve gotten from the community, there is definitely demand for things like smart bikes and wearable cameras.
Actually, in China, we demoed a new product called Dulife, which uses computer vision and natural language processing to try to tell blind people what is in front of them. It turns out, for example, that there are several different dollar bills in China that are the same size and blind people have to tell by touch how they’re different. But after a bill has been in circulation for some time, the tactile portions get worn down and it becomes difficult to tell the bill’s denomination. That’s one use case where simple computer vision can tell you if this is a ¥20 bill or a ¥50 bill. That was something that blind individuals actively requested.
Is the Chinese market, where Baidu primarily operates, different from the United States or elsewhere with regard to these types of mobile or wearable devices?
The Chinese market is very different. One of the things that I believe is that the biggest, hottest tech trend in China right now is O2O, or online-to-offline.
This concept of O2O refers to using your mobile device to connect you to physical services around you, whether it’s a car wash, food delivery, finding a local discounted movie right around the corner, having someone do your manicure, or hiring a chef who’ll show up in your house to cook for you. There are all these things in the United States, as well, but I think China’s high population density has made it efficient for O2O to rise very quickly.
Also, a lot of users’ first computational device in China is a smartphone. When your first computational device is a cell phone, then you start learning the most efficient ways to use a cell phone right away without having to transition from a laptop.
When will we stop talking about AI as a novel thing, and just start taking it for granted like so many other technologies?
I feel like on the Gartner Hype Cycle, we might be just cresting. I think the peak of fears about AI super-intelligence was maybe the peak, and we’re just getting past that right now. It’s hard to tell, I might be wrong, but I’d love to get to a future where there’s less hype about AI and we’re more focused on making progress.
Even in the field of computer vision, where there has been remarkable progress in training computers to recognize things such as objects, faces and handwriting, I think the technology has got a bit ahead of our ideas for how to integrate it into products.
So we’re not at the point where we can be unimpressed when our apps or devices recognize us or the stuff around us?
I think it will be a while for computer vision, because there just aren’t that many computer vision products yet. But I’ll share one that monetizes well.
It turns out that on Baidu’s advertising system, if you show a user a piece of text, that works OK. But with our deep learning technology, it can help an advertiser select a small image to show the user along with the text. So rather than having you read a bunch of text about a holiday in Bali, maybe it can show a picture of Bali, and you understand it in a fraction of a second. This helps users figure out much more quickly what an ad is about, and has a marked impact on our ability to connect users and advertisers.