
An A.I. algorithm built to play games may help solve humanity’s hard problems

Jeremy Kahn 2020-12-29

With this algorithm, DeepMind’s A.I. has reached superhuman performance in many games.


Translation (Chinese edition): Liu Jinlong

Proofreading (Chinese edition): Wang Hao


London A.I. company DeepMind has published new details about an algorithm that can learn to play games at superhuman levels, even when it doesn’t start out knowing the rules. The company says the achievement is a big step toward creating A.I. systems that can deal with complicated and uncertain real-world situations.

The algorithm, which DeepMind calls MuZero, has learned to play chess, Go, and the Japanese strategy game Shogi, as well as a host of classic Atari video games at superhuman levels. Previously, DeepMind had created algorithms that could master each of these games, but not a single algorithm that could handle both the board games and the video games. Also, DeepMind’s previous algorithm for mastering the board games, AlphaZero, started out knowing the rules, while MuZero does not.

AlphaZero was itself a more general variant of AlphaGo, the Go-playing algorithm DeepMind famously demonstrated in 2016 when it defeated Lee Sedol, then one of the world’s top Go players, in a match in South Korea.

DeepMind, which is owned by Google parent Alphabet, first unveiled MuZero in 2019, but on Wednesday it published more information about the algorithm in a peer-reviewed paper in the prestigious scientific journal Nature.

MuZero works by constructing a model of how it believes the game it is playing works, then using that model to plan the most beneficial actions. It learns to improve both the model and its planned actions by playing the game over and over again. In two-player games, MuZero learns by playing against previous versions of itself.
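The model-then-plan loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepMind’s code: the three functions mirror the representation, dynamics, and prediction roles described in the MuZero paper, but here they are hand-written toy stand-ins rather than trained neural networks, and the planner is a plain depth-limited search over the model, swapped in for the Monte Carlo tree search MuZero actually uses. The names `LearnedModel`, `plan`, and `toy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A hidden state is an abstract internal representation, not the real game board.
Hidden = Tuple[int, ...]


@dataclass
class LearnedModel:
    # Hypothetical stand-ins for MuZero's three learned functions.
    represent: Callable[[List[int]], Hidden]                 # h: observation -> hidden state
    dynamics: Callable[[Hidden, int], Tuple[Hidden, float]]  # g: (state, action) -> (next state, reward)
    predict: Callable[[Hidden], float]                       # f: state -> value (policy head omitted)


def plan(model: LearnedModel, observation: List[int], actions: List[int],
         depth: int, discount: float = 0.99) -> int:
    """Choose the action whose imagined future scores best inside the learned model.

    Simplification: exhaustive depth-limited search replaces MuZero's tree search.
    """
    def search(state: Hidden, remaining: int) -> float:
        # At the planning horizon, fall back on the model's own value estimate.
        if remaining <= 0:
            return model.predict(state)
        best = float("-inf")
        for a in actions:
            nxt, reward = model.dynamics(state, a)
            best = max(best, reward + discount * search(nxt, remaining - 1))
        return best

    root = model.represent(observation)
    scores: Dict[int, float] = {}
    for a in actions:
        nxt, reward = model.dynamics(root, a)
        scores[a] = reward + discount * search(nxt, depth - 1)
    # Ties go to the earliest action in `actions`.
    return max(scores, key=lambda a: scores[a])


# Toy "learned" model: the hidden state is a number, and reaching 3 pays a reward of 1.
# The planner never sees these rules directly; it only queries the model.
toy = LearnedModel(
    represent=lambda obs: (sum(obs),),
    dynamics=lambda s, a: ((s[0] + a,), 1.0 if s[0] + a == 3 else 0.0),
    predict=lambda s: 0.0,
)

print(plan(toy, observation=[1], actions=[-1, 1], depth=2))  # prints 1
```

Starting from 1 with a two-step horizon, the planner steps toward the reward. With a one-step horizon from the same start, the reward lies beyond the search, every action scores 0, and the tie goes to the first action, a small version of the point that planning depth matters.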

More important for real-world situations, the model of the game’s rules that the algorithm builds doesn’t have to be 100% accurate, or even complete. It just has to be useful enough for MuZero to make some initial progress in the game, from which it can begin to improve.

“We are basically saying to the system, just go and make up your own internal fiction about how the world works,” David Silver, the DeepMind computer scientist who led the team that built MuZero, told Fortune. “As long as this internal fiction leads to something that actually matches reality when you come to use it, then we’re fine with it.”

In the Nature paper, DeepMind showed how much the algorithm’s capability depends on planning: the more time MuZero was given to plan, the better it performed. At Go, MuZero was many times more capable when given 50 seconds to consider a move than when given just one-tenth of a second, roughly the difference between a strong amateur and a strong professional player.

This difference held even in the Atari games, where quick reaction times are often thought to matter more than strategic thinking. Here, more time allowed MuZero to game out what might happen in more possible scenarios. The researchers noted that the system achieved very good performance in a game like Ms. Pac-Man, even when it was only given enough time to explore six or seven possible moves, which was far too few to gain a complete understanding of all the possibilities.
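The effect of planning time can be illustrated with a toy simulation budget. This sketch rests on loud assumptions: it applies the UCB1 bandit rule at a single decision point (MuZero’s search is a full tree search guided by a learned policy), treats each “simulation” as a noisy sample of an action’s value, and uses made-up action values and noise. The move is chosen by visit count, as AlphaZero-style searches do. The function names and all numbers are hypothetical.

```python
import math
import random
from typing import List


def budgeted_choice(values: List[float], budget: int, rng: random.Random,
                    noise: float = 0.5, c: float = 1.4) -> int:
    """Spend `budget` simulations on one decision, then pick the most-visited action.

    Each simulation is a noisy sample of an action's (hypothetical) value, standing
    in for one imagined rollout through a learned model.
    """
    n = len(values)
    visits = [0] * n
    totals = [0.0] * n
    for t in range(budget):
        if t < n:
            a = t  # try every action once first
        else:
            # UCB1: average so far plus an exploration bonus for rarely tried actions.
            a = max(range(n), key=lambda i: totals[i] / visits[i]
                    + c * math.sqrt(math.log(t) / visits[i]))
        visits[a] += 1
        totals[a] += rng.gauss(values[a], noise)
    return max(range(n), key=lambda i: visits[i])


def hit_rate(budget: int, trials: int = 300, seed: int = 0) -> float:
    """Fraction of trials in which the best action is chosen."""
    rng = random.Random(seed)
    values = [0.0, 0.5, 1.0]  # made-up action values; index 2 is the best move
    best = values.index(max(values))
    hits = sum(budgeted_choice(values, budget, rng) == best for _ in range(trials))
    return hits / trials


for budget in (6, 60, 600):
    print(budget, hit_rate(budget))
```

With a fixed seed the run is reproducible. Larger budgets should pick the best action far more often than a budget of 6, though the exact fractions depend on the invented noise level and values.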

While DeepMind has not tested MuZero on multiplayer games where hidden information plays an important role—such as poker or bridge—Silver said he suspects MuZero might be able to learn to play these games too, and that the company plans to explore this further. A.I. researchers from Carnegie Mellon University and Facebook have previously built A.I. systems capable of beating champion poker players. Bridge, which relies in part on communication, remains a challenge.

Silver said DeepMind is considering several real-world uses for MuZero. One of the most promising so far, Silver said, is video compression, where there are many different ways to compress a video signal, but no clear rules about which one is best for different kinds of video. He said that initial experiments with MuZero-like algorithms had shown it might be possible to achieve a 5% reduction in bandwidth over the best previous compression methods. Silver also said MuZero might be useful for building more capable robots and digital assistants as well as extending DeepMind’s recent breakthrough in predicting the structure of proteins, research that has so far not relied on the techniques the company pioneered in its games research.

Others, however, are already taking MuZero in very different directions. Last week, the U.S. Air Force revealed that it had used information about MuZero that DeepMind had made freely available to the public last year to help create an A.I. system that could autonomously control the radar of a U-2 spy plane. The Air Force tested the A.I. system, which it calls ARTUMu, on a U-2 Dragon Lady spy plane during a simulated missile strike in a training mission on Dec. 14. Stop Killer Robots, a campaign led by computer scientists, arms control experts and human rights activists, said the Air Force research was a dangerous step toward creating lethal autonomous weapons.

DeepMind told Fortune it had no role in the Air Force research and was unaware of it until seeing news reports about the training mission last week. DeepMind has previously pledged to avoid work on offensive weapons capabilities or A.I. that can identify and track targets and deploy weapons against them without a human making the final decision about striking those particular targets.
