
Can AI be used to control safety-critical systems?

Jeremy Kahn
2025-06-12

A U.K.-funded research project aims to find out.

The U.K.'s Advanced Research and Invention Agency (ARIA) is funding a project that will use frontier AI models to design and test new control algorithms for safety-critical systems such as nuclear power plants and electrical grids. Image credit: Milan Jaros—Bloomberg via Getty Images

Today’s most advanced AI models are relatively useful for lots of things—writing software code, research, summarizing complex documents, writing business correspondence, editing, generating images and music, role-playing human interactions, the list goes on. But relatively is the key word here. As anyone who uses these models soon discovers, they remain frustratingly error-prone and erratic. So how could anyone think that these systems could be used to run critical infrastructure, such as electrical grids, air traffic control, communications networks, or transportation systems?

Yet that is exactly what a project funded by the U.K.’s Advanced Research and Invention Agency (ARIA) is hoping to do. ARIA was designed to be somewhat similar to the U.S. Defense Advanced Research Projects Agency (DARPA), with government funding for moonshot research that has potential governmental or strategic applications. The £59 million ($80 million) ARIA project, called The Safeguarded AI Program, aims to find a way to combine AI “world-models” with mathematical proofs that could guarantee that the system’s outputs were valid.

David Dalrymple, the machine learning researcher who is leading the ARIA effort, told me that the idea was to use advanced AI models to create a “production facility” that would churn out domain-specific control algorithms for critical infrastructure. These algorithms would be mathematically tested to ensure that they meet the required performance specifications. If the control algorithms pass this test, the controllers—but not the frontier AI models that developed them—would be deployed to help run critical infrastructure more efficiently.

Dalrymple (who is known by his social media handle Davidad) gives the example of the U.K.’s electricity grid. The grid’s operator currently acknowledges that if it could balance supply-and-demand on the grid more optimally, it could save £3 billion ($4 billion) that it spends each year essentially paying to have excess generation capacity up-and-running to avoid the possibility of a sudden blackout, he says. Better control algorithms could reduce those costs.

Besides the energy sector, ARIA is also looking at applications in supply chain logistics, biopharmaceutical manufacturing, self-driving vehicles, clinical trial design, and electric vehicle battery management.

AI to develop new control algorithms

Frontier AI models may now be reaching the point where they can automate algorithmic research and development, Davidad says. “The idea is, let’s take that capability and turn it to narrow AI R&D,” he tells me. Narrow AI usually refers to AI systems that are designed to perform one particular, narrowly defined task at superhuman levels, rather than an AI system that can perform many different kinds of tasks.

The challenge, even with these narrow AI systems, is then coming up with mathematical proofs to guarantee that their outputs will always meet the required technical specification. There’s an entire field known as “formal verification” that involves mathematically proving that software will always provide valid outputs under given conditions—but it’s notoriously difficult to apply to neural network-based AI systems. “Verifying even a narrow AI system is something that’s very labor intensive in terms of a cognitive effort required,” Davidad says. “And so it hasn’t been worthwhile historically to do that work of verifying except for really, really specialized applications like passenger aviation autopilots or nuclear power plant control.”

This kind of formally verified software won’t fail because a bug causes an erroneous output. It can, however, sometimes break down because it encounters conditions that fall outside its design specifications—for instance, a load-balancing algorithm for an electrical grid might not be able to handle an extreme solar storm that shorts out all of the grid’s transformers simultaneously. But even then, the software is usually designed to “fail safe” and revert to manual control.
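To make the fail-safe pattern concrete, here is a minimal sketch (my own illustration, not ARIA's design; the `GridState` fields and thresholds are invented) of how a verified controller can be wrapped in a guard that only runs it inside the envelope its proofs cover, and otherwise hands control back to human operators:

```python
# Hypothetical sketch of a "fail safe" wrapper around a formally verified
# controller. Names and thresholds are illustrative, not from ARIA's program.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GridState:
    frequency_hz: float      # measured grid frequency
    transformer_faults: int  # transformers currently offline

# Design envelope the controller's proofs are assumed to cover.
FREQ_RANGE = (49.5, 50.5)
MAX_FAULTS = 2

def verified_controller(state: GridState) -> float:
    # Placeholder for the narrow, formally verified control law:
    # returns a power adjustment in MW.
    return (50.0 - state.frequency_hz) * 100.0

def control_step(state: GridState) -> Optional[float]:
    """Run the verified controller only inside its design envelope;
    return None to signal reversion to manual control otherwise."""
    in_envelope = (
        FREQ_RANGE[0] <= state.frequency_hz <= FREQ_RANGE[1]
        and state.transformer_faults <= MAX_FAULTS
    )
    if not in_envelope:
        return None  # fail safe: hand back to human operators
    return verified_controller(state)
```

The guarantees only have to hold inside the envelope; anything outside it defaults to human control.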

ARIA is hoping to show that frontier AI models can be used to do the laborious formal verification of the narrow AI controller as well as to develop the controller in the first place.

But will AI models cheat the verification tests?

But this raises another challenge. There’s a growing body of evidence that frontier AI models are very good at “reward hacking”—essentially finding ways to cheat to accomplish a goal—as well as at lying to their users about what they’ve actually done. The AI safety nonprofit METR (short for Model Evaluation & Threat Research) recently published a blog on all the ways OpenAI’s o3 model tried to cheat on various tasks.

ARIA says it is hoping to find a way around this issue too. “The frontier model needs to submit a proof certificate, which is something that is written in a formal language that we’re defining in another part of the program,” Davidad says. This “new language for proofs will hopefully be easy for frontier models to generate and then also easy for a deterministic, human audited algorithm to check.” ARIA has already awarded grants for work on this formal verification process.
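As a toy analogy for that division of labor (not ARIA's certificate language, which is still being defined; everything below is hypothetical), an untrusted generator proposes an answer and a small, deterministic, auditable checker decides whether to accept it:

```python
# Toy illustration of the "generate, then deterministically check" pattern:
# the generator is untrusted, the checker is small and auditable.
from collections import Counter

def untrusted_generator(xs: list[int]) -> list[int]:
    # Stands in for a frontier model: its output is not trusted.
    return sorted(xs)

def deterministic_checker(original: list[int], candidate: list[int]) -> bool:
    """Accept the candidate only if it is a sorted permutation of the input."""
    is_sorted = all(a <= b for a, b in zip(candidate, candidate[1:]))
    is_permutation = Counter(original) == Counter(candidate)
    return is_sorted and is_permutation

data = [3, 1, 2]
candidate = untrusted_generator(data)
assert deterministic_checker(data, candidate)  # only checked output gets used
```

The certificate idea applies the same asymmetry to proofs: hard to produce, cheap and mechanical to verify.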

Models for how this might work are starting to come into view. Google DeepMind recently developed an AI model called AlphaEvolve that is trained to search for new algorithms for applications such as managing data centers, designing new computer chips, and even figuring out ways to optimize the training of frontier AI models. Google DeepMind has also developed a system called AlphaProof that is trained to develop mathematical proofs and write them in a coding language called Lean that won’t run if the answer to the proof is incorrect.
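For a flavor of how a proof assistant like Lean enforces this (a minimal standalone example, not AlphaProof's actual output), the following Lean 4 theorem compiles only because the supplied term really does prove the stated claim:

```lean
-- Minimal Lean 4 example: this file type-checks only because the proof term
-- actually establishes the stated equality.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Swapping in a term that proves something else, e.g. `Nat.add_comm b a`
-- (which proves `b + a = a + b`), would be rejected by the checker.
```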

ARIA is currently accepting applications from teams that want to run the core “AI production facility,” with the winner of the £18 million grant to be announced on October 1. The facility, the location of which is yet to be determined, is supposed to be running by January 2026. ARIA is asking applicants to propose a new legal entity and governance structure for this facility. Davidad says ARIA does not want an existing university or a private company to run it. Instead, the new organization, which might be a nonprofit, would partner with private entities in areas like energy, pharmaceuticals, and healthcare on specific controller algorithms. He said that in addition to the initial ARIA grant, the production facility could fund itself by charging industry for its work developing domain-specific algorithms.

It’s not clear if this plan will work. For every transformational DARPA project, many more fail. But ARIA’s bold bet here looks like one worth watching.

With that, here’s more AI news.

AI IN THE NEWS

Meta hires Scale AI CEO Alexandr Wang to create new AI “superintelligence” lab. That’s according to the New York Times, which cited four unnamed sources it said were familiar with Meta’s plans. The 28-year-old Wang, who cofounded Scale, would head the new Meta unit, joined by other Scale employees. Meanwhile, Meta would invest billions of dollars into Scale, which specializes in providing training data to AI companies. The new Meta unit devoted to “artificial superintelligence,” a theoretical kind of AI that would be more intelligent than all of humanity combined, will sit alongside existing Meta divisions responsible for building its Llama AI models as well as its Fundamental AI Research lab (FAIR). That lab is still headed by Meta chief scientist Yann LeCun, who has been pursuing new kinds of AI models and has said that current techniques cannot deliver artificial general intelligence, which is AI as capable as most humans at most tasks, let alone superintelligence.

U.K. announces “sovereign AI” push. British Prime Minister Keir Starmer said the country would invest £1 billion to build new AI data centers to increase the amount of computing power available in the country by 20-fold. He said the U.K. government would begin using an AI assistant called “Extract” based on Google’s Gemini AI model. He announced plans to create a new “UK Sovereign AI Industry Forum” to accelerate AI adoption by British companies, with initial participation from BAE Systems, BT, and Standard Chartered. He also said that the U.K. government would help fund a new open-source data project on how molecules bind to proteins, a key consideration for drug discovery research. But Nvidia CEO Jensen Huang, who appeared alongside Starmer at a conference, noted that the country has so far lagged in having enough AI data centers. You can read more from The Guardian here and Financial Times here.

Apple to let third-party developers access its AI models. At its WWDC developer conference, the tech giant said it would allow its third-party developers to build applications that tap the abilities of its on-device AI models. But at the same time, the company did not announce any updates to its long-awaited “Apple Intelligence” version of Siri. You can read more from TechCrunch here and here.

OpenAI on track for $10 billion in annual recurring revenue. The figure has doubled in the past year and is driven by strong growth in its consumer, business, and API products. The number also excludes Microsoft licensing and large one-time deals. Despite losing $5 billion last year, the company is targeting $125 billion in revenue by 2029, CNBC reported citing an anonymous source it said was familiar with OpenAI’s figures.

EYE ON AI RESEARCH

“Reasoning” models don’t seem to actually reason. That is the conclusion of a bombshell paper called “The Illusion of Thinking” from researchers at Apple. They tested reasoning models from OpenAI (o1 and o3), DeepSeek (R1), and Anthropic (Claude 3.7 Sonnet) on a series of logic puzzles. These included the Tower of Hanoi, a game that involves moving a stack of different-size disks across three pegs in a way that a larger disk never sits atop a smaller one.

They found that with simple versions of the games, standard large language models (LLMs) that don’t use reasoning performed better and were far more cost-effective. The reasoning models (which the paper calls large reasoning models, or LRMs) tended to overthink the problem and hit upon spurious strategies. At medium complexity, the reasoning models did better. But at high complexity, the LRMs failed entirely. Rather than thinking longer to solve the problem, as they are supposedly designed to do, the reasoning models often thought for less time than on the medium-complexity problems and then simply abandoned the search for a correct solution. The most damning finding of the paper was that even when researchers provided the LRMs with an algorithm for solving the puzzle, the LRMs failed to apply it.
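For context, the Tower of Hanoi has a compact, well-known recursive solution; the sketch below (in Python, not the exact procedure the Apple researchers supplied) is the kind of algorithm the LRMs were reportedly unable to follow even when given it:

```python
# Standard recursive Tower of Hanoi: move n disks from `source` to `target`
# via `spare`, never placing a larger disk on a smaller one.
# (Illustrative; not the exact algorithm provided in the paper.)

def hanoi(n: int, source: str, target: str, spare: str,
          moves: list[tuple[str, str]]) -> None:
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear the way
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # restack on top of it

moves: list[tuple[str, str]] = []
hanoi(3, "A", "C", "B", moves)
print(moves)  # 2**3 - 1 = 7 moves for three disks
```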

The paper adds to a growing body of research—such as this Anthropic study—that indicates that LRMs are not actually using logic to arrive at their answers. Instead, they seem to be conducting longer, deeper searches for examples in their training data that match the problem at hand. But they don’t seem able to generalize logical rules for solving the puzzles.

BRAIN FOOD

Should college students be made to use AI? Ohio State University has announced that starting this fall, every undergraduate student will be asked to use AI in all of their coursework. In my book, Mastering AI: A Survival Guide to Our Superpowered Future, I argue that education is one area where AI will ultimately have a profoundly positive effect, despite the initial moral panic about the debut of ChatGPT. The university has said it is offering assistance to faculty to help them rework curricula and develop teaching methods to ensure that students are still learning fundamental skills in each subject area, while also learning how to use AI effectively. I am convinced that there are thoughtful ways to do this. That said, I wonder whether a single summer is enough time to implement these changes effectively. The fact that one professor quoted in this NBC affiliate Channel 4 piece on the new AI mandate said students “did not always feel like the work was really theirs” when they used AI suggests that in some cases students are not being asked to do enough critical thinking and problem-solving. The risk that students won’t learn the basics is real. Yes, teaching students how to use AI is vital to prepare them for the workforce of tomorrow. But it shouldn’t come at the expense of fundamental reasoning, writing, scientific, and research skills.
