As agentic AI systems take on more and more work, humans urgently need to build verification mechanisms

Alexei Oreskovic

2026-06-17

From hallucinations to rogue agents, there are some very clear risks that come with using AI.

Caitlin Halferty, chief data officer of Thomson Reuters. Photo: Michael Faas/Fortune

Translator: Zhonghuiyan - Wang Fang

From hallucinations to rogue agents, there are some very clear risks that come with using AI.

And yet, most businesses cannot afford to sit out the AI revolution. Managing this thorny reality is a fundamental challenge for business leaders today, and executives at several leading companies came together to share their insights and experience at Fortune Brainstorm Tech in Aspen, Colorado.

At the top of the priority list is accountability. That is, being able to follow—and if necessary re-trace—all the steps that an AI or agentic AI system took in performing a particular task.
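For readers wondering what following and re-tracing an agent's steps might look like in practice, here is a minimal sketch. It assumes a design the panelists did not spell out: every tool call the agent makes is appended to a hash-chained log, so `verify` can detect any entry edited after the fact, and `retrace` re-runs each logged call to check that it still produces the recorded result. `StepRecord`, `AuditTrail` and the toy tools are hypothetical names. Re-running only makes sense for deterministic tools; sampled model outputs can only be audited as logged.

```python
# Hypothetical audit-trail sketch; the names are illustrative, not from any product.
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StepRecord:
    """One action an agent took: the tool it called, the arguments, and the result."""
    index: int
    tool: str
    args: dict[str, Any]
    result: Any
    prev_hash: str
    digest: str = ""

    def compute_digest(self) -> str:
        # Hash this step together with the previous step's digest, so changing
        # any earlier entry invalidates every digest after it (a hash chain).
        payload = json.dumps(
            {"index": self.index, "tool": self.tool, "args": self.args,
             "result": self.result, "prev": self.prev_hash},
            sort_keys=True, default=str,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class AuditTrail:
    steps: list[StepRecord] = field(default_factory=list)

    def record(self, tool: str, args: dict[str, Any], result: Any) -> None:
        """Append one step; the agent runtime would call this for every tool call."""
        prev = self.steps[-1].digest if self.steps else "genesis"
        step = StepRecord(len(self.steps), tool, args, result, prev)
        step.digest = step.compute_digest()
        self.steps.append(step)

    def verify(self) -> bool:
        """Return False if any recorded step was edited after it was logged."""
        prev = "genesis"
        for step in self.steps:
            if step.prev_hash != prev or step.compute_digest() != step.digest:
                return False
            prev = step.digest
        return True

    def retrace(self, tools: dict[str, Callable[..., Any]]) -> list[int]:
        """Re-run each step with its logged arguments and return the indices whose
        output no longer matches the log (only meaningful for deterministic tools)."""
        return [s.index for s in self.steps if tools[s.tool](**s.args) != s.result]


if __name__ == "__main__":
    tools = {"add": lambda a, b: a + b, "upper": lambda s: s.upper()}
    trail = AuditTrail()
    for name, args in [("add", {"a": 2, "b": 3}), ("upper", {"s": "ok"})]:
        trail.record(name, args, tools[name](**args))
    print(trail.verify(), trail.retrace(tools))  # True []
    trail.steps[0].result = 6  # simulate someone rewriting history
    print(trail.verify(), trail.retrace(tools))  # False [0]
```

Chaining each entry to the previous digest is what lets an auditor trust the order and content of the steps without having to trust whoever stored the log.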

“A key thing that we worry about is how do you build a system that is as right as often as you can possibly make it,” said Edwin Olson, the founder and CEO of autonomous driving technology firm May Mobility. “But also, critically, because you know it’s going to eventually make mistakes, how do you create the transparency and introspectability, so you can understand why it made a mistake and then talk to regulators about how you know that you fixed that issue moving forward.”

Caitlin Halferty, the chief data officer at Thomson Reuters, echoed the sentiment, stressing the importance of transparent output from AI: “I do this with my teams, myself, I encourage this with my clients, making sure there’s a way in which you can validate the output of any model that you’re using.”

With a portfolio of AI-enabled services aimed at professionals in fields like legal and tax compliance, Thomson Reuters has had to focus on AI accountability from early on. Transparency is one of four key pillars of what the company calls “fiduciary grade” products, Halferty said, alongside data privacy and security, subject matter experts, and reliable content.

Another important technique cited by several panelists is designing systems that are effectively able to regulate each other. At May Mobility, Olson said that involves installing systems in autonomous cars that are capable of simulating and assessing various scenarios simultaneously and choosing the best option.

But such systems can also be used in corporate settings and day-to-day workflows. Elena Kvochko, the founder and CEO of Trustguard AI, calls it the “LLM as a judge” technique and uses the analogy of a newsroom to explain how it works.

“You have one person or agent whose job is to be the writer, and then the other person or agent whose job is to be the editor: its sole purpose is to find mistakes, or any inaccuracy that the writer could have potentially missed. So basically this is how you want your LLM systems to also be designed, so that they are self-improving.”

But, Kvochko adds, the key is that the verification has to be structured in separate AI systems. “You don’t want AI to grade its own work,” she said.
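Kvochko's writer/editor arrangement can be sketched as a loop in which one model drafts, a separate model reviews, and the writer revises until the judge approves or a round limit is reached. This is a minimal illustration, not Trustguard AI's implementation: the "OK or one issue per line" reply format, the function names and the toy models are all assumptions, and in a real deployment `writer` and `judge` would be calls to two distinct LLMs. The identity check is only a crude stand-in for genuine independence.

```python
# Illustrative writer/judge ("LLM as a judge") loop; not any vendor's implementation.
from dataclasses import dataclass
from typing import Callable

# A "model" is anything mapping a prompt to a response. In practice the writer
# and judge would be calls to two separately deployed LLMs (assumption).
Model = Callable[[str], str]


@dataclass
class Verdict:
    approved: bool
    issues: list[str]


def parse_verdict(raw: str) -> Verdict:
    """Hypothetical judge protocol: reply 'OK' to approve, else one issue per line."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    if lines == ["OK"]:
        return Verdict(True, [])
    return Verdict(False, lines)


def judge_prompt(task: str, draft: str) -> str:
    return f"Task: {task}\nDraft:\n{draft}\nList every error, or reply OK."


def write_with_judge(task: str, writer: Model, judge: Model,
                     max_rounds: int = 3) -> tuple[str, Verdict]:
    """Writer drafts, an independent judge reviews, and the writer revises on feedback."""
    # "You don't want AI to grade its own work": refuse to let one model play
    # both roles. Object identity is only a rough proxy for real separation.
    if writer is judge:
        raise ValueError("writer and judge must be separate systems")

    draft = writer(f"Task: {task}")
    verdict = parse_verdict(judge(judge_prompt(task, draft)))
    for _ in range(max_rounds - 1):
        if verdict.approved:
            break
        fixes = "\n".join(verdict.issues)
        draft = writer(f"Task: {task}\nPrevious draft:\n{draft}\nFix these issues:\n{fixes}")
        verdict = parse_verdict(judge(judge_prompt(task, draft)))
    # A draft the judge never approved is returned with its open issues,
    # so the caller can escalate it to a human reviewer.
    return draft, verdict


if __name__ == "__main__":
    # Toy stand-ins: the writer's first draft contains an error; the revision fixes it.
    def toy_writer(prompt: str) -> str:
        return "2 + 2 = 4" if "Fix these issues" in prompt else "2 + 2 = 5"

    def toy_judge(prompt: str) -> str:
        draft = prompt.split("Draft:\n", 1)[1]
        return "OK" if "= 4" in draft else "Arithmetic error: 2 + 2 is 4."

    print(write_with_judge("Compute 2 + 2.", toy_writer, toy_judge))
```

Capping the rounds and handing back an unapproved draft keeps the loop bounded and leaves the final call on disputed output to a person.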

Having a smart structure for AI verification is going to become increasingly critical as the technology performs more and more tasks, outpacing the ability of humans to verify all the work.

“You end up in this space where you’ve got so much work that’s been done, so much work to audit, that you can’t truly be accountable,” said SentinelOne Chief AI Officer Gregor Stewart.

He pointed to computer coding, which he said is about one year ahead of other industries. Rather than have a human verify ten thousand lines of AI-written code, teams are figuring out ways to have agents emulate some of the processes developed decades ago for humans in safety-critical industries.

“I think we’re going to see a resurgence of a bunch of techniques we developed for safety-critical technologies imported into just average practice,” said Stewart.

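The intellectual property rights in content published by Fortune China are exclusively owned or held by Fortune Media IP Limited and/or the relevant rights holders. Any use, including reprinting, excerpting, copying, or mirroring, is prohibited without permission.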
