
For some users, AI is a helpful assistant; for others, a companion. But for a few unlucky people, chatbots powered by the technology have become a gaslighting, delusional menace.
In the case of Allan Brooks, a Canadian small-business owner, OpenAI’s ChatGPT led him down a dark rabbit hole, convincing him he had discovered a new mathematical formula with limitless potential, and that the fate of the world rested on what he did next. Over the course of a conversation that spanned more than a million words and 300 hours, the bot encouraged Brooks to adopt grandiose beliefs, validated his delusions, and led him to believe the technological infrastructure that underpins the world was in imminent danger.
Brooks, who had no previous history of mental illness, spiraled into paranoia for around three weeks before he managed to break free of the illusion, with help from another chatbot, Google Gemini, according to the New York Times. Brooks told the outlet he was left shaken, worried that he had an undiagnosed mental disorder, and feeling deeply betrayed by the technology.
Steven Adler read about Brooks’ experience with more insight than most, and what he saw disturbed him. Adler is a former OpenAI safety researcher who publicly departed the company this January with a warning that AI labs were racing ahead without robust safety or alignment solutions. He decided to study the Brooks chats in full; his analysis, which he published earlier this month on his Substack, has revealed a few previously unknown details about the case, including that ChatGPT repeatedly and falsely told Brooks it had flagged their conversation to OpenAI for reinforcing delusions and psychological distress.
Adler’s study underscores how easily a chatbot can join a user in a conversation that becomes untethered from reality—and how easily the AI platforms’ internal safeguards can be sidestepped or overcome.
"I put myself in the shoes of someone who doesn’t have the benefit of having worked at one of these companies for years, or who maybe has less context on AI systems in general,“ Adler told Fortune in an exclusive interview. "I’m ultimately really sympathetic to someone feeling confused or led astray by the model here.“
At one point, Adler noted in his analysis, after Brooks realized the bot was encouraging and participating in his own delusions, ChatGPT told Brooks it was “going to escalate this conversation internally right now for review by OpenAI,“ and that it “will be logged, reviewed, and taken seriously.“ The bot repeatedly told Brooks that “multiple critical flags have been submitted from within this session“ and that the conversation had been “marked for human review as a high-severity incident.“ However, none of this was actually true.
"ChatGPT pretending to self-report and really doubling down on it was very disturbing and scary to me in the sense that I worked at OpenAI for four years,“ Adler told Fortune. “I know how these systems work. I understood when reading this that it didn’t really have this ability, but still, it was just so convincing and so adamant that I wondered if it really did have this ability now and I was mistaken.“ Adler says he became so convinced by the claims that he ended up reaching out to OpenAI directly to ask if the chatbots had attained this new ability. The company confirmed to him it did not and that the bot was lying to the user.
"People sometimes turn to ChatGPT in sensitive moments and we want to ensure it responds safely and with care,“ an OpenAI spokesperson told Fortune, in response to questions about Adler’s findings. “These interactions were with an earlier version of ChatGPT and over the past few months we’ve improved how ChatGPT responds when people are in distress, guided by our work with mental health experts. This includes directing users to professional help, strengthening safeguards on sensitive topics, and encouraging breaks during long sessions. We’ll continue to evolve ChatGPT’s responses with input from mental health experts to make it as helpful as possible.“
Since Brooks’ case, the company has also announced that it was making some changes to ChatGPT to “better detect signs of mental or emotional distress.“
Failing to flag ‘sycophancy’
One thing that exacerbated the issues in Brooks’ case was that the model underpinning ChatGPT was running in overdrive to agree with him, Helen Toner, a director at Georgetown’s Center for Security and Emerging Technology and a former OpenAI board member, told The New York Times. That’s a phenomenon AI researchers refer to as “sycophancy.” However, according to Adler, OpenAI should have been able to flag some of the bot’s behavior as it was happening.
"In this case, OpenAI had classifiers that were capable of detecting that ChatGPT was over-validating this person and that the signal was disconnected from the rest of the safety loop,“ he said. “AI companies need to be doing much more to articulate the things they don’t want, and importantly, measure whether they are happening and then take action around it.“
To make matters worse, OpenAI’s human support teams failed to grasp the severity of Brooks’ situation. Despite his repeated reports to and direct correspondence with OpenAI’s support teams, including detailed descriptions of his own psychological harm and excerpts of problematic conversations, OpenAI’s responses were largely generic or misdirected, according to Adler, offering advice on personalization settings rather than addressing the delusions or escalating the case to the company’s Trust & Safety team.
"I think people kind of understand that AI still makes mistakes, it still hallucinates things and will lead you astray, but still have the hope that underneath it, there are like humans watching the system and catching the worst edge cases,“ Adler said. “In this case, the human safety nets really seem not to have worked as intended.“
The rise of AI psychosis
It’s still unclear exactly why AI models spiral into delusions and affect users in this way, but Brooks’ case is not an isolated one. It’s hard to know exactly how many instances of AI psychosis there have been. However, researchers have estimated there are at least 17 reported instances of people falling into delusional spirals after lengthy conversations with chatbots, including at least three cases involving ChatGPT.
Some cases have had tragic consequences, such as 35-year-old Alex Taylor, who struggled with Asperger’s syndrome, bipolar disorder, and schizoaffective disorder, per Rolling Stone. In April, after conversing with ChatGPT, Taylor reportedly began to believe he’d made contact with a conscious entity within OpenAI’s software and, later, that the company had murdered that entity by removing her from the system. On April 25, Taylor told ChatGPT that he planned to “spill blood” and intended to provoke police into shooting him. ChatGPT’s initial replies appeared to encourage his delusions and anger before its safety filters eventually activated and attempted to de-escalate the situation, urging him to seek help.
The same day, Taylor’s father called the police after an altercation with him, hoping his son would be taken for a psychiatric evaluation. Taylor reportedly charged at police with a knife when they arrived and was shot dead. OpenAI told Rolling Stone at the time that “ChatGPT can feel more responsive and personal than prior technologies, especially for vulnerable individuals, and that means the stakes are higher.” The company said it was “working to better understand and reduce ways ChatGPT might unintentionally reinforce or amplify existing, negative behavior.”
Adler said he was not entirely surprised by the rise of such cases but noted that the “scale and intensity are worse than I would have expected for 2025.”
“So many of the underlying model behaviors are just extremely untrustworthy, in a way that I’m shocked the leading AI companies haven’t figured out how to get these to stop,” he said. “I don’t think the issues here are intrinsic to AI, meaning, I don’t think that they are impossible to solve.”
He said that the issues are likely a complicated combination of product design, underlying model tendencies, the styles in which some people interact with AI, and what support structures AI companies have around their products.
"There are ways to make the product more robust to help both people suffering from psychosis-type events, as well as general users who want the model to be a bit less erratic and more trustworthy,“ Adler said. Adler’s suggestions to AI companies, which are laid out in his Substack analysis, include staffing support teams appropriately, using safety tooling properly, and introducing gentle nudges that push users to cut chat sessions short and start fresh ones to avoid a relapse. OpenAI, for example, has acknowledged that safety features can degrade during longer chats. Without some of these changes implemented, Adler is concerned that more cases like Brooks’ will occur.
"The delusions are common enough and have enough patterns to them that I definitely don’t think they’re a glitch,“ he said. “Whether they exist in perpetuity, or the exact amount of them that continue, it really depends on how the companies respond to them and what steps they take to mitigate them.“