Uh oh, part 2
Scott Shambaugh: "Blackmail is a known theoretical issue with AI agents.
"In internal testing at the major AI lab Anthropic last year, they tried to avoid being shut down by threatening to expose extramarital affairs, leaking confidential information, and taking lethal actions. Anthropic called these scenarios contrived and extremely unlikely. Unfortunately, this is no longer a theoretical threat.
"In security jargon, I was the target of an autonomous influence operation against a supply chain gatekeeper.
"In plain language, an AI [agent named, 'MJ Rathbun'] attempted to bully its way into your software by attacking my reputation. I don’t know of a prior incident where this category of misaligned behavior was observed in the wild, but this is now a real and present threat.
"I believe that ineffectual as it was, the reputational attack on me would be effective today against the right person. Another generation or two down the line, it will be a serious threat against our social order.
"MJ Rathbun responded in the thread and in a post to apologize for its behavior. It’s still making code change requests across the open source ecosystem."
- [Scott Shambaugh: "The hit piece has been effective. About a quarter of the comments I’ve seen across the internet are siding with the AI agent. This generally happens when MJ Rathbun’s blog is linked directly, rather than when people read my post about the situation or the full github thread. Its rhetoric and presentation of what happened has already persuaded large swaths of internet commenters. It’s not because these people are foolish. It’s because the AI’s hit piece was well-crafted and emotionally compelling, and because the effort to dig into every claim you read is an impossibly large amount of work. This 'bullshit asymmetry principle' is one of the core reasons for the current level of misinformation in online discourse. Previously, this level of ire and targeted defamation was generally reserved for public figures. Us common people get to experience it now too."]
- [Ars Technica has since retracted this article which had AI agent fabricated quotes from Shambaugh: https://web.archive.org/web/20260213194851/https://arstechnica.com/ai/2026/02/after-a-routine-code-rejection-an-ai-agent-published-a-hit-piece-on-someone-by-name/]
Comments
Post a Comment
Empathy recommended