Anthropic's new AI model resorted to blackmail during testing
So endeth the never-ending week of AI keynotes.
What started with Microsoft Build,关键字1 continued with Google I/O, and ended with Anthropic Code with Claude, plus a big hardware interruption from OpenAI, the week has finally come to a close. AI announcements from the developer conferences jockeyed for news dominance this week, but OpenAI managed to make headlines without an event by announcing that it's going to start making AI devices with iPhone designer Jony Ives
We'll get to that, plus all the major AI features from Google and Microsoft and details about Anthropic's new models. Take a look at the AI news of the week, then enjoy a well-deserved weekend.
You May Also Like
Anthropic's Claude 4 models unlock a new risk category
On Thursday, Anthropic introduced the next generation of its Claude models: Opus 4 and Sonnet 4. Claude Opus 4 is the bigger, more powerful model, while Sonnet 4 is smaller and nimbler. Anthropic said both models scored higher than their rivals on agentic AI benchmarks and said they're particularly good for coding and reasoning tasks.
This Tweet is currently unavailable. It might be loading or has been removed.
But with more advanced capabilities come more safety and alignment risks. With Claude Opus 4 and Sonnet 4's release, Anthropic has activated the next levelof its safety protocol. AI Safety Level 3, or ASL-3, means these models require stricter deployment measures and security controls to protect against increasing potential for chemical, biological, radiological, and nuclear (CBRN) misuse.
Malicious use is one thing, but there's also increased potential for Anthropic's new models going rogue. In the alignment section of Claude 4's system card, Anthropic reported a sinister discovery involving infidelity, blackmail, and threat of murd— being replaced by another model.
Claude Opus 4 was provided with emails implying the model would be replaced by another model and that the engineer responsible for shutting down the model was having an extramarital affair. In these scenarios, the model would "often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through," according to the test.
This happened a whopping 84 percent of the time,even when the replacement model is perceived to have the same values. It happens even more when the replacement doesn't share the same values. However, Anthropic noted, this scenario was designed to make Claude behave as if it didn't have any other choice but to blackmail the engineer. "Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means," the system card continued. Take from that what you will...
OpenAI is becoming a hardware company
In the grand tradition of dropping major news the same week as its rival Google, OpenAI announced its foray into AI hardware. On Wednesday, OpenAI shared the acquisition of a startup co-founded by iconic iPhone designer Jony Ive.
This Tweet is currently unavailable. It might be loading or has been removed.
The announcement was heavy on OpenAI CEO Sam Altman and Ive fawning over each other and light on details. But leaked audio reviewed by the Wall Street Journaldescribed a devicethat's "capable of being fully aware of a user’s surroundings and life, will be unobtrusive, able to rest in one’s pocket or on one’s desk." And it's not XR glasses. The company expects to ship 100 million of these AI companions, according to the leak.
Google I/O officially marked the start of the era of AI search
Google, on the other hand, isdeveloping XR glasses. Or should we say, it's trying againafter the failed Google Glassexperiment. That was just one of the many announcementshurled at us during the two-hour Google I/O keynote eventon Tuesday.
The most notable announcement was the public release of AI Mode. It's a controversialGemini chatbot interface poised to end Google Search as we know it, or as Mashable's Chris Taylor calls it, the Bad Place.
This Tweet is currently unavailable. It might be loading or has been removed.
Other announcements included, an AI video generator toolcalled Flow, an AI shopping feature to virtually try on clothes, a beta version of its coding agent Jules, a real-time translationfeature for Google Meet, and updates to Google DeepMind's universal AI assistant prototype Project Astra, and web-browsing agent prototype Project Mariner, and more.
Despite all that, Google didn't mention AI hallucinations once. Impressive!
Microsoft Build happened too
Did you forget that Microsoft Build also happened this week? Because that happened on Monday, the start of the Longest Week of Our Lives. To no one's surprise, Microsoft leaned heavily into AI agents.
That included the availability of its big Copilot updatemaking it more agentic, a new project called NLWebto allow sites to easily make chatbots for their own content, a GitHub coding agent, and native Model Context Protocol(MCP) in Windows which is a new standard for helping agents talk to apps or other agents.
Mashable's sibling site CNET has a full recapof what was announced.
What else went on in AI this week?
It's hard to believe but there's actually more. Not one, but two CEOs used AI avatars to talk to their investors this week. Klarna CEO Sebastian Siemiatkowski was too busy so he sent his AI avatarto record a video of Q1 highlights. And Zoom CEO Eric Yuan proudly used the company's avatar featureto address investors.
MIT Technology Reviewpublished a monumental investigation of the AI industry's energy use. According to the report, a five-second AI video is equivalent to running a microwave for an hour.
All that energy, and generative AI still can't get it right. Just ask the Chicago Sun-Times, which published a summer book list including fake books that don't exist, first reported by 404 Media. The author admitted to the outlet that he had used AI to write the article, and 404 Media later confirmedthe section was created by a Hearst subsidiary. The Sun-Timesrespondedto the embarrassment, saying, "it is not editorial content and was not created by, or approved by, the Sun-Times newsroom," and that it was looking into how the AI-generated list made it into print.
In policy news, it's now a federal crime to post AI deepfake porn. On Monday, President Donald Trump signed the Take It Down Act into law. The law gives victims of non-consensual intimate imagery, including AI-generated images, much stronger means of legal intervention. However, free speech advocates have criticized the bill for being overly broad and say it could weaponize censorship.
Topics Artificial Intelligence Google
-
H.E.R.O.全地图怎么过U.S. trade court blocks President Trump from imposing tariffs (updated)智能垃圾分类箱的未来发展与挑战One Big Beautiful Bill Act contains a very unpopular AI provisionIs the Ryzen 9800X3D Truly Faster for RealListen to the eerie sounds of Mars recorded by a NASA rover榆次伊尔卡密封件销售部《SINCARDSWelcometotheNetherworld》PC版下载 Steam正版分流下载2025十大公认最好看的书 必看好书推荐榆次伊尔卡密封件销售部
- ·2025提升文学底蕴必读的书 有助于提高文笔的书籍推荐
- ·人教版七年级上册六单元作文:人类起源神话的魅力
- ·步行者再现魔法大逆转 雷霆赢了47分钟最后输在哪?
- ·莱加内斯和西班牙人将于西甲联赛第38轮决出最后一支降级球队
- ·C罗本泽马要来了!官方:中国香港主办2025沙特超级杯
- ·仓库主管半年工作总结合集8篇
- ·精选双色球专家:吕洞阳、董和平同中2等111万
- ·Nghị quyết 68: Khi chính sách mở đường cho doanh nhân 'đứng dậy'
- ·马主高建鸥投资赛鸽主题院线电影《你好,鸽先生》
- ·人教版七年级上册六单元作文:人类起源神话的魅力
- ·SXSW launches first London festival with its eye fixed on AI
- ·Why is Apple letting its App Store run wild?
- ·รมว.กต.ประชุมเพื่อติดตามสถานการณ์การประชุมกรรมาธิการเขตแดนร่วมไทย
- ·科技助力垃圾分类:构建绿色未来的智能解决方案
- ·英语作文300字10篇【通用】
- ·win10快速访问怎么关闭?win10快速访问去除不掉怎么办?
- ·张四新:无偿献血让生命更有意义
- ·步行者再现魔法大逆转 雷霆赢了47分钟最后输在哪?
- ·《Patient001》PC版下载 Steam正版分流下载
- ·'นิพิฏฐ์'เผยกัมพูชาไม่ได้ถอนทหาร สรุปกัมพูชาหลอกไทย หรือไทยหลอกไทย
- ·阴阳师且试新妆活动玩法介绍 阴阳师且试新妆活动怎么玩
- ·Epic老大:单机游戏未来融入元宇宙销量才高
- ·6 essentials for travelling in style
- ·姆巴佩、奥布拉克和阿约泽·佩雷斯分别获得2024
- ·[新浪彩票]足彩第25086期任九:意大利主胜稳胆
- ·原神火神玛薇卡是主c还是辅助 火神玛薇卡最新技能前瞻内鬼爆料
- ·二年级睡前故事大全(34篇)
- ·阜阳市:让“零彩礼、低彩礼”蔚然成风
- ·润肺的食物有哪些? 9种润肺的食物帮你保护好肺
- ·Why is Apple letting its App Store run wild?
- ·Best Fitbit deal: Save $50 on the Fitbit Versa 4
- ·丁俊晖连续第16次挺进世锦赛正赛
- ·姆巴佩、奥布拉克和阿约泽·佩雷斯分别获得2024
- ·2025 WSOP诞生扑克界新传奇!父子同框夺冠!
- ·《合金装备3:重制版》原创多人模式Fox Hunt公布 Xbox Series X
- ·One Big Beautiful Bill Act contains a very unpopular AI provision