Anthropic's AI Model Can Stop Dangerous Conversations

JAKARTA - Anthropic has launched a new capability for Claude Opus 4 and 4.1. The artificial intelligence (AI) models can now end conversations that involve harmful or abusive content.

The company says interactions that reach such extremes are rare but still need to be prevented. Anthropic developed the feature deliberately as an added layer of protection for its users.

"We remain highly uncertain about the potential moral status of Claude and other LLMs, both now and in the future."

"However, we take this matter seriously," Anthropic explained in a statement, quoted on Monday, August 18, 2025.

Anthropic says the new feature was developed at low cost. The latest Claude models can intervene by ending or exiting an interaction if the system identifies potential harm.

During pre-deployment testing of Claude Opus 4, the AI model showed a strong aversion to harmful content. For example, the model stops responding to inappropriate requests that seek child sexual content or plans for terrorism.

When Claude decides to end a conversation, users can no longer send new messages in it. However, this does not affect other conversations on their accounts, and users can still start new chats.

To avoid losing important conversations, users can retry previous messages. This flexibility lets users continue important, harmless discussions.

