Anthropic has a new security system it says can stop almost all AI jailbreaks

Posted by:
James Thompson
Thu, 13 Feb

Anthropic, a competitor of OpenAI, has introduced a new security measure known as “constitutional classifiers” to combat abusive prompts aimed at artificial intelligence systems. The approach trains classifiers on a “constitution” of natural-language rules defining permissible and disallowed content, so that breaches can be screened out. Testing on the company’s Claude 3.5 Sonnet model showed an impressive 81.6% reduction in successful jailbreaks while only minimally affecting performance. Although Anthropic drew criticism for inviting the community to probe the system’s defenses, the company emphasized the importance of its constitutional classifiers in thwarting jailbreak attempts. It also acknowledged the limitations of its testing setup and pointed to the emergence of new open-source AI models such as DeepSeek R1 from China, indicating a shifting landscape in AI technology.
