Anthropic has a new security system it says can stop almost all AI jailbreaks

Posted by:
James Thompson
Thu, 13 Feb

Anthropic, a competitor of OpenAI, has introduced a new security measure known as “constitutional classifiers” to combat abusive prompts aimed at artificial intelligence systems. The approach trains classifiers on a “constitution” of natural-language rules defining permissible and disallowed content, so that breaches can be screened out. Testing on the company’s Claude 3.5 Sonnet model showed an impressive 81.6% reduction in successful jailbreaks while only minimally affecting performance. Although Anthropic drew criticism for inviting the community to probe the system’s defenses, the company emphasized the importance of its constitutional classifiers in thwarting jailbreak attempts. It also acknowledged the limitations of its testing setup and pointed to the emergence of new open-source AI models such as DeepSeek R1 from China, indicating a shifting landscape in AI technology.
