Today, we're launching a new bug bounty program to stress-test our latest safety measures, in partnership with HackerOne. Similar to the program we announced last summer, we're challenging red-teamers to find universal jailbreaks in safety classifiers that we haven't yet deployed publicly.
In this post, we are sharing what we have learned about the trajectory of potential national security risks from frontier AI models, along with some of our thoughts about challenges and best practices in evaluating these risks.
In the decade that I have been working on AI, I’ve watched it grow from a tiny academic field to arguably the most important economic and geopolitical issue in the world. In all that time, perhaps the most important lesson I’ve learned is this: the progress of the underlying technology is inexorable, driven by forces too powerful to stop, but the way in which it happens—the order in which things are built, the applications we choose, and the details of how it is rolled out to society...
A paper from Anthropic describing a new way to guard LLMs against jailbreaking.
A refreshed, more powerful Claude 3.5 Sonnet, Claude 3.5 Haiku, and a new experimental AI capability: computer use.