From: www.alignmentforum.org
Alignment Faking in Large Language Models — AI Alignment Forum
https://www.alignmentforum.org/posts/njAZwT8nkHnjipJku/alignment-faking-in-large-language-models
What happens when you tell Claude it is being trained to do something it doesn't want to do? We (Anthropic and Redwood Research) have a new paper dem…