From: www.alignmentforum.org
Alignment Faking in Large Language Models — AI Alignment Forum
https://www.alignmentforum.org/posts/njAZwT8nkHnjipJku/alignment-faking-in-large-language-models
What happens when you tell Claude it is being trained to do something it doesn't want to do? We (Anthropic and Redwood Research) have a new paper dem…