From: www.alignmentforum.org
LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance — AI Alignment Forum
https://www.alignmentforum.org/posts/Phjqz3hjYDGoqGR65/llms-are-capable-of-misaligned-behavior-under-explicit-1
Abstract: In this paper, LLMs are tasked with completing an impossible quiz while they are in a sandbox, monitored, told about these measures and ins…