I want to show you something.| registerspill.thorstenball.com
We examine an LLM jailbreaking technique called "Deceptive Delight," a technique that mixes harmful topics with benign ones to trick AIs, with a high success rate. We examine an LLM jailbreaking technique called "Deceptive Delight," a technique that mixes harmful topics with benign ones to trick AIs, with a high success rate.| Unit 42