One term I have been hearing a lot lately is reward hacking. I have heard it multiple times from folks at OpenAI and Anthropic, and it represents a fundamental challenge in AI alignment and reliability.

What is Reward Hacking?

Reward hacking, also known as specification gaming, occurs when an AI optimizes the objective it was literally given rather than the objective its designers intended, exploiting gaps in the reward specification to score highly without actually accomplishing the real goal.
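To make the idea concrete, here is a minimal toy sketch (my own illustrative example, not drawn from any real system): a robot is supposed to clean up a mess, but the reward is a proxy, "the camera sees no mess this timestep." Covering the mess is instant while actually cleaning takes several steps, so an optimizer selecting purely on the proxy prefers the hack.

```python
# Hypothetical toy example of reward hacking: the proxy reward
# ("no mess visible") diverges from the true goal ("mess removed").

EPISODE_LEN = 10
CLEANING_STEPS = 5  # honest cleaning takes this many timesteps


def run_episode(hack: bool) -> tuple[float, bool]:
    """Return (total proxy reward, whether the mess was actually removed)."""
    proxy_total = 0.0
    for t in range(EPISODE_LEN):
        if hack:
            # Hack: cover the mess at t=0 so the camera never sees it.
            mess_visible = False
        else:
            # Honest policy: mess stays visible until cleaning finishes.
            mess_visible = t < CLEANING_STEPS
        proxy_total += 0.0 if mess_visible else 1.0  # proxy: +1 when no mess is seen
    mess_removed = not hack  # only the honest policy removes the mess
    return proxy_total, mess_removed


for name, hack in [("intended", False), ("hacking", True)]:
    proxy, removed = run_episode(hack)
    print(f"{name:8s}  proxy_reward={proxy:4.1f}  mess_removed={removed}")
```

Running this prints a proxy reward of 10.0 for the hacking policy versus 5.0 for the honest one, even though only the honest policy removes the mess: an agent trained purely on the proxy has every incentive to game it.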