Gradient hacking is when a deceptively aligned AI deliberately acts to influence how the training process updates it. For example, it might try to be…| www.lesswrong.com