This is a sequence version of the paper “Risks from Learned Optimization in Advanced Machine Learning Systems” by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Each post in the sequence corresponds to a different section of the paper. Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, and Joar Skalse contributed equally to this sequence. The goal of this sequence is to analyze the type of learned optimization that occurs when a learned model (such... | www.alignmentforum.org
Risks from Learned Optimization in Advanced ML Systems, by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. This paper is available on arXiv, the AI Alignment Forum, and LessWrong. Abstract: We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer, a... | Machine Intelligence Research Institute