This is the fourth of five posts in the Risks from Learned Optimization Sequence based on the paper "Risks from Learned Optimization in Advanced Machine Learning Systems"… | www.alignmentforum.org
Inner alignment and objective robustness have been frequently discussed in the alignment community since the publication of "Risks from Learned Optimization"… | www.alignmentforum.org
Currently, we do not have a good theoretical understanding of how or why neural networks actually work. For example, we know that large neural networks… | www.alignmentforum.org
Thanks to Chris Olah, Neel Nanda, Kate Woolverton, Richard Ngo, Buck Shlegeris, Daniel Kokotajlo, Kyle McDonell, Laria Reynolds, Eliezer Yudkowsky, M… | www.alignmentforum.org
Double descent is a puzzling phenomenon in machine learning where increasing model size/training time/data can initially hurt performance, but then improve… | www.alignmentforum.org
Why would we program AI that wants to harm us? Because we might not know how to do otherwise. | Cold Takes