Introduction

The scaling laws for neural language models showed that cross-entropy loss follows a power law in three factors: …

www.lesswrong.com