How Gzip and K-Nearest Neighbors Can Outperform Deep Learning Models| mindfulmodeler.substack.com
Introduction: The scaling laws for neural language models showed that cross-entropy loss follows a power law in three factors: …| www.lesswrong.com