Learned embeddings often suffer from 'embedding collapse', where they occupy only a small subspace of the available dimensions. This article explores the causes of embedding collapse, from two-tower models to GNN-based systems, and its impact on model scalability and recommendation quality. We discuss methods to detect collapse and examine recent solutions proposed by research teams at Visa, Facebook AI, and Tencent Ads to address this challenge.
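As a rough illustration of one common way to check for collapse (not necessarily the specific method discussed in the article), one can inspect the singular value spectrum of the learned embedding matrix: if most of the variance sits in a handful of directions, the embeddings occupy only a small subspace. Here is a minimal sketch, assuming a hypothetical `embeddings` matrix of shape (num_items, dim):

```python
import numpy as np

def effective_rank(embeddings: np.ndarray) -> float:
    """Effective rank: exp of the entropy of the normalized singular values.

    A value far below the embedding dimension suggests the embeddings
    have collapsed into a small subspace.
    """
    # Center the embeddings before measuring their spread.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

# Hypothetical example: 10,000 item embeddings of dimension 64 whose
# variance is concentrated in a few leading directions.
emb = np.random.randn(10_000, 64) @ np.diag(np.geomspace(1.0, 1e-3, 64))
print(f"effective rank: {effective_rank(emb):.1f} / {emb.shape[1]}")
```

If the effective rank is, say, 10 out of 64 dimensions, the remaining capacity of the embedding table is largely wasted, which is exactly the scalability problem the article examines.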
The Mixture-of-Experts (MoE) is a classical ensemble learning technique originally proposed by Jacobs et al. [1] in 1991. MoEs can substantially scale up model capacity while introducing only a small computation overhead. This ability, combined with recent innovations in deep learning, has led to the wide-scale adoption of MoEs in healthcare, finance, pattern recognition, and other domains. They have been successfully utilized in large-scale applications such as Large Language Modeling...
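To make the capacity-versus-compute trade-off concrete, below is a hedged sketch of the widely used top-k gating variant of an MoE layer (not necessarily the formulation covered in the post): a learned gate routes each input through only a few expert networks, so total parameters grow with the number of experts while per-example compute stays roughly constant. The layer sizes and `TopKMoE` name are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer with top-k gating."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model)
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate scores -> keep only the top-k experts per input.
        scores = self.gate(x)                         # (batch, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)      # (batch, k)

        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e         # inputs routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Hypothetical usage: 8 experts, each input routed to 2 of them.
layer = TopKMoE(d_model=32, d_hidden=64, num_experts=8, k=2)
y = layer(torch.randn(16, 32))
print(y.shape)  # torch.Size([16, 32])
```

Only 2 of the 8 expert MLPs run for any given input, which is why adding experts increases capacity without proportionally increasing inference cost.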