The past three years have seen significant interest in applying language models to the task of visual document understanding – integrating spatial, textual, and visual signals to make sense of PDFs and scanned documents.| machine learning musings
Like all ambitious papers, "Recurrent Independent Mechanisms" by Anirudh Goyal, Alex Lamb, Jordan Hoffmann, Shagun Sodhani, Sergey Levine, Yoshua Bengio, and Bernhard Schölkopf begins with an introduction that's one part motivation and one part philosophy. To motivate the Recurrent Independent Mechanisms (RIMs) model architecture,
Time- and memory-efficient alternatives to vanilla transformers through locality-sensitive hashing and reversible layers.
Building intuition, through visualization, for Receiver Operating Characteristic (ROC) curves and what they measure.
Musings on extensions to einsum notation for more readable machine learning code.
A look at extending pre-trained representations with document retrieval to better solve downstream tasks.
Augmenting transformer language models with sparse access to large memory matrices.
Leveraging the knowledge locked away in language models by reframing categorical tasks as constrained text generation.
Optimal Transport, the Sinkhorn Transformer, and Charmin Ultra-Soft
A foray into numeric precision reduction, operation fusion, pruning, knowledge distillation, and module replacement.
Put on your headphones, jam out to some funky 80s rock, and read about an equally funky variation on multi-head attention.
A practical, code-first look at DeepMind's new Haiku library.
Leveraging annotator rationales for more interpretable and sample-efficient classification.
Put on your metaphorical safety goggles and start building something weird with JAX.
Exploring six noteworthy approaches for incorporating longer-term context in transformer models.