Time- and memory-efficient alternatives to vanilla transformers through locality-sensitive hashing and reversible layers. | machine learning musings
Optimal Transport, the Sinkhorn Transformer, and Charmin Ultra-Soft | machine learning musings
Exploring 6 noteworthy approaches for incorporating longer-term context in transformer models. | machine learning musings