Michiel Stock| michielstock.github.io
lucidrains/sinkhorn-transformer: Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention| GitHub
We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to generate latent permutations over sequences. Given sorted sequences, we are then able to compute quasi-global attention with only local windows, improving the memory efficiency of the attention module. To this end, we propose new algorithmic innovations such as Causal Sinkhorn Balancing and SortCut, a dynamic sequence truncation method for tailoring Sinkhorn attention for encoding and/or decoding purposes.| arXiv.org
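The latent permutations in the abstract above come down to Sinkhorn normalization: alternately normalizing the rows and columns of a score matrix drives it toward a doubly stochastic matrix, which acts as a differentiable soft permutation. A minimal NumPy sketch of that normalization (the function name and the random block scores are illustrative, not taken from the paper or the repo):

```python
import numpy as np

def sinkhorn_normalize(logits, n_iters=20):
    """Alternately normalize rows and columns in log space; exp(logits)
    converges toward a doubly stochastic (soft permutation) matrix."""
    for _ in range(n_iters):
        logits = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)  # rows sum to 1
        logits = logits - np.logaddexp.reduce(logits, axis=0, keepdims=True)  # columns sum to 1
    return np.exp(logits)

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))   # stand-in for a meta sorting network's scores over 4 blocks
P = sinkhorn_normalize(scores)
print(P.round(3))
print(P.sum(axis=0), P.sum(axis=1))  # both ~1: rows and columns are normalized
```

With more iterations (or lower-temperature scores) the matrix sharpens toward a hard permutation, which is what lets the model sort blocks of a sequence and then attend only within local windows.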
Distance function defined between probability distributions| en.wikipedia.org
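Judging by its short description, the Wikipedia entry is likely the Wasserstein metric, which ties the links together: the entropy-regularized Wasserstein distance between two histograms is computed with the same alternating row/column normalization that gives Sinkhorn attention its name. A hedged sketch, with a toy cost matrix and histograms chosen purely for illustration:

```python
import numpy as np

def sinkhorn_distance(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport cost between histograms a and b
    under cost matrix C, via Sinkhorn iterations on the scaling vectors u, v."""
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # enforce column marginals
        u = a / (K @ v)              # enforce row marginals
    P = u[:, None] * K * v[None, :]  # transport plan whose marginals are a and b
    return float(np.sum(P * C))      # regularized approximation of the Wasserstein cost

x = np.linspace(0.0, 1.0, 5)
C = np.abs(x[:, None] - x[None, :])        # ground cost |x_i - x_j| on a 1D grid
a = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
b = np.array([0.1, 0.1, 0.1, 0.2, 0.5])
print(sinkhorn_distance(a, b, C))
```

Shrinking eps makes the result approach the exact Wasserstein distance at the price of slower, less stable iterations, which is why the regularized version is the practical default.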