Announcing Matryoshka (dimension flexibility) and binary quantization in Vespa and how these features slash costs.| Vespa Blog
Vector embeddings by themselves are pretty neat. Binary quantized vector embeddings are extra impressive. In short, they can retain 95+% retrieval accuracy with 32x compression 🤯.| Evan Schwartz
Exploring memory-efficient techniques for LLMs| newsletter.maartengrootendorst.com
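The two techniques named in these links can be sketched in a few lines. This is a minimal illustration, not Vespa's API or the linked authors' code. It makes three assumptions: embeddings are stored as float32 (32 bits per dimension), Matryoshka-style "dimension flexibility" means keeping a prefix of the vector and re-normalizing it, and binary quantization keeps one bit per dimension (1 if the value is positive). All function names are hypothetical. The "32x compression" figure follows directly from going from 32 bits to 1 bit per dimension.

```python
import math

# Hypothetical helpers illustrating the ideas; not a library API.

def truncate_and_normalize(vec, k):
    """Matryoshka-style truncation: keep the first k dims, then L2-normalize."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def binarize(vec):
    """Binary quantization: 1 bit per dimension (1 if value > 0), packed into an int."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two packed binary vectors."""
    return bin(a ^ b).count("1")

def storage_bytes(dims, bits_per_dim):
    """Bytes needed to store one vector at the given precision."""
    return dims * bits_per_dim // 8

if __name__ == "__main__":
    dims = 1024
    full = storage_bytes(dims, 32)   # float32 vector
    packed = storage_bytes(dims, 1)  # binary-quantized vector
    print(full, packed, full // packed)  # 4096 128 32
```

Binary vectors are then compared with Hamming distance (XOR plus popcount) instead of cosine similarity, which is what makes them cheap to store and scan. How much retrieval accuracy survives depends on the embedding model, so the 95+% figure above is the linked author's result rather than something this sketch demonstrates.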