By compressing retrieved documents into efficient embeddings, REFRAG slashes latency and memory costs without modifying the LLM architecture or response quality. The post Meta’s REFRAG speeds up RAG systems by 30x without sacrificing quality first appeared on TechTalks.