Key takeaways from livestreaming DeepSeek R1 671B (4-bit) on a 14x RTX 3090 basement AI server. See how KTransformers crushed llama.cpp in prompt evaluation speed, compare setups, and get real-world insights into massive LLM inference with vLLM, ExLlamaV2, and more.