DeepSeek shook the market to start the week, sending AI heavyweight Nvidia down 17% on Monday and wiping out $600 billion in market cap, while other AI hardware names fell as much as 30%. That is enough to make any investor panic, and it boiled down to one mission-critical question: did the model's release fundamentally rewrite the AI capex story? The market's readthrough is that Big Tech has been overspending on AI. However, the I/O Fund believes that readthrough is wrong; it's not that...| IO Fund
Efficient training of modern neural networks often relies on using lower-precision data types. Peak float16 matrix multiplication and convolution performance is 16x faster than peak float32 performance on A100 GPUs. And since the float16 and bfloat16 data types are only half the size of float32, they can double the performance of bandwidth-bound kernels and reduce the memory required to train a network, allowing for larger models, larger batches, or larger inputs. Using a module like torch.amp...| pytorch.org
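As a concrete illustration of the torch.amp workflow mentioned in that excerpt, here is a minimal mixed-precision training sketch. The model, batch shapes, and hyperparameters are placeholder assumptions for illustration, not part of the pytorch.org source.

```python
import torch
from torch import nn

# Minimal mixed-precision training sketch with torch.amp.
# The model, data, and hyperparameters below are dummy placeholders.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(10):
    inputs = torch.randn(64, 1024, device=device)          # dummy batch
    targets = torch.randint(0, 10, (64,), device=device)   # dummy labels
    optimizer.zero_grad(set_to_none=True)

    # Ops inside autocast run in float16 where it is numerically safe.
    with torch.autocast(device_type=device, dtype=torch.float16, enabled=(device == "cuda")):
        loss = loss_fn(model(inputs), targets)

    # GradScaler scales the loss to avoid float16 gradient underflow,
    # then unscales gradients before the optimizer step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```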
GPUs accelerate machine learning operations by performing calculations in parallel. Many operations, especially those representable as matrix multiplications, will see good acceleration right out of the box. Even better performance can be achieved by tweaking operation parameters to use GPU resources efficiently. The performance documents present the tips that we think are most widely useful.| NVIDIA Docs
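One example of the parameter tweaking NVIDIA's guidance refers to is keeping matrix dimensions Tensor Core friendly (for float16, multiples of 8). The sketch below times a float16 matmul with an aligned versus a misaligned inner dimension; the sizes and timing harness are assumptions for illustration, and actual results depend on the GPU and library versions.

```python
import torch

# Illustrative timing sketch: compare a Tensor Core friendly inner dimension
# (multiple of 8) with a misaligned one for a float16 matmul on CUDA.
def time_matmul(m, n, k, iters=50):
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per matmul

if torch.cuda.is_available():
    print("K = 4096:", time_matmul(4096, 4096, 4096), "ms")  # aligned inner dim
    print("K = 4095:", time_matmul(4096, 4096, 4095), "ms")  # misaligned inner dim
```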
Quantization is a technique used to compress LLMs. What methods exist, and how can you quickly start using them?| TensorOps
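To make the quantization idea concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization in PyTorch. It is a toy illustration under assumed names and sizes, not one of the production methods (e.g., GPTQ or bitsandbytes) a real LLM workflow would use.

```python
import torch

# Symmetric per-tensor int8 quantization: map the largest weight magnitude
# to 127 and round everything else onto that grid.
def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)                  # stand-in for an LLM weight matrix
q, scale = quantize_int8(w)

print("storage: float32 =", w.numel() * 4 / 2**20, "MiB;",
      "int8 =", q.numel() / 2**20, "MiB")    # 4x smaller than float32
print("max abs reconstruction error:", (w - dequantize(q, scale)).abs().max().item())
```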