We love TPUs at Google, but GPUs are great too. This chapter takes a deep dive into the world of NVIDIA GPUs – how each chip works, how they’re networked together, and what that means for LLMs, especially compared to TPUs. This section builds on Chapter 2 and Chapter 5, so you are encouraged to read them first.| jax-ml.github.io
Set up InfiniBand with a single SkyPilot configuration| docs.skypilot.co
Kubernetes is the de facto standard for container orchestration, but when it comes to handling specialized hardware like GPUs and other accelerators, things get a bit complicated. This blog post dives into the challenges of managing failure modes when operating pods with devices in Kubernetes, based on insights from Sergey Kanzhelev and Mrunal Patel's talk at KubeCon NA 2024. You can follow the links to slides and recording. The AI/ML boom and its impact on Kubernetes The rise of AI/ML worklo...| Kubernetes
We investigate four constraints to scaling AI training: power, chip manufacturing, data, and latency. We predict 2e29 FLOP runs will be feasible by 2030.| Epoch AI
Everything you want to know about the new H100 GPU.| NVIDIA Technical Blog
The Open-source Tool Stack to build, scale, test, deploy, and monitor LLMs in 2024.| www.blog.aiport.tech
Get higher performance with a set of GPU-accelerated libraries, tools, and technologies.| NVIDIA Developer
A single software company can spend over $10 billion/year on data centres, but not every year is the same. When all stars align, we see bursts of new technologies reaching the market simultaneously, thus restarting the purchasing super-cycle. 2022 will be just that, so let’s jump a couple of quarters ahead and see what’s on the shopping list of your favorite hyperscaler! Friendly warning: this article is full of technical terms and jargon, so it may be hard to read if you don’t writ...| ashvardanian.com
PhD student at University of Texas at Austin 🤘. Doing systems for ML.| www.bodunhu.com
cuDNN provides researchers and developers with high-performance GPU acceleration.| NVIDIA Developer