AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance but underprovisioned on density and read bandwidth, and it also has significant energy-per-bit overheads. It is also expensive, with lower yield than […] | Microsoft Research
Link technologies in today’s data center networks impose a fundamental trade-off between reach, power, and reliability. Copper links are power-efficient and reliable but have very limited reach ([…] | Microsoft Research
To match the booming demand of generative AI workloads, GPU designers have so far tried to pack more and more compute and memory into single complex and expensive packages. However, there is growing uncertainty about the scalability of individual GPUs and thus AI clusters, as state-of-the-art GPUs are already displaying packaging, yield, and cooling […] | Microsoft Research
ACM SIGCOMM is the flagship annual conference of the ACM Special Interest Group on Data Communication (SIGCOMM). ACM SIGCOMM 2025, the 39th edition of the conference series, will be held in Coimbra, Portugal, September 8–11, 2025. | ACM SIGCOMM 2025