Our paper on NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning has been accepted at the 31st International European Conference on Parallel and Distributed Computing (Euro-Par 2025). Abstract: Training large-scale distributed machine learning models imposes considerable demands on network infrastructure, often resulting in sudden traffic spikes that lead to congestion, increased latency, and reduced throughput, which would […] | Dirk Kutscher
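The truncated abstract does not describe NetSenseML's actual algorithm, but the title suggests the general idea of network-adaptive gradient compression: sparsify updates more aggressively when the observed link throughput drops. As a minimal, hypothetical sketch of that idea (all names and parameters are illustrative, not from the paper):

```python
# Illustrative sketch only -- not NetSenseML's method. Shows the general
# idea of network-adaptive gradient compression: choose a sparser top-k
# ratio when measured network throughput drops, so congested links carry
# smaller updates. All names and thresholds here are hypothetical.
import numpy as np

def topk_compress(grad: np.ndarray, ratio: float):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    k = max(1, int(grad.size * ratio))
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def adaptive_ratio(throughput_mbps: float,
                   baseline_mbps: float = 10_000.0,
                   min_ratio: float = 0.001,
                   max_ratio: float = 0.1) -> float:
    """Scale the kept fraction with observed throughput: congestion -> sparser."""
    frac = min(1.0, throughput_mbps / baseline_mbps)
    return min_ratio + (max_ratio - min_ratio) * frac

# Example: on a congested link (1 Gb/s against a 10 Gb/s baseline),
# only about 1% of gradient entries are transmitted.
grad = np.random.randn(1_000_000).astype(np.float32)
ratio = adaptive_ratio(throughput_mbps=1_000.0)
idx, vals = topk_compress(grad, ratio)
print(f"ratio={ratio:.4f}, sent {vals.size} of {grad.size} values")
```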
How Amazon used the NVIDIA NeMo framework, NVIDIA GPUs, and AWS Elastic Fabric Adapter (EFA) to train some of its largest next-generation LLMs. | NVIDIA Blog