Our paper on NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning has been accepted at the 31st International European Conference on Parallel and Distributed Computing (Euro-Par 2025). Abstract: Training large-scale distributed machine learning models imposes considerable demands on network infrastructure, often resulting in sudden traffic spikes that lead to congestion, increased latency, and reduced throughput, which would […] | Dirk Kutscher
Our paper on Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization has been accepted by the 9th Asia-Pacific Workshop on Networking (APNET'25). Abstract: Hybrid parallelism techniques are crucial for the efficient training of large language models (LLMs). However, these techniques often introduce differentiated computational and communication tasks across nodes. Existing automatic parallel planning frameworks […]