PyTorch 2.8 has just been released with a set of exciting new features, including a limited stable libtorch ABI for third-party C++/CUDA extensions, high-performance quantized Large Language Model (LLM) inference on Intel CPUs with native PyTorch, experimental Wheel Variant Support, and Inductor CUTLASS backend support. Among these features, one highlight is that PyTorch now provides competitive low-precision LLM performance on Intel Xeon platforms compared with other popu...