AI reasoning, inference and networking will be top of mind for attendees of next week’s Hot Chips conference. A key forum for processor and system architects from industry and academia, Hot Chips — running Aug. 24-26 at Stanford University — showcases the latest innovations poised to advance AI factories and drive revenue for the trillion-dollar...| NVIDIA Blog
Essential Constants for Numerical Algorithms and Scientific Computations| Lei Mao's Log Book
Rust CUDA enables you to write and run CUDA kernels in Rust| Rust GPU Blog
Deriving Inverse Layout Mathematically| Lei Mao's Log Book
Creating Tiled Layouts Using Blocked Product and Raked Product| Lei Mao's Log Book
By setting a clear, stable standard, the RVA23 profile’s ratification is spurring top vendors to align on a common RISC-V hardware goal. All we need now is that hardware.| RISC-V International
Elucidating CuTe Inner Partition and Local Tile| Lei Mao's Log Book
Elucidating CuTe Outer Partition and Local Partition| Lei Mao's Log Book
CUDA Memory Load/Store Performance: A Comprehensive Benchmark Analysis| Chris Choy
Inverse Layout Function| Lei Mao's Log Book
Dynamically Loading CUDA Kernels| Lei Mao's Log Book
Avoiding CUDA Shared Memory Bank Conflicts| Lei Mao's Log Book
In the latest round of MLPerf Training, the NVIDIA AI platform delivered the highest performance at scale on every benchmark.| NVIDIA Blog
brought to you by the ITS Research team at QMUL| blog.hpc.qmul.ac.uk
GPU-accelerated computing is one of the most transformative trends in modeling and simulation today. All of the major engineering software providers are supporting GPU architectures...| Rescale
I am somehow very late to learning CUDA. I didn’t even know until recently that CUDA is just C++ with a small amount of extra stuff. If I had known that there is so little friction to learning it, I would have checked it out much earlier. But if you come in with C++ habits, […]| Probably Dance
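A minimal sketch of that "small amount of extra stuff" (illustrative, not code from the linked post): the kernel body is ordinary C++, and the CUDA-specific pieces are essentially the __global__ qualifier, the built-in thread and block indices, and the <<<grid, block>>> launch syntax.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Ordinary C++ inside; __global__ marks it as a GPU entry point.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // which element this thread owns
    if (i < n) {
        data[i] += 1.0f;
    }
}

int main() {
    const int n = 1024;
    float* d = nullptr;
    cudaMallocManaged(&d, n * sizeof(float));      // memory visible to both CPU and GPU
    for (int i = 0; i < n; ++i) d[i] = float(i);

    add_one<<<(n + 255) / 256, 256>>>(d, n);       // launch: grid size, block size
    cudaDeviceSynchronize();                        // wait for the kernel to finish

    printf("d[0] = %f, d[%d] = %f\n", d[0], n - 1, d[n - 1]);
    cudaFree(d);
    return 0;
}
```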
The Rust CUDA project has been rebooted after...| Rust GPU Blog
We're excited to announce the reboot of the Rust CUDA project. Rust CUDA enables you to write and run CUDA kernels in Rust, executing directly on NVIDIA GPUs using NVVM IR.| rust-gpu.github.io
Learning CUDA by optimizing matrix-vector multiplication (SGEMV) for cuBLAS-like performance| Maharshi's blog
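As a hedged illustration of the starting point for this kind of tuning exercise (an assumed baseline, not the linked post's code): a naive SGEMV kernel that assigns one thread per output row and reads a row-major A, which cuBLAS-chasing optimizations then improve with coalesced loads, shared memory, and warp-level reductions.

```cuda
#include <cuda_runtime.h>

// Naive SGEMV: y = A * x, with A stored row-major as an M x N matrix.
// One thread computes one output element (one row's dot product).
__global__ void sgemv_naive(const float* A, const float* x, float* y,
                            int M, int N) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M) {
        float acc = 0.0f;
        for (int col = 0; col < N; ++col) {
            acc += A[row * N + col] * x[col];   // strided reads: the first thing to optimize
        }
        y[row] = acc;
    }
}

// Example launch (host side), assuming device buffers dA, dx, dy already exist:
//   int threads = 256;
//   sgemv_naive<<<(M + threads - 1) / threads, threads>>>(dA, dx, dy, M, N);
```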
Unification of Memory on the Grace Hopper Nodes: The delivery of new GPUs for research is continuing; most notable is the new Isambard-AI cluster at Bristol. As new cutting-edge GPUs are released, software engineers are tasked with keeping abreast of the new architectures and features these GPUs offer. The new Grace-Hopper GH200 nodes, as announced in a previous blog post, consist of a 72-core NVIDIA Grace CPU and an H100 Tensor Core GPU. One of the key innovations is the NVIDIA NVLink Chip-2-Chip...| QMUL ITS Research Blog
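A rough sketch of what that unified memory enables (an assumption-laden illustration, not code from the linked post): on a hardware-coherent Grace Hopper node, a kernel can in principle dereference an ordinary malloc'd host pointer directly over NVLink C2C, whereas a conventional discrete-GPU system would typically need cudaMallocManaged or explicit copies instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(double* v, int n, double s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= s;
}

int main() {
    const int n = 1 << 20;
    // Plain host allocation: on coherent CPU-GPU systems such as GH200
    // the GPU can access this pointer directly; on most discrete-GPU
    // setups this launch would fail or require managed memory instead.
    double* v = static_cast<double*>(malloc(n * sizeof(double)));
    for (int i = 0; i < n; ++i) v[i] = 1.0;

    scale<<<(n + 255) / 256, 256>>>(v, n, 2.0);
    cudaDeviceSynchronize();

    printf("v[0] = %f\n", v[0]);   // expect 2.0 on a system where this access is legal
    free(v);
    return 0;
}
```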
LightGBM version 4.0.0 was released on 2023-07-14. This release is the first major update in a while and includes a variety of improvements. Details can be found in the release notes below. github.com A major highlight of the release is that CUDA-based training has been completely rewritten. Previously, even when LightGBM used a GPU for training, its compute resources...| CUBE SUGAR CONTAINER
Fixing the pytorch unknown CUDA error.| The Cloistered Monkey
NAMD is a molecular dynamics program that can use GPU acceleration to speed up its calculations. Recent OpenPOWER machines like the IBM Power Systems S822LC for High Performance Computing (Minsky) come with a new interconnect for GPUs called NVLink, which offers extremely high bandwidth to a number of very powerful Nvidia Pascal P100 GPUs. So they're ideal machines for this sort of workload.| sthbrx.github.io
Our previous blogs (Taichi & PyTorch 01 and 02) pointed out that Taichi and Torch serve different application scenarios, so can they complement each other? The answer is an unequivocal yes! In this blog, we will use two simple examples to explain how to use Taichi kernels to implement data preprocessing operators or custom ML operators. With Taichi, you can accelerate your ML model development with ease and get rid of tedious low-level parallel programming (CUDA, for example) for good.| docs.taichi-lang.org