Triton provides an elegant way to program GPU kernels in Python, positioning itself as a critical component in the modern AI software stack. To deliver both performance and portability, it relies on a compiler, whose capability determines how much of that potential is realized. Hacking on the compiler internals is not a simple task, so here are some tips that I hope folks will find useful. I'll try to keep this blog post updated periodically.
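To give a taste of that Python programming model, below is a minimal, tutorial-style vector-add kernel; the kernel body, block size, and launch configuration here are my own illustration, not anything from the post itself.

```python
# Minimal Triton vector-add sketch; block size and launch are illustrative.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```

Everything under the `@triton.jit` decorator is compiled down to GPU code at launch time, which is exactly why the compiler's capability determines how far this simple-looking Python can go.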
Time flies: almost nine years have passed since I joined Google. Now the time has come for me to leave and move on. While here, I have been super lucky to work mostly on open source projects that I can talk about publicly. So at the end of my tenure with Google, I'd like to reflect on and summarize this incredible journey, which I am deeply grateful for and thoroughly enjoyed, before I forget the details.
Previous blog posts overviewed the MLIR dialect hierarchy for kernel code generation (CodeGen) and zoomed in on the Linalg and Vector dialects within it. Now I will switch to discussing the runtime side a bit, in order to provide a holistic view of MLIR-based machine learning (ML) compilers. This one covers the foundation and basics, including the target landscape, runtime requirements, and the designs that meet them.
The previous post introduced the Vector dialect and its related patterns. Today let's look at the higher-level Linalg dialect and the transformations around it.
I explained the Vector dialect and related patterns in the previous blog post. In this one, let us move one layer higher and talk about the Linalg dialect and the transformations around it.
The Vector dialect and related transformations are crucial components in the MLIR CodeGen flow for machine learning (ML). Today I will zoom in on them to explain the dialect's position in the overall picture, its characteristics, its important operations and transformations, and best practices for using it based on my experience.
The Vector dialect and its related transformations are an important part of the machine learning code generation flow. Today let's take a careful look at its positioning, design, and characteristics, introduce its important operations and transformations, and finally use concrete examples to show how to use the Vector dialect's functionality properly.
The initial blog post in this series captured my overall take on the evolution trends of compilers and IRs. It also touched on LLVM IR, SPIR-V, and MLIR, explaining the problems they address and their respective design focuses. Today I will expand on MLIR and systematically talk about its dialect hierarchy for machine learning (ML) compilers.
In the first post of this series, I shared my overall understanding of the evolution trends of compilers and intermediate representations (IRs), and discussed the problems that LLVM IR, SPIR-V, and MLIR aim to solve along with their corresponding design focuses. Today I will expand further on MLIR and analyze its dialect hierarchy for machine learning.
An overall introduction to the development history and evolution trends of compilers and intermediate representations (LLVM IR, SPIR-V, and MLIR)
Overall discussion of compilers and IRs (LLVM IR, SPIR-V, and MLIR): why they are the way they are today and how they may evolve
This blog post talks about how to generate performant code for convolution ops using MLIR's multiple levels of abstractions and transformations. I initially created it for targeting ARM Mali GPUs in IREE, but given that the approach is just direct tiling and vectorization, it should be widely applicable. I will walk through the lowering steps, so if you are interested in how to organize MLIR's various dialects/patterns together to achieve similar tasks, this blog post might also be useful.
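For intuition about what "direct tiling and vectorization" means, here is a rough NumPy sketch of the same idea on a single-channel 2-D convolution. The layout and tile sizes are illustrative assumptions on my part; the actual post works at the MLIR level, not in Python.

```python
# Sketch of direct tiling + vectorization for a single-channel 2-D conv:
# tile the output, then compute each tile with whole-tile arithmetic
# standing in for the vectorized inner loops.
import numpy as np

def conv2d_tiled(input, filt, tile_h=4, tile_w=4):
    H, W = input.shape
    FH, FW = filt.shape
    OH, OW = H - FH + 1, W - FW + 1
    out = np.zeros((OH, OW), dtype=input.dtype)
    for oh in range(0, OH, tile_h):          # tile the output rows
        for ow in range(0, OW, tile_w):      # tile the output cols
            th = min(tile_h, OH - oh)        # handle partial edge tiles
            tw = min(tile_w, OW - ow)
            acc = np.zeros((th, tw), dtype=input.dtype)
            for fh in range(FH):             # reduce over the filter window
                for fw in range(FW):
                    # "Vectorized" body: a whole output tile at a time.
                    acc += filt[fh, fw] * input[oh+fh:oh+fh+th,
                                                ow+fw:ow+fw+tw]
            out[oh:oh+th, ow:ow+tw] = acc
    return out

# Usage: matches a naive direct convolution on the same inputs.
out = conv2d_tiled(np.random.rand(16, 16), np.random.rand(3, 3))
```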
Today I would like to describe one way to build a scalable and frictionless benchmarking pipeline for Android native libraries, aiming to support different benchmark and device variants. It is meant for open source projects, so it composes public services that are commonly free under such conditions. The ingredients are cloud virtual machines for building, local single-board computers (e.g., Raspberry Pi) for hosting Android devices and executing benchmarks, a Dana server for keeping track of benchmark results…
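As a sketch of just the device-side piece: the single-board computer could drive an attached phone over adb and forward one number per run to the dashboard. The benchmark binary name, its output format, and the Dana endpoint/payload below are all assumptions for illustration, not the post's exact setup.

```python
# Hedged sketch: run a prebuilt native benchmark on an Android device via
# adb, parse one metric, and post it to a Dana dashboard.
import re
import subprocess
import requests

BENCH = "my_benchmark"          # hypothetical prebuilt NDK binary
DEVICE_DIR = "/data/local/tmp"  # standard writable + executable location
DANA_URL = "http://dana.example.com/apis/addSample"  # assumed endpoint

def run_benchmark(serial: str) -> float:
    subprocess.run(["adb", "-s", serial, "push", BENCH, DEVICE_DIR],
                   check=True)
    result = subprocess.run(
        ["adb", "-s", serial, "shell", f"{DEVICE_DIR}/{BENCH}"],
        check=True, capture_output=True, text=True)
    # Assume the benchmark prints a line like "latency_ms: 12.3".
    return float(re.search(r"latency_ms:\s*([\d.]+)",
                           result.stdout).group(1))

latency = run_benchmark("DEVICE_SERIAL")
requests.post(DANA_URL, json={"projectId": "my-project",
                              "serieId": "my_benchmark/latency_ms",
                              "sample": {"buildId": 42, "value": latency}})
```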
Vulkan (compute) has the potential to be the next-generation GPGPU standard for various GPUs across various domains; one immediately compelling application is machine learning inference for resource-constrained scenarios, like mobile/edge devices and gaming. This blog post explains the technical and business aspects behind this and discusses the challenges and current status.
Unique challenges for edge/mobile ML inference, contrasting with training and inference in the cloud
Hardware performance counter details in Adreno/Mali kernel mode drivers and how to query them from user mode applications
Internals and resources about GPU user mode and kernel mode drivers in Android/Linux
Shader Toolchain (HLSL in Vulkan): talk, slides, downloads, and documentation
HLSL for Vulkan: translating HLSL semantic strings into SPIR-V location numbers
HLSL for Vulkan: translating HLSL matrices into SPIR-V
HLSL for Vulkan: translating HLSL resources into SPIR-V