In this blog post, we introduce STX-B0T, our human-responsive social robot prototype powered by AMD Ryzen AI’s CVMLSDK and the Strix-Halo (STX-Halo) APU, and built with open-source hardware.
The PyTorch ecosystem is a vibrant and expansive collection of tools, libraries, and community-driven projects that enhance and extend the core PyTorch framework. It empowers researchers and developers to build, train, and deploy deep learning models across a wide range of domains with flexibility and efficiency.
ROCm Core SDK 7.9.0 is a technology preview release built by TheRock. The Core SDK is a slimmed-down version of the traditional ROCm release, focusing on foundational components for running high-performance AI workloads on AMD GPUs. TheRock introduces a streamlined, uniform system for building and testing these core components. TheRock manages dependencies between packages, so changes affecting several ROCm components can be coordinated automatically. Python releases built using TheRock can be...
Learn how to boost AI inference performance with the Kimi-K2-Instruct model on AMD Instinct MI355 Series GPUs. This blog highlights benchmark results against B200 GPUs, focusing on faster time to first token (TTFT), lower latency, and higher throughput. You’ll also see how MI355X Series GPUs excel in high-concurrency workloads thanks to their larger memory capacity. By the end, you’ll know how to evaluate and deploy MI355X GPUs with SGLang to scale demanding applications efficiently.
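For flavor, SGLang serves an OpenAI-compatible endpoint, so a deployed Kimi-K2-Instruct server can be queried with the standard client. A minimal sketch, assuming a locally launched server; the model path, port (SGLang's default of 30000), and tensor-parallel degree in the comment are illustrative, not the benchmark configuration from the post:

# Query an SGLang server launched separately, e.g.:
#   python -m sglang.launch_server --model-path moonshotai/Kimi-K2-Instruct --tp 8
# Model path, port, and --tp degree are assumptions, not the post's setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[{"role": "user", "content": "Summarize MI355X in one sentence."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)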
Speculative decoding has emerged as a promising approach to accelerate large language model (LLM) inference, yet existing methods face a tradeoff: parallel designs achieve higher speed but lose accuracy, while serial designs gain accuracy at the cost of efficiency. In our recent paper Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding, we introduce a new paradigm that addresses this bottleneck by prioritizing accuracy on the earliest draft tokens, which matters most...
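For readers new to the technique, here is a toy sketch of the generic draft-and-verify loop that speculative decoding builds on (not Gumiho's hybrid serial/parallel head design; both "models" are placeholder functions):

# Toy draft-and-verify loop for generic speculative decoding; NOT the
# Gumiho architecture. Both "models" are stand-in functions over ints.
def target_next(tokens):
    # Expensive model: deterministic (greedy) next token.
    return (sum(tokens) * 31 + 7) % 100

def draft_next(tokens):
    # Cheap model: agrees with the target most of the time.
    return target_next(tokens) if sum(tokens) % 3 else 0

def speculative_step(tokens, k=4):
    # 1) Draft k tokens cheaply.
    draft = list(tokens)
    for _ in range(k):
        draft.append(draft_next(draft))
    proposed = draft[len(tokens):]
    # 2) Verify against the target: keep the longest agreeing prefix,
    #    then substitute one corrected token, so progress is >= 1 token.
    accepted = []
    for tok in proposed:
        expected = target_next(tokens + accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)
            break
    else:
        accepted.append(target_next(tokens + accepted))  # bonus token
    return tokens + accepted

seq = [1, 2, 3]
for _ in range(5):
    seq = speculative_step(seq)
print(seq)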
This post continues from Part 1 where we introduced GEMM tuning concepts in hipBLASLt and explored the basics of solution search. In Part 2, we focus on offline tuning with the hipblaslt-bench tool. This workflow allows developers to benchmark candidate GEMM kernels for specific problem shapes, capture the best-performing solutions, and reuse them at runtime without rebuilding or modifying the hipBLASLt library.
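To make the workflow concrete, one shape can be benchmarked from a small driver script. A minimal sketch, assuming hipblaslt-bench is on PATH; the flag names below (-m/-n/-k, --transA/--transB, --algo_method) are assumptions based on the tool's rocBLAS-style CLI, so confirm them with hipblaslt-bench --help:

# Drive hipblaslt-bench for one GEMM shape; flag names are assumptions,
# verify against `hipblaslt-bench --help` for your ROCm version.
import subprocess

m, n, k = 4096, 4096, 4096
cmd = [
    "hipblaslt-bench",
    "-m", str(m), "-n", str(n), "-k", str(k),
    "--transA", "N", "--transB", "N",
    "--algo_method", "all",  # sweep candidate solutions instead of heuristic pick
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)  # the best-performing solution index is parsed from here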
This blog is part of a series of walkthroughs of Life Science AI models, stemming from this article.
Learn how to use Medical Open Network for Artificial Intelligence (MONAI) 1.0 on ROCm, with examples and demonstrations.
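As a taste of what that looks like in practice: PyTorch on ROCm exposes the GPU under the "cuda" device name, so stock MONAI models run unchanged. A minimal smoke test, with arbitrary network hyperparameters:

# Minimal MONAI smoke test on a ROCm GPU; hyperparameters are arbitrary.
import torch
from monai.networks.nets import UNet

device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm GPUs report as "cuda"
model = UNet(
    spatial_dims=3, in_channels=1, out_channels=2,
    channels=(16, 32, 64), strides=(2, 2),
).to(device)
x = torch.randn(1, 1, 64, 64, 64, device=device)  # fake 3D volume
print(model(x).shape)  # torch.Size([1, 2, 64, 64, 64])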
Haohui Mai is affiliated with the company CausalFlow.ai.
This blog will highlight AMD ROCm’s ability to power next-generation audio-to-video models with simple, reproducible workflows.
In this blog, we will provide step-by-step instructions on how to reproduce AMD's MLPerf Inference v5.1 submission.
Enabling ROCm support for FastVideo inference using TeaCache on AMD Instinct GPUs, accelerating video generation with optimized backends.
We present AMD Hummingbird, a two-stage distillation framework for efficient, high-quality text-to-video generation using compact models.
Fine-tune Llama 3.2 Vision models on an AMD MI300X GPU using Torchtune, achieving 2.3× better accuracy with the 11B model than the 90B model on chart-based tasks.
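As a rough sketch of the entry point (the recipe and config names below are assumptions; run tune ls to see what your torchtune install actually ships):

# Launch a torchtune recipe from Python; recipe/config names are
# assumptions -- list the real ones with `tune ls`.
import subprocess

subprocess.run([
    "tune", "run", "lora_finetune_single_device",
    "--config", "llama3_2_vision/11B_lora_single_device",
])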
A step-by-step guide to reproducing AMD’s MLPerf v5.0 results for Llama 2 70B & SDXL using ROCm on MI325X.
AMD is excited to announce Instella, a family of fully open state-of-the-art 3-billion-parameter language models (LMs). In this blog, we explain how the Instella models were trained and how to access them.
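For a quick look at usage, the models can be loaded through Hugging Face transformers. A minimal sketch; the repo id "amd/Instella-3B-Instruct" and the trust_remote_code flag are assumptions, so check the model card for the exact loading instructions:

# Load an Instella model via transformers; repo id and trust_remote_code
# are assumptions -- consult the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "amd/Instella-3B-Instruct"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True, device_map="auto")
inputs = tok("Instella is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))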
AMD's GPU training optimizations deliver peak performance for advanced AI models through the ROCm software stack.
Learn how to optimize large language model inference using vLLM on AMD's MI300X GPUs for enhanced performance and efficiency.
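A minimal offline-inference sketch with vLLM's Python API; the model id and sampling settings are illustrative rather than the tuned MI300X configuration from the post:

# Minimal vLLM offline inference; model id and sampling values are
# illustrative, not a tuned MI300X configuration.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=64)
for out in llm.generate(["Explain paged attention in one sentence."], params):
    print(out.outputs[0].text)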
This post, the second in a series, provides a walkthrough for building a vLLM container that can be used for both inference and benchmarking.
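For orientation, launching a ROCm-capable container uses the standard GPU device flags. A minimal sketch; the image tag rocm/vllm:latest is an assumption, and the post walks through building your own image rather than pulling one:

# Run a vLLM container with the standard ROCm device flags; the image tag
# is an assumption -- substitute the image you built following the post.
import subprocess

subprocess.run([
    "docker", "run", "-it", "--rm",
    "--device=/dev/kfd", "--device=/dev/dri",   # ROCm kernel driver + render nodes
    "--group-add", "video", "--ipc=host",
    "--security-opt", "seccomp=unconfined",
    "rocm/vllm:latest",
])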
In this post we present a brief preview of AMD's Next-Gen Fortran Compiler, our new open-source Fortran compiler optimized for AMD GPUs using OpenMP offloading, offering a direct interface to ROCm and HIP.