Archive of Giles Thomas’s blog posts from October 2025. Insights on AI, startups, and software development, plus occasional personal reflections.| www.gilesthomas.com
To start training our LLM we need a loss function: cross-entropy loss. What is it, and why does it work?| Giles' Blog
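(A minimal sketch of the idea that post explores: cross-entropy is the negative log of the probability the model assigns to the correct next token, averaged over the batch. PyTorch is assumed, and the logits and targets below are made-up toy values.)

```python
import torch
import torch.nn.functional as F

# Toy logits: a batch of 2 positions over a 3-token vocabulary.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5,  0.3]])
targets = torch.tensor([0, 1])  # the "correct" next token at each position

# Library version.
loss = F.cross_entropy(logits, targets)

# Same thing by hand: log-softmax -> pick out each target's
# log-probability -> negate -> average over the batch.
log_probs = F.log_softmax(logits, dim=-1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

assert torch.allclose(loss, manual)
print(loss.item())
```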
Muon is an optimizer for the hidden layers in neural networks. It is used in the current training speed records for both NanoGPT and CIFAR-10 speedrunning. Many empirical results using Muon have already been posted, so this writeup will focus mainly on Muon’s design. First we will define Muon and provide an overview of the empirical results it has achieved so far. Then we will discuss its design in full detail, including connections to prior research and our best understanding of why it works.| kellerjordan.github.io
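(A condensed sketch of Muon's core trick as the writeup describes it: take the SGD-momentum update for a 2D weight matrix and approximately orthogonalize it with a few Newton-Schulz iterations before applying it. The quintic coefficients follow the post's reference implementation, but this is simplified — no Nesterov option, scaling tweaks, or distributed logic — so treat it as illustrative, not definitive; the learning-rate and momentum defaults here are assumptions.)

```python
import torch

def newton_schulz5(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize G (push its singular values toward 1)
    via a quintic Newton-Schulz iteration; coefficients from the writeup."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.bfloat16()
    if X.size(0) > X.size(1):     # iterate on the wide orientation
        X = X.T
    X = X / (X.norm() + 1e-7)     # normalize so the top singular value <= 1
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X.to(G.dtype)

def muon_step(param, grad, buf, lr=0.02, momentum=0.95):
    """One Muon update for a single hidden-layer weight matrix."""
    buf.mul_(momentum).add_(grad)   # standard SGD momentum
    update = newton_schulz5(buf)    # orthogonalize the momentum buffer
    param.data.add_(update, alpha=-lr)
```

One design point from the writeup worth noting: Newton-Schulz is used instead of an exact SVD because it runs stably in bfloat16 on the GPU, keeping the orthogonalization step cheap.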
Posts in the 'TIL deep dives' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.| www.gilesthomas.com
On sabbatical / created @PythonAnywhere.com, which found a home at @anacondainc.bsky.social / XP / Python / PSF Fellow / opinions my own / blog at https://www.gilesthomas.com| Bluesky Social
Posts in the 'LLM from scratch' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.| www.gilesthomas.com
Posts in the 'AI' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.| www.gilesthomas.com
Learn how to create, train, and tweak large language models (LLMs) by building one from the ground up! In Build a Large Language Model (from Scratch), bestselling author Sebastian Raschka guides you step by step through creating your own LLM. Each stage is explained with clear text, diagrams, and examples. You'll go from the initial design and creation, to pretraining on a general corpus, and on to fine-tuning for specific tasks.| Manning Publications