I'm continuing through chapter 5 of Sebastian Raschka's book "Build a Large Language Model (from Scratch)", which covers training the LLM. Last time I wrote about cross entropy loss. Before moving on to the next section, I wanted to post about something that the book only covers briefly in a sidebar: perplexity. Back in May, I thought I had understood it: Just as I was finishing this off, I found myself thinking that logits were interesting because you could take some measure of how certain t...| Giles' blog
Archive of Giles Thomas’s blog posts from October 2025. Insights on AI, startups, and software development, plus occasional personal reflections.| www.gilesthomas.com
Finally, we train an LLM! The final part of Chapter 5 of Build an LLM (from Scratch) runs the model on real text, then loads OpenAI’s GPT-2 weights for comparison.| Giles' Blog
I'm getting towards the end of chapter 4 of Sebastian Raschka's book "Build a Large Language Model (from Scratch)". When I first read this chapter, it seemed to be about tricks to use to make LLMs trainable, but having gone through it more closely, only the first part -- on layer normalisation -- seems to fit into that category. The second, about the feed-forward network, is definitely not -- that's the part of the LLM that does a huge chunk of the thinking needed for next-token prediction. An...| Giles' blog
Posts in the 'TIL deep dives' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.| www.gilesthomas.com
After 14 years, it's time for me to move on -- but PythonAnywhere is in great hands and has a fantastic future ahead of it!| Giles' Blog
On sabbatical / created @PythonAnywhere.com, which found a home at @anacondainc.bsky.social / XP / Python / PSF Fellow / opinions my own / blog at https://www.gilesthomas.com| Bluesky Social
Posts in the 'LLM from scratch' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.| www.gilesthomas.com
Posts in the 'AI' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.| www.gilesthomas.com
Learn how to create, train, and tweak large language models (LLMs) by building one from the ground up! In Build a Large Language Model (from Scratch) bestselling author Sebastian Raschka guides you step by step through creating your own LLM. Each stage is explained with clear text, diagrams, and examples. You’ll go from the initial design and creation, to pretraining on a general corpus, and on to fine-tuning for specific tasks. Build a Large Language ...| Manning Publications