Archive of Giles Thomas’s blog posts from August 2025. Insights on AI, startups, and software development, plus occasional personal reflections.| www.gilesthomas.com
I'm getting towards the end of chapter 4 of Sebastian Raschka's book "Build a Large Language Model (from Scratch)". When I first read this chapter, it seemed to be about tricks for making LLMs trainable, but having gone through it more closely, only the first part -- on layer normalisation -- seems to fit into that category. The second, about the feed-forward network, definitely does not -- that's the part of the LLM that does a huge chunk of the thinking needed for next-token prediction. An...| Giles' blog
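For concreteness, here's a rough sketch of the kind of feed-forward block being discussed -- my own PyTorch illustration, assuming GPT-2-style sizes (768-dimensional embeddings, a 4x expansion), not code taken from the book:

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """A GPT-style position-wise feed-forward block: expand each
    context vector to 4x its size, apply a non-linearity, then
    project back down to the embedding dimension."""
    def __init__(self, emb_dim=768):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),   # expand
            nn.GELU(),                         # GPT-2 uses GELU, not ReLU
            nn.Linear(4 * emb_dim, emb_dim),   # project back down
        )

    def forward(self, x):
        return self.layers(x)

ffn = FeedForward()
x = torch.randn(2, 10, 768)   # (batch, sequence length, embedding dim)
print(ffn(x).shape)           # torch.Size([2, 10, 768])
```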
I've now finished chapter 4 in Sebastian Raschka's book "Build a Large Language Model (from Scratch)", having worked through shortcut connections in my last post. The remainder of the chapter doesn't introduce any new concepts -- instead, it shows how to put all of the code we've worked through so far into a full GPT-type LLM. You can see my code here, in the file gpt.py -- though I strongly recommend that if you're also working through the book, you type it in yourself -- I found that even t...| Giles' blog
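To give a flavour of the pattern that post covered, here's a minimal sketch of a sub-layer wrapped in a shortcut (residual) connection -- again my own illustration rather than the book's code, assuming the pre-layer-norm placement that GPT-2-style models use:

```python
import torch
import torch.nn as nn

class BlockWithShortcut(nn.Module):
    """A feed-forward sub-layer wrapped in a shortcut connection,
    pre-layer-norm style: normalise, transform, then add the
    original input back on."""
    def __init__(self, emb_dim=768):
        super().__init__()
        self.norm = nn.LayerNorm(emb_dim)
        self.ff = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )

    def forward(self, x):
        return x + self.ff(self.norm(x))   # the shortcut: input plus output
```

The `x +` is the whole trick: because the input is added straight back onto the output, gradients have a direct path through the network during backpropagation, which is what makes deep stacks of these blocks trainable.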
How AI chatbots like ChatGPT work under the hood -- the post I wish I’d found before starting 'Build a Large Language Model (from Scratch)'.| Giles' Blog
I'm still working through chapter 4 of Sebastian Raschka's book "Build a Large Language Model (from Scratch)". This chapter not only puts together the pieces that the previous ones covered, but also adds a few extra steps. I'd previously been thinking of these steps as just useful engineering techniques ("folding, spindling and mutilating" the context vectors) to take a model that would work in theory, but not in practice, and make it something trainable and usable -- but in this post I'll expl...| Giles' blog
The feed-forward network in an LLM processes context vectors one at a time. This feels like it would cause similar issues to the old fixed-length bottleneck, even though it almost certainly does not.| Giles' Blog
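A quick way to see what "one at a time" means in practice (my own check, not code from the post): run a feed-forward block over a whole sequence, then over each position separately, and confirm the results match --

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb_dim = 768
ffn = nn.Sequential(                       # a stand-in feed-forward block
    nn.Linear(emb_dim, 4 * emb_dim),
    nn.GELU(),
    nn.Linear(4 * emb_dim, emb_dim),
)

x = torch.randn(1, 10, emb_dim)            # one sequence of 10 context vectors

all_at_once = ffn(x)                       # process the whole sequence
one_at_a_time = torch.stack(
    [ffn(x[:, i, :]) for i in range(x.shape[1])],  # one position at a time
    dim=1,
)
print(torch.allclose(all_at_once, one_at_a_time, atol=1e-6))  # True
```

Unlike the old fixed-length bottleneck, nothing here is squeezed through a single fixed-size vector for the whole sequence: every position keeps its own full-width context vector, and mixing of information between positions happens in the attention layers instead.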
Working through layer normalisation -- why do we do it, how does it work, and why doesn't it break everything?| Giles' Blog
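The mechanics are simple enough to show in a few lines -- a hand-rolled sketch of the normalisation step, leaving out the learnable scale and shift parameters that a real `LayerNorm` adds back:

```python
import torch

def layer_norm(x, eps=1e-5):
    """Normalise each context vector to zero mean and unit variance
    across its embedding dimension."""
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(2, 5, 768)                    # (batch, tokens, embedding dim)
out = layer_norm(x)
print(out.mean(dim=-1)[0, 0])                 # ~0
print(out.var(dim=-1, unbiased=False)[0, 0])  # ~1
```

Those learnable scale and shift parameters are part of the answer to "why doesn't it break everything?": the network can learn to undo the normalisation wherever it turns out to be unhelpful.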
Posts in the 'TIL deep dives' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.| www.gilesthomas.com
Posts in the 'LLM from scratch' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.| www.gilesthomas.com
Posts in the 'Musings' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.| www.gilesthomas.com
Posts in the 'AI' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.| www.gilesthomas.com
Learning in public helps me grow as an engineer and seems to benefit others too. Here's why I should do more.| Giles' Blog
Learn how to create, train, and tweak large language models (LLMs) by building one from the ground up! In Build a Large Language Model (from Scratch), bestselling author Sebastian Raschka guides you step by step through creating your own LLM. Each stage is explained with clear text, diagrams, and examples. You'll go from the initial design and creation, to pretraining on a general corpus, and on to fine-tuning for specific tasks.| Manning Publications