Andrej Karpathy's 2015 blog post 'The Unreasonable Effectiveness of Recurrent Neural Networks' went viral in its day, for good reason. How does it read ten years later?
Finally, we train an LLM! The last part of Chapter 5 of Build an LLM (from Scratch) runs the model on real text, then loads OpenAI's GPT-2 weights for comparison.
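For a flavour of that comparison step: the post follows the book in loading OpenAI's released checkpoint into its own model class, but if you just want to poke at the same weights, the Hugging Face transformers library is a quick (if different) route. A sketch, assuming the 124M "gpt2" checkpoint and an arbitrary prompt:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Pull down the GPT-2 (124M) weights OpenAI released, via Hugging Face
# rather than the book's manual checkpoint-loading code.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Every effort moves you", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```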
To start training our LLM, we need a loss function -- specifically, cross entropy loss. What is it, and why does it work?
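A minimal sketch of the idea in PyTorch, with made-up logits and target token IDs: cross entropy is the average negative log-probability the model assigned to the tokens that actually came next.

```python
import torch
import torch.nn.functional as F

# Hypothetical example: 2 positions, vocabulary of 5 tokens.
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, 0.3],
                       [0.2, 1.5, -0.3, 0.0, 2.2]])
targets = torch.tensor([0, 4])  # the token IDs that actually came next

# cross_entropy applies log-softmax to the logits, then averages the
# negative log-probability assigned to each target token.
loss = F.cross_entropy(logits, targets)

# Equivalent manual calculation:
log_probs = F.log_softmax(logits, dim=-1)
manual = -log_probs[torch.arange(2), targets].mean()
print(loss, manual)  # the two values match
```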
The feed-forward network is one of the easiest parts of an LLM to implement -- but when I thought about it, I realised it's one of the most important.
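A rough sketch of that block, assuming GPT-2's conventions (a 4x expansion and GELU; the book's exact listing may differ):

```python
import torch.nn as nn

class FeedForward(nn.Module):
    # Illustrative sketch: expand to 4x the embedding size, apply a
    # non-linearity, then project back down. Applied independently
    # at every token position.
    def __init__(self, emb_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )

    def forward(self, x):
        return self.net(x)
```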
Working through layer normalisation -- why do we do it, how does it work, and why doesn't it break everything?
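For concreteness, a minimal version, leaving out the learnable scale and shift parameters a real implementation adds:

```python
import torch

def layer_norm(x, eps=1e-5):
    # Normalise each token's embedding vector to zero mean and unit
    # variance across its features (the last dimension).
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(2, 4, 8)                # (batch, tokens, embedding dim)
out = layer_norm(x)
print(out.mean(dim=-1))                 # ~0 everywhere
print(out.var(dim=-1, unbiased=False))  # ~1 everywhere
```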
The way we get from context vectors to next-word prediction turns out to be simpler than I imagined -- but understanding why it works took a bit of thought.
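The "simpler than I imagined" step, sketched with GPT-2-ish sizes and my own names: it's a single linear projection from each context vector to one logit per vocabulary entry.

```python
import torch
import torch.nn as nn

emb_dim, vocab_size = 768, 50257          # GPT-2-style sizes
out_head = nn.Linear(emb_dim, vocab_size, bias=False)

context = torch.randn(1, 5, emb_dim)      # context vectors for 5 tokens
logits = out_head(context)                # (1, 5, 50257): a score per vocab entry
next_token = logits[:, -1, :].argmax(dim=-1)  # greedy pick for the last position
```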
A pause to take stock: realising that attention heads are simpler than I thought explained why we do the calculations we do.| Giles' Blog
Finally getting to the end of Chapter 3 of Raschka's LLM book! This time it's multi-head attention: what it is, how it works, and why the code does what it does.
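A condensed sketch of the mechanics, with my own variable names and a causal mask folded in (not the book's exact listing): project to queries, keys and values, split into heads, attend per head, then recombine.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, num_heads):
        super().__init__()
        assert d_out % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_q = nn.Linear(d_in, d_out, bias=False)
        self.W_k = nn.Linear(d_in, d_out, bias=False)
        self.W_v = nn.Linear(d_in, d_out, bias=False)
        self.out_proj = nn.Linear(d_out, d_out)

    def forward(self, x):
        b, t, _ = x.shape
        # Split the projected vectors into per-head chunks: (b, heads, t, head_dim).
        q = self.W_q(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_k(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_v(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention with a causal mask so each token
        # can only attend to itself and earlier positions.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float('-inf'))
        weights = torch.softmax(scores, dim=-1)
        # Recombine the heads and project back out.
        ctx = (weights @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(ctx)

mha = MultiHeadAttention(d_in=8, d_out=8, num_heads=2)
out = mha(torch.randn(4, 6, 8))  # (4, 6, 8)
```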
Batching speeds up training and inference, but for LLMs we can't just use matrices for it -- we need higher-order tensors.
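A small illustration of why: with a batch dimension, the data becomes a third-order tensor, and PyTorch broadcasts matrix operations over the extra axis.

```python
import torch

# A batch of 4 sequences, each 6 tokens long, each token an 8-dim vector:
x = torch.randn(4, 6, 8)        # 3rd-order tensor: (batch, tokens, features)
W = torch.randn(8, 8)

# One matmul applies W to every token of every sequence at once;
# PyTorch broadcasts over the leading batch dimension.
out = x @ W                     # still (4, 6, 8)

# Attention scores need a batched matrix-matrix product too:
scores = x @ x.transpose(1, 2)  # (4, 6, 6): token-to-token scores per sequence
```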
Adding dropout to the LLM's training is pretty simple, though it does raise one interesting question.
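A quick look at the mechanics in PyTorch; note the 1/(1-p) rescaling of surviving activations during training, and that dropout switches off entirely at inference time.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.1)
x = torch.ones(2, 5)

drop.train()
print(drop(x))  # ~10% of entries zeroed; survivors scaled by 1/(1-0.1)

drop.eval()
print(drop(x))  # at inference, dropout is a no-op
```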
The essential matrix operations needed for neural networks. For ML beginners.
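For instance, a single dense layer boils down to one matrix multiply plus a bias; a NumPy sketch with illustrative sizes:

```python
import numpy as np

# y = x @ W + b, where x is (batch, in), W is (in, out), b is (out,).
x = np.random.randn(3, 4)  # 3 examples, 4 input features
W = np.random.randn(4, 2)  # weights: 4 inputs -> 2 outputs
b = np.zeros(2)

y = x @ W + b              # (3, 2): 2 outputs for each of the 3 examples
```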
How we actually do matrix operations for neural networks in frameworks like PyTorch. For ML beginners.
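And the same layer in PyTorch, where requires_grad has autograd track gradients through the matmul (sizes again illustrative):

```python
import torch

x = torch.randn(3, 4)
W = torch.randn(4, 2, requires_grad=True)
b = torch.zeros(2, requires_grad=True)

y = x @ W + b              # same matmul-plus-bias as the NumPy version
loss = y.pow(2).mean()     # a stand-in loss, just to have something to backprop
loss.backward()            # fills in W.grad and b.grad
print(W.grad.shape)        # torch.Size([4, 2])
```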