A quick refresher on the maths behind LLMs: vectors, matrices, projections, embeddings, logits and softmax.| Giles' Blog
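As a minimal sketch of the last of those ideas (my own illustration with made-up numbers, not code from the post): softmax turns a vector of logits into a probability distribution.

```python
import torch

# A hypothetical vector of logits -- one raw score per candidate token.
logits = torch.tensor([2.0, 1.0, 0.1])

# Softmax exponentiates and normalises, so the outputs are positive and sum to 1.
probs = torch.softmax(logits, dim=-1)

print(probs)        # tensor([0.6590, 0.2424, 0.0986])
print(probs.sum())  # tensor(1.)
```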
How AI chatbots like ChatGPT work under the hood -- the post I wish I’d found before starting 'Build a Large Language Model (from Scratch)'.| Giles' Blog
The feed-forward network in an LLM processes context vectors one at a time. This feels like it would cause similar issues to the old fixed-length bottleneck, even though it almost certainly does not.| Giles' Blog
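A rough sketch of what "one at a time" means in practice, assuming a GPT-style position-wise feed-forward block (the sizes here are illustrative, not taken from the post):

```python
import torch
import torch.nn as nn

emb_dim = 768  # illustrative GPT-2-style embedding size

# The same two-layer MLP is applied to every position's context vector independently.
ffn = nn.Sequential(
    nn.Linear(emb_dim, 4 * emb_dim),
    nn.GELU(),
    nn.Linear(4 * emb_dim, emb_dim),
)

x = torch.randn(1, 10, emb_dim)  # batch of 1, 10 token positions
out = ffn(x)                     # nn.Linear acts on the last dimension only,
print(out.shape)                 # so each position is processed separately:
                                 # torch.Size([1, 10, 768])
```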
Working through layer normalisation -- why do we do it, how does it work, and why doesn't it break everything?| Giles' Blog
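As a minimal illustration of what layer normalisation actually does, using PyTorch's built-in nn.LayerNorm with made-up data rather than anything from the post:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 5)  # two example context vectors of dimension 5

# Normalise each vector to zero mean and unit variance across its own features.
ln = nn.LayerNorm(5)
out = ln(x)

print(out.mean(dim=-1))                 # ~0 for each row
print(out.var(dim=-1, unbiased=False))  # ~1 for each row
```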
Right now, starting a debugging session with AI before googling can leave you stuck, especially with newer technologies.| Giles' Blog
A pause to take stock: starting to build intuition on how self-attention scales (and why the simple version doesn't).| Giles' Blog
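A quick sketch of why the simple version doesn't scale, assuming nothing more than the pairwise score matrix: every token gets a score against every other token, so the matrix grows quadratically with context length (sizes below are made up for illustration).

```python
import torch

d = 64  # embedding dimension (illustrative)

for n in (128, 512, 2048):   # context lengths
    x = torch.randn(n, d)
    scores = x @ x.T          # one score per pair of tokens
    print(n, scores.shape)    # (n, n): memory and compute grow as n^2
```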
A pause to take stock: realising that attention heads are simpler than I'd thought made it clear why we do the calculations we do.| Giles' Blog
Finally getting to the end of chapter 3 of Raschka’s LLM book! This time it’s multi-head attention: what it is, how it works, and why the code does what it does.| Giles' Blog
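For a feel of the shapes involved, here's a minimal sketch using PyTorch's built-in nn.MultiheadAttention rather than the book's hand-rolled implementation (the sizes are made up):

```python
import torch
import torch.nn as nn

emb_dim, num_heads, seq_len = 64, 4, 10

# PyTorch's built-in module, not the book's from-scratch version.
mha = nn.MultiheadAttention(emb_dim, num_heads, batch_first=True)

x = torch.randn(1, seq_len, emb_dim)  # (batch, tokens, embedding)
out, weights = mha(x, x, x)           # self-attention: q = k = v = x
print(out.shape)      # torch.Size([1, 10, 64])
print(weights.shape)  # torch.Size([1, 10, 10]) -- averaged over heads by default
```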
Batching speeds up training and inference, but for LLMs we can't just use matrices for it -- we need higher-order tensors.| Giles' Blog
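A minimal sketch of what that looks like, assuming a (batch, tokens, embedding) layout; the shapes are illustrative, not from the post:

```python
import torch

batch_size, seq_len, emb_dim = 8, 6, 3

# A batch of token-embedding matrices is a rank-3 tensor, not a matrix.
batch = torch.randn(batch_size, seq_len, emb_dim)

# Batched matrix multiplication: attention scores for every sequence at once.
scores = batch @ batch.transpose(1, 2)
print(scores.shape)  # torch.Size([8, 6, 6]) -- one 6x6 score matrix per sequence
```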
Why dropout is kind of like the mandatory vacation policies financial firms have.| Giles' Blog
Adding dropout to the LLM's training is pretty simple, though it does raise one interesting question.| Giles' Blog
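A minimal sketch of just how simple it is in PyTorch (illustrative values, not code from the post):

```python
import torch
import torch.nn as nn

torch.manual_seed(123)
drop = nn.Dropout(p=0.5)  # drop half of the values during training

x = torch.ones(2, 4)
print(drop(x))  # surviving values are scaled up by 1/(1-p), i.e. doubled here

drop.eval()     # at inference time dropout is a no-op
print(drop(x))  # all ones again
```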
Causal, or masked self-attention: when we're considering a token, we don't pay attention to later ones. Following Sebastian Raschka's book 'Build a Large Language Model (from Scratch)'. Part 9/??| Giles' Blog
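As a sketch of the usual masking trick, with made-up numbers: fill the upper triangle of the score matrix with -inf before the softmax, so later tokens end up with zero weight.

```python
import torch

torch.manual_seed(0)
seq_len = 5
scores = torch.randn(seq_len, seq_len)  # stand-in attention scores

# Mask out positions above the diagonal so a token can't attend to later ones.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
masked = scores.masked_fill(mask, float("-inf"))

weights = torch.softmax(masked, dim=-1)
print(weights)  # each row sums to 1, with zeros for future positions
```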
Moving on from a toy self-attention mechanism, it's time to find out how to build a real trainable one. Following Sebastian Raschka's book 'Build a Large Language Model (from Scratch)'. Part 8/??| Giles' Blog
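A compact sketch along the lines the book develops -- trainable query, key and value projections, then scaled dot-product attention. The dimensions are illustrative, and the details may differ from the post's code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_in, d_out, seq_len = 3, 2, 4
x = torch.randn(seq_len, d_in)  # token embeddings

# Trainable projections -- these weights are what training actually adjusts.
W_q = nn.Linear(d_in, d_out, bias=False)
W_k = nn.Linear(d_in, d_out, bias=False)
W_v = nn.Linear(d_in, d_out, bias=False)

q, k, v = W_q(x), W_k(x), W_v(x)
scores = q @ k.T / d_out ** 0.5       # scaled dot-product attention scores
weights = torch.softmax(scores, dim=-1)
context = weights @ v
print(context.shape)                  # torch.Size([4, 2])
```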
Learning how to optimise self-attention calculations in LLMs using matrix multiplication. A deep dive into the basic linear algebra behind attention scores and token embeddings. Following Sebastian Raschka's book 'Build a Large Language Model (from Scratch)'. Part 7/??| Giles' Blog
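The core trick, sketched with made-up embeddings: one matrix multiplication gives every pairwise dot product at once, instead of looping over token pairs.

```python
import torch

# Toy token embeddings -- one row per token (values invented for illustration).
inputs = torch.tensor([[0.4, 0.1, 0.8],
                       [0.5, 0.9, 0.3],
                       [0.2, 0.7, 0.6]])

# All pairwise dot products in a single matrix multiplication.
attn_scores = inputs @ inputs.T
attn_weights = torch.softmax(attn_scores, dim=-1)
print(attn_weights)              # row i: how much token i attends to each token
print(attn_weights.sum(dim=-1))  # each row sums to 1
```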
The essential matrix operations needed for neural networks. For ML beginners.| Giles' Blog
How we actually do matrix operations for neural networks in frameworks like PyTorch. For ML beginners.| Giles' Blog
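A few of the basics from these two posts, sketched in PyTorch with toy matrices (my own examples, not the posts' code):

```python
import torch

A = torch.tensor([[1., 2.],
                  [3., 4.]])
B = torch.tensor([[5., 6.],
                  [7., 8.]])

print(A @ B)   # matrix multiplication (same as torch.matmul(A, B))
print(A * B)   # element-wise (Hadamard) product -- not the same thing!
print(A.T)     # transpose
print(A @ torch.tensor([1., 1.]))  # matrix-vector product
```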