What actually goes on inside an LLM to make it calculate probabilities for the next token?
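A minimal sketch of that final step, assuming the model has already produced a vector of raw scores (logits) over the vocabulary; the tiny five-word vocabulary and the numbers here are illustrative, not from the post.

```python
import torch

# Hypothetical logits for a 5-word vocabulary: one raw score per candidate next token.
logits = torch.tensor([2.0, 0.5, -1.0, 3.0, 0.0])

# Softmax turns the raw scores into a probability distribution that sums to 1.
probs = torch.softmax(logits, dim=-1)
print(probs)        # roughly tensor([0.2423, 0.0541, 0.0121, 0.6587, 0.0328])
print(probs.sum())  # tensor(1.)
```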
Working through layer normalisation -- why do we do it, how does it work, and why doesn't it break everything?
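A minimal sketch of the mechanics, assuming the usual formulation: normalise each token's activation vector to mean 0 and variance 1, then apply a learnable scale and shift. The tensor shapes and variable names are illustrative.

```python
import torch

torch.manual_seed(0)
emb_dim = 8                               # illustrative embedding size
x = torch.randn(2, 4, emb_dim)            # (batch, tokens, emb_dim) -- made-up activations

# Normalise each token's vector across its embedding dimension.
mean = x.mean(dim=-1, keepdim=True)
var = x.var(dim=-1, keepdim=True, unbiased=False)
x_norm = (x - mean) / torch.sqrt(var + 1e-5)   # small eps keeps the division stable

# Learnable scale and shift let the network undo the normalisation if that helps.
scale = torch.ones(emb_dim)
shift = torch.zeros(emb_dim)
out = scale * x_norm + shift

print(out.mean(dim=-1))                   # ~0 for every token
print(out.var(dim=-1, unbiased=False))    # ~1 for every token
```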
The way we get from context vectors to next-word prediction turns out to be simpler than I imagined -- but understanding why it works took a bit of thought.
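A minimal sketch of that last hop, assuming the standard setup where a single linear output head projects each context vector from the embedding dimension up to vocabulary size, and a softmax turns those logits into probabilities. The GPT-2-ish dimensions here are illustrative.

```python
import torch
import torch.nn as nn

emb_dim, vocab_size = 768, 50257           # illustrative sizes
out_head = nn.Linear(emb_dim, vocab_size, bias=False)

# One batch of 4 context vectors, one per token position.
context_vectors = torch.randn(1, 4, emb_dim)

logits = out_head(context_vectors)         # (1, 4, vocab_size): a score per vocab word per position
probs = torch.softmax(logits, dim=-1)      # a next-token distribution at each position
print(logits.shape, probs.shape)
```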
Batching speeds up training and inference, but for LLMs we can't just use matrices for it -- we need higher-order tensors.
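A minimal sketch of why a plain matrix isn't enough: one sequence of token embeddings is already a 2-D matrix (tokens × embedding dimension), so a batch of sequences needs a third axis. Shapes and names are illustrative.

```python
import torch

emb_dim = 16
one_sequence = torch.randn(10, emb_dim)    # 10 tokens, each an emb_dim-wide vector: a 2-D matrix

# A batch of 4 such sequences stacks into a 3-D tensor: (batch, tokens, emb_dim).
batch = torch.stack([torch.randn(10, emb_dim) for _ in range(4)])
print(batch.shape)    # torch.Size([4, 10, 16])

# Matrix multiplication broadcasts over the batch dimension, so one weight matrix
# is applied to every sequence in the batch at once.
weights = torch.randn(emb_dim, emb_dim)
out = batch @ weights
print(out.shape)      # torch.Size([4, 10, 16])
```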
Why dropout is kind of like the mandatory vacation policies financial firms have
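The mechanism behind the analogy, as a minimal sketch: during training, each activation is zeroed out with some probability, so no single unit can become indispensable, and the survivors are scaled up to keep the expected magnitude the same. The drop rate here is illustrative.

```python
import torch

torch.manual_seed(0)
drop_rate = 0.5                        # illustrative; common values range from 0.1 to 0.5
x = torch.ones(2, 6)                   # made-up activations

# Randomly zero activations; scale the survivors by 1/(1 - drop_rate)
# so the expected value of each activation is unchanged.
keep_mask = (torch.rand_like(x) > drop_rate).float()
dropped = x * keep_mask / (1.0 - drop_rate)
print(dropped)

# At inference time dropout is switched off and x passes through untouched.
```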
Causal, or masked self-attention: when we're considering a token, we don't pay attention to later ones. Following Sebastian Raschka's book 'Build a Large Language Model (from Scratch)'. Part 9/??
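A minimal sketch of the masking itself, assuming the usual trick of setting the scores for "future" positions to minus infinity before the softmax, so they end up with zero attention weight. The sequence length and scores are illustrative.

```python
import torch

torch.manual_seed(0)
num_tokens = 4
scores = torch.randn(num_tokens, num_tokens)   # made-up attention scores (query x key)

# Lower-triangular mask: position i may only attend to positions 0..i.
mask = torch.tril(torch.ones(num_tokens, num_tokens))
masked_scores = scores.masked_fill(mask == 0, float('-inf'))

weights = torch.softmax(masked_scores, dim=-1)
print(weights)   # each row sums to 1, with zeros above the diagonal
```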