What actually goes on inside an LLM to make it calculate probabilities for the next token?
The way we get from context vectors to next-word prediction turns out to be simpler than I imagined -- but understanding why it works took a bit of thought.
Finally getting to the end of chapter 3 of Raschka’s LLM book! This time it’s multi-head attention: what it is, how it works, and why the code does what it does.