I’m freezing this blog and starting to post on my Substack instead. The authoring experience is much more convenient for me there. Please follow me there, and check out The Illustrated DeepSeek R-1 if you haven’t yet. And check out our How Transformer LLMs Work course!| Jay Alammar
Here are eight observations I’ve shared recently on the Cohere blog and videos that go over them.: Article: What’s the big deal with Generative AI? Is it the future or the present? Article: AI is Eating The World| Jay Alammar
Can AI Image generation tools make re-imagined, higher-resolution versions of old video game graphics? Over the last few days, I used AI image generation to reproduce one of my childhood nightmares. I wrestled with Stable Diffusion, Dall-E and Midjourney to see how these commercial AI generation tools can help retell an old visual story - the intro cinematic to an old video game (Nemesis 2 on the MSX). This post describes the process and my experience in using these models/services to retell ...| Jay Alammar
Translations: Chinese, Vietnamese. (V2 Nov 2022: Updated images for more precise description of forward diffusion. A few more images in this version) AI image generation is the most recent AI capability blowing people’s minds (mine included). The ability to create striking visuals from text descriptions has a magical quality to it and points clearly to a shift in how humans create art. The release of Stable Diffusion is a clear milestone in this development because it made a high-performanc...| Jay Alammar
A little less than a year ago, I joined the awesome Cohere team. The company trains massive language models (both GPT-like and BERT-like) and offers them as an API (which also supports finetuning). Its founders include Google Brain alums including co-authors of the original Transformers paper. It’s a fascinating role where I get to help companies and developers put these massive models to work solving real-world problems. I love that I get to share some of the intuitions developers need to ...| Jay Alammar
Discussion: Discussion Thread for comments, corrections, or any feedback. Translations: Korean, Russian Summary: The latest batch of language models can be much smaller yet achieve GPT-3 like performance by being able to query a database or search the web for information. A key indication is that building larger and larger models is not the only way to improve performance. Video The last few years saw the rise of Large Language Models (LLMs) – machine learning models that rapidly improve ho...| Jay Alammar
Introducing the Explainable AI Cheat Sheet, your high-level guide to the set of tools and methods that helps humans understand AI/ML models and their predictions. I introduce the cheat sheet in this brief video:| Jay Alammar
By visualizing the hidden state between a model's layers, we can get some clues as to the model's "thought process". Figure: Finding the words to say After a language model generates a sentence, we can visualize a view of how the model came by each word (column). Each row is a model layer. The value and color indicate the ranking of the output token at that layer. The darker the color, the higher the ranking. Layer 0 is at the top. Layer 47 is at the bottom. Model:GPT2-XL Part 2: Continuing t...| Jay Alammar
Interfaces for exploring transformer language models by looking at input saliency and neuron activation. Explorable #1: Input saliency of a list of countries generated by a language model Tap or hover over the output tokens: Explorable #2: Neuron activation analysis reveals four groups of neurons, each is associated with generating a certain type of token Tap or hover over the sparklines on the left to isolate a certain factor: The Transformer architecture has been powering a number of the re...| Jay Alammar
Discussions: Hacker News (397 points, 97 comments), Reddit r/MachineLearning (247 points, 27 comments) Translations: German, Korean, Chinese (Simplified), Russian, Turkish The tech world is abuzz with GPT3 hype. Massive language models (like GPT3) are starting to surprise us with their abilities. While not yet completely reliable for most businesses to put in front of their customers, these models are showing sparks of cleverness that are sure to accelerate the march of automation and the pos...| jalammar.github.io
Discussions: Hacker News (347 points, 37 comments), Reddit r/MachineLearning (151 points, 19 comments) Translations: Chinese (Simplified), French, Korean, Portuguese, Russian “There is in all things a pattern that is part of our universe. It has symmetry, elegance, and grace - those qualities you find always in that which the true artist captures. You can find it in the turning of the seasons, in the way sand trails along a ridge, in the branch clusters of the creosote bush or the pattern o...| jalammar.github.io
Discussions: Hacker News (98 points, 19 comments), Reddit r/MachineLearning (164 points, 20 comments) Translations: Chinese (Simplified), French 1, French 2, Japanese, Korean, Persian, Russian, Spanish 2021 Update: I created this brief and highly accessible video intro to BERT The year 2018 has been an inflection point for machine learning models handling text (or more accurately, Natural Language Processing or NLP for short). Our conceptual understanding of how best to represent words and se...| jalammar.github.io
Discussions: Hacker News (64 points, 3 comments), Reddit r/MachineLearning (219 points, 18 comments) Translations: Simplified Chinese, French, Korean, Russian, Turkish This year, we saw a dazzling application of machine learning. The OpenAI GPT-2 exhibited impressive ability of writing coherent and passionate essays that exceed what we anticipated current language models are able to produce. The GPT-2 wasn’t a particularly novel architecture – it’s architecture is very similar to the de...| jalammar.github.io
Discussions: Hacker News (65 points, 4 comments), Reddit r/MachineLearning (29 points, 3 comments) Translations: Arabic, Chinese (Simplified) 1, Chinese (Simplified) 2, French 1, French 2, Italian, Japanese, Korean, Persian, Russian, Spanish 1, Spanish 2, Vietnamese Watch: MIT’s Deep Learning State of the Art lecture referencing this post Featured in courses at Stanford, Harvard, MIT, Princeton, CMU and others Update: This post has now become a book! Check out LLM-book.com which contains (C...| jalammar.github.io
Translations: Chinese, Vietnamese. (V2 Nov 2022: Updated images for more precise description of forward diffusion. A few more images in this version) AI image generation is the most recent AI capability blowing people’s minds (mine included). The ability to create striking visuals from text descriptions has a magical quality to it and points clearly to a shift in how humans create art. The release of Stable Diffusion is a clear milestone in this development because it made a high-performanc...| jalammar.github.io