An underestimated challenge in making productive use of LLMs is that it can feel like cheating. One trick I've found that helps is to make sure that I am putting …| Simon Willison’s Weblog
After struggling for years trying to figure out why people think [Cloudflare] Durable Objects are complicated, I'm increasingly convinced that it's just that they sound complicated. Feels like we can solve 90% of it by renaming DurableObject to StatefulWorker? It's just a worker that has state. And because it has state, it also has to have a name, so that you can route to the specific worker that has the state you care about. There may be a sqlite database attached, there may be a container a...| Simon Willison's Weblog
Introducing EmbeddingGemma Brand new open weights (under the slightly janky Gemma license) 308M parameter embedding model from Google: Based on the Gemma 3 architecture, EmbeddingGemma is trained on 100+ languages and is small enough to run on less than 200MB of RAM with quantization. It's available via sentence-transformers, llama.cpp, MLX, Ollama, LMStudio and more. As usual for these smaller models there's a Transformers.js demo (via) that runs directly in the browser (in Chrome variants) -...| Simon Willison's Weblog
Any time I share my collection of tools built using vibe coding and AI-assisted development (now at 124, here's the definitive list) someone will inevitably complain that they're mostly trivial. A lot of them are! Here's a list of some that I think are genuinely useful and worth highlighting: OCR PDFs and images directly in your browser. This is the tool that started the collection, and I still use it on a regular basis. You can open any PDF in it (even PDFs that are just scanned images with ...| Simon Willison's Weblog
Beyond Vibe Coding Back in May I wrote Two publishers and three authors fail to understand what “vibe coding” means where I called out the authors of two forthcoming books on "vibe coding" for abusing that term to refer to all forms of AI-assisted development, when Not all AI-assisted programming is vibe coding based on the original Karpathy definition. I'll be honest: I don't feel great about that post. I made an example of those two books to push my own agenda of encouraging "vibe coding...| Simon Willison's Weblog
gov.uscourts.dcd.223205.1436.0_1.pdf Here's the 230 page PDF ruling on the 2023 United States v. Google LLC federal antitrust case - the case that could have resulted in Google selling off Chrome and cutting most of Mozilla's funding. I made it through the first dozen pages - it's actually quite readable. It opens with a clear summary of the case so far, bold highlights mine: Last year, this court ruled that Defendant Google LLC had violated Section 2 of the Sherman Act: “Google is a monopol...| Simon Willison's Weblog
Making XML human-readable without XSLT In response to the recent discourse about XSLT support in browsers, Jake Archibald shares a new-to-me alternative trick for making an XML document readable in a browser: adding the following element near the top of the XML: <script xmlns="http://www.w3.org/1999/xhtml" src="script.js" defer="" /> That script.js will then be executed by the browser, and can swap out the XML with HTML by creating new elements using the correct namespace: const htmlEl = docum...| Simon Willison's Weblog
Rich Pixels Neat Python library by Darren Burns adding pixel image support to the Rich terminal library, using tricks to render an image using full or half-height colored blocks. Here's the key trick - it renders Unicode ▄ (U+2584, "lower half block") characters after setting a foreground and background color for the two pixels it needs to display. I got GPT-5 to vibe code up a show_image.py terminal command which resizes the provided image to fit the width and height of the current terminal...| Simon Willison's Weblog
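The half-block trick described in the Rich Pixels entry can be sketched in a few lines of plain Python. This is an illustration only, not Rich Pixels' own code: it uses raw ANSI 24-bit color escapes instead of the Rich library, and the names cell and render_half_blocks are hypothetical. It assumes a terminal with truecolor support and pixel rows given as lists of (r, g, b) tuples.

```python
LOWER_HALF_BLOCK = "\u2584"  # the "lower half block" character
RESET = "\x1b[0m"            # ANSI escape that resets all colors


def cell(top, bottom):
    """Return one terminal cell showing two vertically stacked pixels.

    The glyph's lower half is painted in the foreground color, so the
    bottom pixel becomes the foreground and the top pixel the background.
    """
    tr, tg, tb = top
    br, bg, bb = bottom
    return (
        f"\x1b[48;2;{tr};{tg};{tb}m"   # background = top pixel
        f"\x1b[38;2;{br};{bg};{bb}m"   # foreground = bottom pixel
        f"{LOWER_HALF_BLOCK}"
    )


def render_half_blocks(pixels):
    """Render rows of (r, g, b) pixels, two pixel rows per text line."""
    lines = []
    for y in range(0, len(pixels), 2):
        top_row = pixels[y]
        # With an odd number of rows, repeat the last row as its own bottom half.
        bottom_row = pixels[y + 1] if y + 1 < len(pixels) else top_row
        line = "".join(cell(t, b) for t, b in zip(top_row, bottom_row))
        lines.append(line + RESET)
    return "\n".join(lines)


if __name__ == "__main__":
    red, blue = (255, 0, 0), (0, 0, 255)
    # A 4x2 pixel image: a red row above a blue row, shown as one text line.
    print(render_half_blocks([[red] * 4, [blue] * 4]))
```

Because each character cell carries two pixels, an image needs only half as many text lines as it has pixel rows, which also brings the pixels closer to square in most terminal fonts.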
I just sent out my August 2025 sponsors-only newsletter summarizing the past month in LLMs and my other work. Topics included GPT-5, gpt-oss, image editing models (Qwen-Image-Edit and Gemini Nano Banana), other significant model releases and the tools I'm using at the moment. If you'd like a preview of the newsletter, here's the July 2025 edition I sent out a month ago. New sponsors get access to the full archive. If you start sponsoring for $10/month or more right now you'll get instant acce...| Simon Willison's Weblog
Introducing gpt-realtime Released a few days ago (August 28th), gpt-realtime is OpenAI's new "most advanced speech-to-speech model". It looks like this is a replacement for the older gpt-4o-realtime-preview model that was released last October. This is a slightly confusing release. The previous realtime model was clearly described as a variant of GPT-4o, sharing the same October 2023 training cut-off date as that model. I had expected that gpt-realtime might be a GPT-5 relative, but its traini...| Simon Willison's Weblog
Cloudflare Radar: AI Insights Cloudflare launched this dashboard back in February, incorporating traffic analysis from Cloudflare's network along with insights from their popular 1.1.1.1 DNS service. I found this chart particularly interesting, showing which documented AI crawlers are most active collecting training data - led by GPTBot, ClaudeBot and Meta-ExternalAgent: Cloudflare's DNS data also hints at the popularity of different services. ChatGPT holds the first place, which is unsurpris...| Simon Willison's Weblog
Claude Opus 4.1 and Opus 4 degraded quality Notable because often when people complain of degraded model quality it turns out to be unfounded - Anthropic in the past have emphasized that they don't change the model weights after releasing them without changing the version number. In this case a botched upgrade of their inference stack caused a genuine model degradation for 56.5 hours: From 17:30 UTC on Aug 25th to 02:00 UTC on Aug 28th, Claude Opus 4.1 experienced a degradation in quality for s...| Simon Willison's Weblog
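The 56.5 hour figure in the entry above can be checked from the two quoted timestamps. This is a quick sketch only; the year 2025 is an assumption taken from the surrounding August 2025 entries, since the excerpt itself gives only the day and month.

```python
from datetime import datetime, timezone

# Timestamps quoted in the excerpt; the year is assumed, not stated there.
start = datetime(2025, 8, 25, 17, 30, tzinfo=timezone.utc)
end = datetime(2025, 8, 28, 2, 0, tzinfo=timezone.utc)

# Convert the elapsed time to hours.
hours = (end - start).total_seconds() / 3600
print(f"Degradation window: {hours} hours")
```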
LLMs are intelligence without agency—what we might call "vox sine persona": voice without person. Not the voice of someone, not even the collective voice of many someones, but a voice emanating from no one at all. — Benj Edwards Tags: benj-edwards, ai-personality, generative-ai, ai, llms| Simon Willison's Weblog
Talk Python: Celebrating Django's 20th Birthday With Its Creators I recorded this podcast episode recently to celebrate Django's 20th birthday with Adrian Holovaty, Will Vincent, Jeff Triplett, and Thibaud Colas. We didn’t know that it was a web framework. We thought it was a tool for building local newspaper websites. [...] Django’s original tagline was ‘Web development on journalism deadlines’. That’s always been my favorite description of the project. Tags: adrian-holovaty, django,...| Simon Willison's Weblog
The perils of vibe coding I was interviewed by Elaine Moore for this opinion piece in the Financial Times, which ended up in the print edition of the paper too! I picked up a copy yesterday: From the article, with links added by me to relevant projects: Willison thinks the best way to see what a new model can do is to ask for something unusual. He likes to request an SVG (an image made out of lines described with code) of a pelican on a bike and asks it to remember the chickens in his garden ...| Simon Willison's Weblog
Python: The Documentary New documentary about the origins of the Python programming language - 84 minutes long, built around extensive interviews with Guido van Rossum and others who were there at the start and during the subsequent journey. Tags: computer-history, guido-van-rossum, python, youtube| Simon Willison's Weblog
We were back in London for a few days and yesterday had a day of culture. First up: the brand new V&A East Storehouse museum in the Queen Elizabeth Olympic Park near Stratford, which opened on May 31st this year. This is a delightful new format for a museum. The building is primarily an off-site storage area for London's Victoria and Albert museum, storing 250,000 items that aren't on display in their main building. The twist is that it's also open to the public. Entrance is free, and you can...| Simon Willison's Weblog
We simply don’t know to defend against these attacks. We have zero agentic AI systems that are secure against these attacks. Any AI that is working in an adversarial environment—and …| Simon Willison’s Weblog
Since I love collecting questionable analogies for LLMs, here's a new one I just came up with: an LLM is a lossy encyclopedia. They have a huge array of facts …| Simon Willison’s Weblog
Since its first release, the single biggest question around the uv Python environment management tool has been around Astral's business model: Astral are a VC-backed company and at some point …| Simon Willison’s Weblog
Piloting Claude for Chrome Two days ago I said: I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely. Today Anthropic announced their own take on this pattern, implemented as an invite-only preview Chrome extension. To their credit, the majority of the blog post and accompanying support article is information about the security risks. From their post: Just as people encounter phishing attempts in their inboxes, browser-using AIs...| Simon Willison's Weblog
Will Smith’s concert crowds are real, but AI is blurring the lines Great piece from Andy Baio demonstrating quite how convoluted the usage ethics and backlash against generative AI has become. Will Smith has been accused of using AI to misleadingly inflate the audience sizes of his recent tour. It looks like the audiences were real, but the combined usage of static-image-to-video models by his team with YouTube's ugly new compression experiments gave the resulting footage an uncanny valley e...| Simon Willison's Weblog
Agentic Browser Security: Indirect Prompt Injection in Perplexity Comet The security team from Brave took a look at Comet, the LLM-powered "agentic browser" extension from Perplexity, and unsurprisingly found security holes you can drive a truck through. The vulnerability we’re discussing in this post lies in how Comet processes webpage content: when users ask it to “Summarize this webpage,” Comet feeds a part of the webpage directly to its LLM without distinguishing between the user’s...| Simon Willison's Weblog
Static Sites with Python, uv, Caddy, and Docker Nik Kantar documents his Docker-based setup for building and deploying mostly static web sites in line-by-line detail. I found this really useful. The Dockerfile itself without comments is just 8 lines long: FROM ghcr.io/astral-sh/uv:debian AS build WORKDIR /src COPY . . RUN uv python install 3.13 RUN uv run --no-dev sus FROM caddy:alpine COPY Caddyfile /etc/caddy/Caddyfile COPY --from=build /src/output /srv/ He also includes a Caddyfile that sho...| Simon Willison's Weblog
Spatial Joins in DuckDB Extremely detailed overview by Max Gabrielsson of DuckDB's new spatial join optimizations. Consider the following query, which counts the number of NYC Citi Bike Trips for each of the neighborhoods defined by the NYC Neighborhood Tabulation Areas polygons and returns the top three: SELECT neighborhood, count(*) AS num_rides FROM rides JOIN hoods ON ST_Intersects( rides.start_geom, hoods.geom ) GROUP BY neighborhood ORDER BY num_rides DESC LIMIT 3; The rides table contains...| Simon Willison's Weblog
ChatGPT release notes: Project-only memory The feature I've most wanted from ChatGPT's memory feature (the newer version of memory that automatically includes relevant details from summarized prior conversations) just landed: With project-only memory enabled, ChatGPT can use other conversations in that project for additional context, and won’t use your saved memories from outside the project to shape responses. Additionally, it won’t carry anything from the project into future chats ou...| Simon Willison's Weblog
DeepSeek 3.1 The latest model from DeepSeek, a 685B monster (like DeepSeek v3 before it) but this time it's a hybrid reasoning model. DeepSeek claim: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly. Drew Breunig points out that their benchmarks show "the same scores with 25-50% fewer tokens" - at least across AIME 2025 and GPQA Diamond and LiveCodeBench. The DeepSeek release includes prompt examples for a coding agent, a python agent an...| Simon Willison's Weblog
Mississippi's approach would fundamentally change how users access Bluesky. The Supreme Court’s recent decision leaves us facing a hard reality: comply with Mississippi’s age assurance law—and make every Mississippi Bluesky user hand over sensitive personal information and undergo age checks to access the site—or risk massive fines. The law would also require us to identify and track which users are children, unlike our approach in other regions. [...] We believe effective child ...| Simon Willison's Weblog
too many model context protocol servers and LLM allocations on the dance floor Useful reminder from Geoffrey Huntley of the infrequently discussed significant token cost of using MCP. Geoffrey estimates that the usable context window for something like Amp or Cursor is around 176,000 tokens - Claude 4's 200,000 minus around 24,000 for the system prompt for those tools. Adding just the popular GitHub MCP defines 93 additional tools and swallows another 55,000 of those valuable tokens! MCP ...| Simon Willison's Weblog
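The token arithmetic in the MCP entry above can be made concrete with a small budget calculation. The figures are the rough estimates quoted in the excerpt; the function name remaining_context and the "github-mcp" key are hypothetical and used only for illustration.

```python
def remaining_context(window, system_prompt, tool_definitions):
    """Return (usable, remaining) token counts for a context window.

    usable    = window minus the tool's own system prompt
    remaining = usable minus the tokens spent on MCP tool definitions
    """
    usable = window - system_prompt
    remaining = usable - sum(tool_definitions.values())
    return usable, remaining


# Approximate figures from the excerpt: a 200,000 token window, around 24,000
# for the system prompt, and around 55,000 for the GitHub MCP tool definitions.
usable, remaining = remaining_context(
    window=200_000,
    system_prompt=24_000,
    tool_definitions={"github-mcp": 55_000},
)
print(f"usable: {usable:,} tokens, after GitHub MCP: {remaining:,} tokens")
```

On these estimates, a single MCP server takes up roughly 31% of the usable window before any conversation has started.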
Most classical engineering fields deal with probabilistic system components all of the time. In fact I'd go as far as to say that inability to deal with probabilistic components is disqualifying from many engineering endeavors. Process engineers for example have to account for human error rates. On a given production line with humans in a loop, the operators will sometimes screw up. Designing systems to detect these errors (which are highly probabilistic!), mitigate them, and reduce the occur...| Simon Willison's Weblog
I was at a leadership group and people were telling me "We think that with AI we can replace all of our junior people in our company." I was like, "That's the dumbest thing I've ever heard. They're probably the least expensive employees you have, they're the most leaned into your AI tools, and how's that going to work when you go 10 years in the future and you have no one that has built up or learned anything?" — Matt Garman, CEO, Amazon Web Services Tags: ai-ethics, careers, generative-ai, ...| Simon Willison's Weblog
Simply put, my central worry is that many people will start to believe in the illusion of AIs as conscious entities so strongly that they’ll soon advocate for AI rights, model welfare and even AI citizenship. This development will be a dangerous turn in AI progress and deserves our immediate attention. We must build AI for people; not to be a digital person. [...] we should build AI that only ever presents itself as an AI, that maximizes utility while minimizing markers of consciousness. Ra...| Simon Willison's Weblog
what’s the point of vibe coding if at the end of the day i still gotta pay a dev to look at the code anyway. sure it feels kinda cool …| Simon Willison’s Weblog
36 posts tagged ‘exfiltration-attacks’. Exfiltration attacks are prompt injection attacks against chatbots that have access to private information, where that information is exfiltrated by the attack…| Simon Willison’s Weblog
Remote Prompt Injection in GitLab Duo Leads to Source Code Theft.| Simon Willison’s Weblog
System Card: Claude Opus 4 & Claude Sonnet 4.| Simon Willison’s Weblog
Expanding on what we missed with sycophancy.| Simon Willison’s Weblog
Series: LLMs on personal devices| Simon Willison’s Weblog
12 posts tagged ‘ai-energy-usage’. How much energy is used by AI systems?| Simon Willison’s Weblog
I’m beginning to suspect that one of the most common misconceptions about LLMs such as ChatGPT involves how “training” works. A common complaint I see about these tools is that …| Simon Willison’s Weblog
ChatGPT now has "memory", and it's implemented in a delightfully simple way. You can instruct it to remember specific things about you and it will then have access to that …| Simon Willison’s Weblog
Earlier this month I wrote about how ChatGPT can’t access the internet, even though it really looks like it can. Consider this part two in the series. Here’s another common …| Simon Willison’s Weblog
Filippo Valsorda founded Geomys last year as an "organization of professional open source maintainers", providing maintenance and support for critical packages in the Go language ecosystem backed by clients in …| Simon Willison’s Weblog
Fun, creative new micro-eval. Split the world into a sampled collection of latitude longitude points and for each one ask a model: If this location is over land, say 'Land'. …| Simon Willison’s Weblog
I shipped LLM 0.27 today (followed by a 0.27.1 with minor bug fixes), adding support for the new GPT-5 family of models from OpenAI plus a flurry of improvements to …| Simon Willison’s Weblog
the percentage of users using reasoning models each day is significantly increasing; for example, for free users we went from <1% to 7%, and for plus users from 7% to …| Simon Willison’s Weblog
The issue with GPT-5 in a nutshell is that unless you pay for model switching & know to use GPT-5 Thinking or Pro, when you ask “GPT-5” you sometimes get …| Simon Willison’s Weblog
December 2023| Simon Willison’s Weblog
I gave a talk on Wednesday at the Bay Area AI Security Meetup about prompt injection, the lethal trifecta and the challenges of securing systems that use MCP. It wasn’t …| Simon Willison’s Weblog
August 2025| Simon Willison’s Weblog
I’ve been dipping into the r/ChatGPT subreddit recently to see how people are reacting to the GPT-5 launch, and so far the vibes there are not good. This AMA thread …| Simon Willison’s Weblog
I’ve had preview access to the new GPT-5 model family for the past two weeks (see related video and my disclosures) and have been using GPT-5 as my daily-driver. It’s …| Simon Willison’s Weblog
This week, ChatGPT is on track to reach 700M weekly active users — up from 500M at the end of March and 4× since last year.| Simon Willison’s Weblog
When you vibe code, you are incurring tech debt as fast as the LLM can spit it out. Which is why vibe coding is perfect for prototypes and throwaway projects: …| Simon Willison’s Weblog
Recent| Simon Willison’s Weblog
Sycophancy in GPT-4o: What happened and what we’re doing about it| Simon Willison’s Weblog
Dropping a model release as significant as Llama 4 on a weekend is plain unfair! So far the best place to learn about the new model family is this post …| Simon Willison’s Weblog
I gave the opening keynote at the AI Engineer World’s Fair yesterday. I was a late addition to the schedule: OpenAI pulled out of their slot at the last minute, …| Simon Willison’s Weblog
October 2020| Simon Willison’s Weblog
I wrote about the new GLM-4.5 model family yesterday—new open weight (MIT licensed) models from Z.ai in China which their benchmarks claim score highly in coding even against models such …| Simon Willison’s Weblog
For the past two and a half years the feature I’ve most wanted from LLMs is the ability to take on search-based research tasks on my behalf. We saw the …| Simon Willison’s Weblog
31 posts tagged ‘vibe-coding’. As defined here - not the same thing as AI-assisted programming, though there's some overlap.| Simon Willison’s Weblog
In further evidence that phishing attacks can catch out the most sophisticated among us, security researcher (and operator of ';--have i been pwned?) Troy Hunt reports on how he fell …| Simon Willison’s Weblog
April 2025| Simon Willison’s Weblog
If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find out Elon Musk’s stance before providing you with an answer. …| Simon Willison’s Weblog
Released last night, Grok 4 is now available via both API and a paid subscription for end-users. Update: If you ask it about controversial topics it will sometimes search X …| Simon Willison’s Weblog
Max Woolf pointed out this new feature of the Gemini 2.5 series (here’s my coverage of 2.5 Pro and 2.5 Flash) in a comment on Hacker News: One hidden note …| Simon Willison’s Weblog
Quitting programming as a career right now because of LLMs would be like quitting carpentry as a career thanks to the invention of the table saw.| Simon Willison’s Weblog
Supabase MCP can leak your entire SQL database| Simon Willison’s Weblog
July 2025| Simon Willison’s Weblog
Here's yet another example of a lethal trifecta attack, where an LLM system combines access to private data, exposure to potentially malicious instructions and a mechanism to communicate data back …| Simon Willison’s Weblog
Alex Gaynor maintains rust-asn1, and recently spotted a missing LLVM compiler optimization while hacking on it, with the assistance of Claude (Alex works for Anthropic). He describes how he confirmed …| Simon Willison’s Weblog
March 2024| Simon Willison’s Weblog
If you are a user of LLM systems that use tools (you can call them “AI agents” if you like) it is critically important that you understand the risk of …| Simon Willison’s Weblog
I presented a three hour workshop at PyCon US yesterday titled Building software on top of Large Language Models. The goal of the workshop was to give participants everything they …| Simon Willison’s Weblog
102 posts tagged ‘prompt-injection’. Prompt Injection is a security attack against applications built on top of Large Language Models, introduced here and further described in this series of posts.| Simon Willison’s Weblog
This new paper by 11 authors from organizations including IBM, Invariant Labs, ETH Zurich, Google and Microsoft is an excellent addition to the literature on prompt injection and LLM security. …| Simon Willison’s Weblog
shot-scraper is a new tool that I’ve built to help automate the process of keeping screenshots up-to-date in my documentation. It also doubles as a scraping tool—hence the name—which I …| Simon Willison’s Weblog
March 2022| Simon Willison’s Weblog
June 2025| Simon Willison’s Weblog
I presented an invited keynote at the AI Engineer World’s Fair in San Francisco this week. This is my third time speaking at the event—here are my talks from October …| Simon Willison’s Weblog
February 2012| Simon Willison’s Weblog
You’re starting with an invalid assumption. Front end development is absolutely not “easier” than other forms of engineering. When you’re writing server-side code, you’re writing for one language on one …| Simon Willison’s Weblog
Solomon Hykes just presented the best definition of an AI agent I've seen yet, on stage at the AI Engineer World's Fair: An AI agent is an LLM wrecking its …| Simon Willison’s Weblog
A fun new benchmark just dropped! Inspired by the Claude 4 system card—which showed that Claude 4 might just rat you out to the authorities if you told it to …| Simon Willison’s Weblog
LLM 0.26 is out with the biggest new feature since I started the project: support for tools. You can now use the LLM CLI tool—and Python library—to grant LLMs from …| Simon Willison’s Weblog
Big upgrade to Mistral's API this morning: they've announced a new "Agents API". Mistral have been using the term "agents" for a while now. Here's how they describe them: AI …| Simon Willison’s Weblog
I was going slightly spare at the fact that every talk at this Anthropic developer conference has used the word "agents" dozens of times, but nobody ever stopped to provide …| Simon Willison’s Weblog
Classic slop: it listed real authors with entirely fake books. There's an important follow-up from 404 Media in their subsequent story: Victor Lim, the vice president of marketing and communications …| Simon Willison’s Weblog
Yet another example of the classic Markdown image exfiltration attack, this time affecting GitLab Duo - GitLab's chatbot. Omer Mayraz reports on how they found and disclosed the issue. The …| Simon Willison’s Weblog
As more people start hacking around with implementations of MCP (the Model Context Protocol, a new standard for making tools available to LLM-powered systems) the security implications of tools built …| Simon Willison’s Weblog
GitHub's official MCP server grants LLMs a whole host of new abilities, including being able to read and issues in repositories the user has access to and submit new pull …| Simon Willison’s Weblog
Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude …| Simon Willison’s Weblog
GitHub issues is almost the best notebook in the world. Free and unlimited, for both public and private notes. Comprehensive Markdown support, including syntax highlighting for almost any language. Plus …| Simon Willison’s Weblog
Direct link to a PDF on Anthropic's CDN because they don't appear to have a landing page anywhere for this document. Anthropic's system cards are always worth a look, and …| Simon Willison’s Weblog
Last month ChatGPT got a major upgrade. As far as I can tell the closest to an official announcement was this tweet from @OpenAI: Starting today [April 10th 2025], memory …| Simon Willison’s Weblog
GPT-4o's recent update caused it to be way too sycophantic and disingenuously praise anything the user said. OpenAI's Aidan McLaughlin: last night we rolled out our first fix to remedy …| Simon Willison’s Weblog
Relatively thin post from OpenAI talking about their recent rollback of the GPT-4o model that made the model way too sycophantic - "overly flattering or agreeable", to use OpenAI's own …| Simon Willison’s Weblog
The Chatbot Arena has become the go-to place for vibes-based evaluation of LLMs over the past two years. The project, originating at UC Berkeley, is home to a large community …| Simon Willison’s Weblog
Watching OpenAI’s new o3 model guess where a photo was taken is one of those moments where decades of science fiction suddenly come to life. It’s a cross between the …| Simon Willison’s Weblog