For promising Gen Z students, a career as a software developer seemed like the golden ticket to career stability and success. But in the age of AI, the career promise for Gen Z software developers is gone.| Stack Overflow Blog
I have a Python HTTP client built using the httpx library. The client worked fine when I was working with the test environment, but as soon as I pointed it to the prod environment, the request started failing with an HTTP 403 error. To be 100% sure, I tested the same request using the Bruno … Continue reading "Debugging failed HTTP request with Claude Code"| Shekhar Gulati
I was reading a paper by the Google DeepMind team on how they trained Gemini Embedding, a state-of-the-art, unified embedding model. This is the second paper I’ve read this month on training embedding models. Last week, I read about how the Jina embedding model was trained. The Jina embedding paper was thin and lacked details, … Continue reading "Notes from Gemini Embedding Paper"| Shekhar Gulati
I was working on a problem where I needed to extract information from hotel tariff sheet PDF documents. These documents provide details on seasonal room rates, occupancy terms, and related suppleme…| Shekhar Gulati
I’ve noticed something interesting over the past few weeks: I’ve started using the term “agent” in conversations where I don’t feel the need to then define it, roll my eyes …| Simon Willison’s Weblog
More from Mike Caulfield (see also the SIFT method). He starts with a fantastic example of Google's AI mode usually correctly handling a common piece of misinformation but occasionally falling …| Simon Willison’s Weblog
Kylan Gibbs, CEO of Inworld, joins the show to discuss the technical challenges of creating interactive AI for virtual worlds and games, the significance of user experience, and the importance of accessibility and cost-efficiency in deploying AI models.| Stack Overflow Blog
I was going over the code base of mini-swe-agent today. The core agent loop is 100 lines long. All agentic frameworks do something similar. Interesting facts about mini-swe-agent: The Mini-SWE-Agent operates in a continuous loop, iteratively solving problems by querying an LLM for actions, executing bash commands, and observing results until the task is complete. … Continue reading "Notes on mini-swe-agent"| Shekhar Gulati
If I asked you to guess the job title of someone coding an app for work, your first guess probably wouldn’t be “writer”. It probably wouldn’t be your second or fifth guess either.| stackoverflow.blog
Today I was going over a paper by the Microsoft Research team on how AI is impacting professional work. This paper was published in July 2025. They analyzed 200k anonymized and privacy-scrubbed conversations between users and Microsoft Bing Copilot to understand how generative AI impacts different occupations and work activities. They separated the analysis into two distinct … Continue reading "Paper: Working with AI: Measuring the Occupational Implications of Generative AI"| Shekhar Gulati
Google recently released Gemma 3 270M, a remarkably compact 270 million parameter language model that promises efficient AI capabilities in a tiny package. As someone building AI voice agents, I was immediately interested in testing whether this model could handle one of my simplest but frequent use cases: generating message variations for conversational AI. For … Continue reading "I Tested Gemma 3 270M on the Simplest NLP Task"| Shekhar Gulati
Today, I was browsing Hacker News when I stumbled upon an interesting project: coderunner-ui. The premise was compelling – a local-first AI workspace that lets you chat with LLMs and execute …| Shekhar Gulati
I shipped LLM 0.27 today (followed by a 0.27.1 with minor bug fixes), adding support for the new GPT-5 family of models from OpenAI plus a flurry of improvements to …| Simon Willison’s Weblog
I gave a talk on Wednesday at the Bay Area AI Security Meetup about prompt injection, the lethal trifecta and the challenges of securing systems that use MCP. It wasn’t …| Simon Willison’s Weblog
I’ve been dipping into the r/ChatGPT subreddit recently to see how people are reacting to the GPT-5 launch, and so far the vibes there are not good. This AMA thread …| Simon Willison’s Weblog
I’ve had preview access to the new GPT-5 model family for the past two weeks (see related video and my disclosures) and have been using GPT-5 as my daily-driver. It’s …| Simon Willison’s Weblog
In this episode of Leaders of Code, Jody Bailey, Stack Overflow’s CTPO, Anirudh Kaul, Senior Director of Software Engineering, and Paul Petersen, Cloud Platform Engineering Manager, discuss the U.S. Bank’s journey from traditional banking practices to embracing new technologies.| Stack Overflow Blog
Ryan welcomes Mahir Yavuz, Senior Director of Engineering at Etsy, to the show to explore the unique challenges that Etsy’s marketplace faces and how Etsy’s teams leverage machine learning and AI to manage product SKUs, enrich inventory metadata, and improve both buyer and seller experiences.| Stack Overflow Blog
I have spent the last few months working on regulatory intelligence software. One of its important features is extracting obligations from dense PDF documents. In this post I am sharing some of the le…| Shekhar Gulati
Today I was reading the OpenAI guide on model selection https://platform.openai.com/docs/guides/model-selection where they explained how to calculate a realistic accuracy target for an LLM task by evaluating the financial impact of model decisions. They gave the example of a fake news classifier. This is a good way to find the accuracy you need for the task. Break-even accuracy is … Continue reading "Setting a realistic accuracy target for LLM tasks"| Shekhar Gulati
Cursor, the AI-powered code editor that has transformed how developers write code, recently underwent a significant pricing overhaul that has sparked intense debate in the developer community. The …| Shekhar Gulati
Solomon Hykes just presented the best definition of an AI agent I've seen yet, on stage at the AI Engineer World's Fair: An AI agent is an LLM wrecking its …| Simon Willison’s Weblog
Hey there! It's good to be back on the blog. Over the past few months, I've been focused on setting up the foundations for A New Social. I couldn't have imagined this is where I'd end up after writing my Bridges & The Last Network Effect post, but here we are!| augment
One term that I have been hearing a lot lately is reward hacking. I have heard this term multiple times from folks at OpenAI and Anthropic, and it represents a fundamental challenge in AI alignment…| Shekhar Gulati
Mistral released a new model yesterday. It is designed to excel at agentic coding tasks, meaning it can use tools. It is released under the Apache 2.0 license. It is fine-tuned from Mistral-Small-3.1, therefore it has …| Shekhar Gulati
Big upgrade to Mistral's API this morning: they've announced a new "Agents API". Mistral have been using the term "agents" for a while now. Here's how they describe them: AI …| Simon Willison’s Weblog
I was going slightly spare at the fact that every talk at this Anthropic developer conference has used the word "agents" dozens of times, but nobody ever stopped to provide …| Simon Willison’s Weblog
Classic slop: it listed real authors with entirely fake books. There's an important follow-up from 404 Media in their subsequent story: Victor Lim, the vice president of marketing and communications …| Simon Willison’s Weblog
Relatively thin post from OpenAI talking about their recent rollback of the GPT-4o model that made the model way too sycophantic - "overly flattering or agreeable", to use OpenAI's own …| Simon Willison’s Weblog
A practical guide for trial lawyers who want to try out AI LLMs (ChatGPT-4) in their practice, including simple-to-follow instructions and prompt examples.| Ball in your Court
Last night, I found myself overwhelmed by open tabs in Chrome. I wondered how many I had open, but couldn’t find a built-in tab counter. While third-party extensions likely existed, I am not …| Shekhar Gulati
In my previous post we built Prompt Injection Detector by training a LogisticRegression classifier on embeddings of SPML Chatbot Prompt Injection Dataset. Today, we will look at how we can fine-tun…| Shekhar Gulati
In the last couple of days, I’ve spent some hours playing with Patchwork. Patchwork is an open-source framework that leverages AI to accelerate asynchronous development tasks like code review…| Shekhar Gulati
Today I was watching a talk by Maggie Appleton from the local-first conference. As she points out in her insightful talk on homecooked software and barefoot developers, there exists a significant gap in a…| Shekhar Gulati
I was reading the paper Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost today and thought of applying it to a problem I solved a couple of months back. This paper introduced the Con…| Shekhar Gulati
Today I was reading Chapter 9 “Multimodal Large Language Models” of the Hands-On Large Language Models book and thought of applying it to a problem I face occasionally. The chapter covers …| Shekhar Gulati
I enjoy reading books on the O’Reilly learning platform. For the past month, a new feature on the O’Reilly platform called “Answers” has been staring me down, and I haven’t been tempte…| Shekhar Gulati
Today, I want to expand on a topic I discussed in issue #2: publishers striking deals with AI companies and what that means for their futures and the publisher landscape as a whole.| augment
"My goal for the next issue is to not talk about the Fediverse." That was me in the last issue of Human-Generated Content and I would like to start by apologizing for this very predictable lie. Hello, again! Last time, we talked about the diverging strategies between publishers choosing AI| augment
Publishers are seeing two very different futures for their businesses. Is the future of media aggregated and summarized or is it direct-to-audience?| augment
This post collects my thoughts after working in the GenAI startup space for a while and observing many peers in the space. Building a successful startup (and its products) is always hard, but I feel that building in the GenAI space with a small team and budget may actually be harder than average, in contrast to […]| piaoyang
Swing dancing and prompt engineering are pretty different. But could learning one help us learn the other?| alexwlchan.net
Reader’s Digest, the century-old magazine with the highest paid circulation, has long published “condensed” books; anthologies of four-to-five popular novels abridged to fit in a single volume. …| Ball in your Court