Today I was going over a paper by the Microsoft Research team on how AI is impacting professional work. The paper was published in July 2025. They analyzed 200k anonymized and privacy-scrubbed conversations between users and Microsoft Bing Copilot to understand how generative AI impacts different occupations and work activities. They separated the analysis into two distinct … Continue reading "Paper: Working with AI: Measuring the Occupational Implications of Generative AI"| Shekhar Gulati
Google recently released Gemma 3 270M, a remarkably compact 270 million parameter language model that promises efficient AI capabilities in a tiny package. As someone building AI voice agents, I was immediately interested in testing whether this model could handle one of my simplest but frequent use cases: generating message variations for conversational AI. For … Continue reading "I Tested Gemma 3 270M on the Simplest NLP Task"| Shekhar Gulati
Today, I was browsing Hacker News when I stumbled upon an interesting project: coderunner-ui. The premise was compelling – a local-first AI workspace that lets you chat with LLMs and execute …| Shekhar Gulati
I have spent the last few months working on regulatory intelligence software. One of its important features is extracting obligations from dense PDF documents. In this post I am sharing some of the le…| Shekhar Gulati
Today I was reading the OpenAI guide on model selection https://platform.openai.com/docs/guides/model-selection where they explained how to calculate a realistic accuracy target for an LLM task by evaluating the financial impact of model decisions. They gave an example of a fake news classifier. This is a good way to find the accuracy you need for the task. Break-even accuracy is … Continue reading "Setting a realistic accuracy target for LLM tasks"| Shekhar Gulati
Cursor, the AI-powered code editor that has transformed how developers write code, recently underwent a significant pricing overhaul that has sparked intense debate in the developer community. The …| Shekhar Gulati
In the last blog I discussed how I use OpenAI Code Interpreter to do RAG over data (CSV, Excel, etc.) files. OpenAI Code Interpreter is a managed offering and it does have some limitations. So, I was looking for an open source alternative. I discovered the Pydantic team’s MCP Run Python package. It is an MCP … Continue reading "Using Pydantic MCP Run Python as an Open Source Alternative to OpenAI Code Interpreter"| Shekhar Gulati
While large language models (LLMs) have achieved remarkable capabilities in processing long contexts and locating specific information, a recent paper reveals a surprising blind spot: they struggle…| Shekhar Gulati
When building RAG systems, one common challenge is helping users query their own data. Users often come with a couple of Excel files, Word documents, or CSV files and want to ask questions like …| Shekhar Gulati
One term that I have been hearing a lot lately is reward hacking. I have heard this term multiple times from folks at OpenAI and Anthropic, and it represents a fundamental challenge in AI alignment and reliability. What is Reward Hacking? Reward hacking, also known as specification gaming, occurs when an AI optimizes an objective … Continue reading "Reward Hacking"| Shekhar Gulati
I was listening to a talk by Anthropic folks on Claude Code. In the talk, the speaker was asked why they built Claude Code as a CLI tool instead of an IDE. They gave two reasons: Claude Code is built by Anthr…| Shekhar Gulati
Mistral released a new model yesterday. It is designed to excel at agentic coding tasks, meaning it can use tools. It is released under the Apache 2.0 license. It is fine-tuned from Mistral-Small-3.1, therefore it has …| Shekhar Gulati
Creating an AI assistant that generates helpful answers from a knowledge base is a complex problem. A significant hurdle is the frequent mismatch between how users ask questions and how information …| Shekhar Gulati
Last night, I found myself overwhelmed by open tabs in Chrome. I wondered how many I had open, but couldn’t find a built-in tab counter. While third-party extensions likely existed, I am not …| Shekhar Gulati
In my previous post we built a Prompt Injection Detector by training a LogisticRegression classifier on embeddings of the SPML Chatbot Prompt Injection Dataset. Today, we will look at how we can fine-tun…| Shekhar Gulati
In the last couple of days, I’ve spent some hours playing with Patchwork. Patchwork is an open-source framework that leverages AI to accelerate asynchronous development tasks like code review…| Shekhar Gulati
Today I was watching a talk by Maggie Appleton from the local-first conference. As she points out in her insightful talk on homecooked software and barefoot developers, there exists a significant gap in a…| Shekhar Gulati
I was reading the paper Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost today and thought of applying it to a problem I solved a couple of months back. This paper introduced the Con…| Shekhar Gulati
In the ever-evolving landscape of data management, DuckDB has carved out a niche for itself as a powerful analytical database designed for efficient in-process data analysis. It is particularly wel…| Shekhar Gulati
Today I was reading Chapter 9 “Multimodal Large Language Models” of the Hands-On Large Language Models book and thought of applying it to a problem I face occasionally. The chapter covers …| Shekhar Gulati
I enjoy reading books on the O’Reilly learning platform. For the past month, a new feature on the O’Reilly platform called “Answers” has been staring me down, and I haven’t been tempte…| Shekhar Gulati
In this post, we will discuss how to build a Prompt Injection detector using a simple classification task with Scikit-learn’s Logistic Regression. Logistic Regression is a statistical method …| Shekhar Gulati
I spent some time going over the Postgres schema of GitLab. GitLab is an alternative to GitHub. You can self-host GitLab since it is an open source DevOps platform. My motivation to understand the …| Shekhar Gulati