In this article, we summarize the AI puzzle competition from my blog and answer two questions: which model is better and which prompt engineering hint gives better results. The answers might surprise you, so give this a read :)| mihai.page
Before concluding the AI 2025 puzzle competition I asked LLMs a simple common sense question to see how they behave. They didn't perform that well.| mihai.page
It's finally here. I analyze QwQ and DeepSeek on the 3 math puzzles problem and finish the round of benchmarks I ran in January.| mihai.page
In this article we look at 4 Llama models (via Perplexity) and see how they perform on the 3 puzzles in the competition.| mihai.page
Let’s continue the AI puzzle competition. We have 3 problems and several prompt engineering strategies. We saw how the OpenAI, Google, and Claude models performed, and now it is time to look at the Mistral models.| Mihai's page
In this article I read 2.5 million characters output by Claude models to score them on the 3 problems I proposed in the previous articles.| mihai.page
In this article I read nearly 10M characters output by Google models to score them on the 3 problems I proposed in the previous articles.| mihai.page
In this article I read nearly 7.5 million characters output by OpenAI models to score them on the 3 problems I proposed in the previous articles.| mihai.page
Two days ago I introduced the AI puzzle competition, and yesterday we talked about the problems we will use to gauge the performance of the competing LLMs. Before presenting the results, I want to talk about the prompt engineering part, since I started all these experiments to see whether it really helps or whether it’s more a matter of confirmation bias.| Mihai's page
In the previous post I introduced the scaffolding for running a test on various LLMs, where I give them several puzzles and prompt engineering hints to see what helps them reach a solution, if they reach one at all. In this post, I’m going to present the problems and the scoring guidelines for each problem.| Mihai's page
At the end of last year I had a significant amount of OpenAI API credits which were set to expire by the end of the month. I bought them when I experimented with creating fuzzing harnesses via LLMs as part of the OSS-Fuzz project, when I helped the project use OpenAI models and test their performance. Rather than let them expire, I decided to create new puzzles and test the LLMs on them, just like the last time. The main difference is that this time I wanted to use the API instead of the chat i...| Mihai's page
In the age of social media and social media replacements, why do we need blogs? Well, social media is “realtime”, but it is also “write once, forget for eternity”: besides not owning the content posted there, one also cannot retrieve it when needed (whether one is the author or just someone who remembered that useful post and is now searching for it). And I intend to write more content that I want to be retrievable later than content that is valid only for the moment. End of article? Not so ...| Mihai's page
This is not my first blog. Not even the second. In fact, I have had a blog (more or less) for nearly half of my life.| Mihai's page
Given GDPR, CCPA, and other similar legislation about online privacy, perhaps it is better to also discuss how this blog handles these concerns. The short answer is that there is nothing to worry about. No private data about visitors is being stored here.| Mihai's page
What is the relationship between power profiles and performance on modern laptops?| mihai.page
In which I continue reading the VDGF book for a few more chapters.| mihai.page
In which I finally solve the math problem from a few posts ago.| mihai.page
I am asking Bard and ChatGPT for help in enhancing the capabilities of this blog.| mihai.page
In which I work through the first few chapters of the book by Tristan Needham| mihai.page
In which I talk about exception handling in several programming languages and why I consider monadic approaches better.| mihai.page
After the last Advent-of-Code finished, I switched to using NixOS on my personal computer, after playing with it for a while. Now that Nix is 20 years old, it is time to discuss why NixOS is perfect for me.| mihai.page
Continuing from the previous article, let's look at how we can determine what binary operations are needed to write a short bit-twiddling expression for `isDigit`.| mihai.page
Recently I got nerd-sniped into finding ways of writing an `isDigit` function using bit operations.| mihai.page
This week, I released `hindent-6.0.0`, which also comes with a provenance attestation certifying that the release build was done using a secure supply chain. Since this is new for Haskell, I'm explaining what this entails and how it can be used.| mihai.page
A short puzzle is asked of ChatGPT. After long sessions of explaining what it gets wrong and simplifying the questions, in the end the AI manages to provide an answer to our puzzle, but not before saying I am half a million years old.| mihai.page
A short discussion about numbers in Babylon and floating point| mihai.page
Which graph database is faster? Which one is easier to use? What can GUAC use?| mihai.page