Two days ago I introduced the AI puzzle competition, and yesterday we talked about the problems we will use to gauge the performance of the competing LLMs. Before presenting the results, I want to talk about the prompt engineering part, since I started all these experiments to see whether it really helps or is more a matter of confirmation bias. | Mihai's page
In the previous post I introduced the scaffolding for running a test on various LLMs, where I give them several puzzles and prompt engineering hints to see what helps them reach a solution, if they reach one at all. In this post, I’m going to present the problems and the scoring guideline for each problem. | Mihai's page
At the end of last year I had a significant amount of OpenAI API credits that were set to expire by the end of the month. I had bought them when I experimented with creating fuzzing harnesses via LLMs as part of the OSS-Fuzz project, where I helped the project use OpenAI models and test their performance. Rather than let them expire, I decided to create new puzzles and test the LLMs on them, just like last time. The main difference is that this time I wanted to use the API instead of the chat i...| Mihai's page
In which I finally solve the math problem from a few posts ago.| mihai.page