I come to the conclusion that DeepSeek-R1 is worse than a 5 years-old version of GPT-2 in chess… The very recent, state-of-art, open-weights model DeepSeek R1 is breaking the 2025 news, excellent in many benchmarks, with a new integrated, end-to-end, reinforcement learning approach to large language model (LLM) training. I am personally very excited about this model, and I’ve been working on it in the last few days, confirming that DeepSeek R1 is on-par with GPT-o for several tasks. Yet, ...| blog.mathieuacher.com
I find a way to systematically force a win in 4 moves against the last release of OpenAI (ChatGPT-4o) and in 7 moves against the best GPT at chess gpt-3.5-turbo-instruct, estimated at 1750 Elo. Clickbait: you don’t need to master chess to beat GPTs, you can well be 700 Elo, just follow the moves ;-) The story behind the discovery is interesting and worth sharing. I also discuss the generalization of the results and the implications for the future of GPTs and chess. Don’t fear the clickbai...| blog.mathieuacher.com
Can GPTs like ChatGPT-4 play legal moves and finish chess games? What is the actual Elo rating of GPTs? There have been some hypes, (subjective) assessment, and buzz lately from “GPT is capable of beating 99% of players?” to “GPT plays lots of illegal moves” to “here is a magic prompt with Magnus Carlsen in the headers”. There are more or less solid anecdotes here and there, with counter-examples showing impressive failures or magnified stories on how GPTs can play chess well. I...| blog.mathieuacher.com