GPT-5 and GPT-5 Thinking are large language models recently released by OpenAI, after a long series of announcements and hype. Results on benchmarks are impressive. How good are these reasoning models at chess? Using a simple four-move sequence, I succeeded in forcing GPT-5 and GPT-5 Thinking into an illegal move. Basically the same as GPT-3.5, GPT-4, DeepSeek-R1, o4-mini, and o3 (see all my posts). There are other concerning insights… Though it is a very specific example, it is not a good sign.| Mathieu Acher
o3 and o4-mini are large language models recently released by OpenAI and augmented with chain-of-thought reinforcement learning, designed to “think before they speak” by generating explicit, multi-step reasoning before producing an answer. How good are these reasoning models at chess? Using a simple four-move sequence, I succeeded in forcing o3 into an illegal move, and across multiple matches both o3 and o4-mini struggle dramatically, generating illegal moves in over 90% of cases and even...| blog.mathieuacher.com
AI models are often overconfident. A new MIT training method teaches them self-doubt, improving reliability and making them more trustworthy. The post A new way to train AI models to know when they don’t know first appeared on TechTalks.| TechTalks
Surgical robots perform millions of delicate operations annually under human control. Now they’re getting ready to operate on their own.| An AI System Controlled DaVinci Surgical Robots
One of the goals of AI research is to teach machines how to do the same things people do, but better. In the early 2000s, this meant focusing on problems like flying helicopters [https://www.youtube.com/watch?v=M-QUkgk3HyE] and walking up flights of stairs [https://www.youtube.com/| The Gradient
I come to the conclusion that DeepSeek-R1 is worse than a five-year-old version of GPT-2 at chess… The very recent, state-of-the-art, open-weights model DeepSeek R1 is making the 2025 news, excellent on many benchmarks, with a new integrated, end-to-end, reinforcement learning approach to large language model (LLM) training. I am personally very excited about this model, and I’ve been working with it over the last few days, confirming that DeepSeek R1 is on par with GPT-o for several tasks. Yet, ...| blog.mathieuacher.com
Is Attention all you need? Mamba, a novel AI model based on State Space Models (SSMs), emerges as a formidable alternative to the widely used Transformer models, addressing their inefficiency in processing long sequences.| The Gradient
The "Era of Experience" envisions AI's evolution beyond human data, emphasizing self-learning from real-world interactions. But challenges loom for this vision. The post Are we at the cusp of a new era for artificial intelligence? first appeared on TechTalks.| TechTalks
This is a 3-part series on Deep Q-Learning, written such that undergrads with high-school maths should be able to understand it and hit the ground running on their deep learning projects. This…| Bruceoutdoors Blog of Blots
As the 2010s draw to a close, it’s worth taking a look back at the monumental progress that has been made in Deep Learning in this decade.[1] Driven by the development of ever-more powerful comput| Leo Gao
I describe my process of programming the board game Carcassonne in Python, and my preparations for using this game as a reinforcement learning environment.| Wingedsheep: Artificial Intelligence Blog