Reproduce Deepseek R1 „aha moment“ and train an open model using reinforcement learning trying to teach it self-verification and search abilities all on its own to solve the Countdown Game.| www.philschmid.de