From: www.philschmid.de
Mini-R1: Reproduce the DeepSeek R1 "aha moment", an RL tutorial
https://www.philschmid.de/mini-deepseek-r1
Reproduce the DeepSeek R1 "aha moment" by training an open model with reinforcement learning, so that it learns self-verification and search abilities on its own in order to solve the Countdown Game.
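To make the task concrete, here is a minimal sketch of how an answer to the Countdown Game might be scored during reinforcement learning. It assumes the usual rules (combine every given number exactly once with +, -, *, / to hit a target), which the summary above does not spell out. The function name `countdown_reward` and the 0.0/1.0 reward scheme are hypothetical, chosen for illustration, and are not taken from the linked tutorial.

```python
import ast
import operator
import re

# Binary operators allowed in a Countdown-style equation (assumed rule set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, int)
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")


def countdown_reward(equation, numbers, target):
    """Hypothetical reward: 1.0 if the equation is valid and hits the target, else 0.0."""
    # Every given number must appear exactly once in the equation.
    used = [int(tok) for tok in re.findall(r"\d+", equation)]
    if sorted(used) != sorted(numbers):
        return 0.0
    try:
        value = _evaluate(ast.parse(equation, mode="eval"))
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0
```

For example, `countdown_reward("(25 - 5) * 4", [4, 5, 25], 80)` scores a correct answer, while an equation that skips a number or misses the target scores zero. A binary, automatically checkable signal like this is what makes the game a convenient testbed for reinforcement learning.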