From: www.philschmid.de
Mini-R1: Reproduce the DeepSeek R1 "aha moment", an RL tutorial
https://www.philschmid.de/mini-deepseek-r1
Reproduce the DeepSeek R1 "aha moment" by training an open model with reinforcement learning, so that it learns self-verification and search abilities on its own in order to solve the Countdown Game.