Login
From:
bdtechtalks.substack.com
(Uncensored)
subscribe
DeepSeek's new reward model takes RL to open-domain tasks
https://bdtechtalks.substack.com/p/deepseeks-new-reward-model-takes
links
backlinks
Generative reward modeling uses principles and critiques to help LLMs to learn reasoning about tasks without explicit ground-truth signals
Roast topics
Find topics
Find it!