Topic: [2507.12856] Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved)