Topic: [2507.16806] Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty