Making alignment via RLHF more scalable by automating human feedback... | cameronrwolfe.substack.com
Understanding how SFT works from the idea to a working implementation... | cameronrwolfe.substack.com