A rough plan for AI alignment assuming short timelines | adamjones.me
The AI regulator’s toolbox: A list of concrete AI governance practices | adamjones.me
AI could bring significant rewards to its creators. However, the average person seems to have wildly inaccurate intuitions about the scale of these rewards. By exploring some conservative estimates of the potential rewards AI companies could expect to see from the automation of human labour, this article tries to convey a grounded sense of ‘woah, this could […] | BlueDot Impact
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique for steering large language models (LLMs) toward desired behaviours. However, relying on simple human feedback doesn’t work for tasks that are too complex for humans to accurately judge at the scale needed to train AI models. Scalable oversight techniques attempt to address this […] | BlueDot Impact
AI systems already pose many significant risks, including harmful malfunctions, discrimination, reduced social connection, invasions of privacy, and disinformation. Training and deploying AI systems can also involve copyright infringement and worker exploitation. Future AI systems could exacerbate anticipated catastrophic risks, including bioterrorism, misuse of concentrated power, and nuclear and conventional war. We might also gradually […] | BlueDot Impact