A 12-week online course covering a range of policy levers for steering AI development. By taking this course, you’ll learn about the risks arising from future AI systems and proposed governance interventions to address them. You’ll consider interactions between AI and biosecurity, cybersecurity and defence capabilities, and the disempowerment of human decision-makers. We’ll also provide an overview of open technical questions such as the control and alignment problems – which posit that […]
We’re experimenting with publishing more of our internal thoughts publicly. This piece may be less polished than our normal blog articles. Running AI Safety Fundamentals’ AI alignment and AI governance courses, we often have difficulty finding resources that hit our learning objectives well. When we can find resources, they’re often not focused on what we want, or are hard for […]
AI could bring significant rewards to its creators. However, the average person seems to have wildly inaccurate intuitions about the scale of these rewards. By exploring some conservative estimates of the potential rewards AI companies could expect to see from the automation of human labour, this article tries to convey a grounded sense of ‘woah, this could […]
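To make the shape of such an estimate concrete, here is a minimal sketch of the kind of back-of-envelope calculation the article gestures at. Every figure in it is an illustrative placeholder assumption, not a number taken from the article:

```python
# Illustrative back-of-envelope estimate of revenue from labour automation.
# All figures below are placeholder assumptions, not numbers from the article.

GLOBAL_ANNUAL_WAGES_USD = 50e12   # assumed: rough global annual wage bill (~$50T)
AUTOMATABLE_FRACTION = 0.3        # assumed: share of labour AI could automate
VALUE_CAPTURE_RATE = 0.1          # assumed: share of that value AI firms capture

annual_revenue = GLOBAL_ANNUAL_WAGES_USD * AUTOMATABLE_FRACTION * VALUE_CAPTURE_RATE
print(f"Implied annual revenue to AI companies: ${annual_revenue / 1e12:.1f} trillion")
# -> Implied annual revenue to AI companies: $1.5 trillion
```

Even with deliberately modest inputs, the product of these three factors lands in the trillions per year, which is the intuition pump the article is aiming for.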
As we open up applications for our Mid 2024 AI Safety Fundamentals AI governance course, I wanted to help prospective applicants understand how to maximise their chances of success. On reviewing the data from our previous application round,[1] I was surprised to find that just 4 mistakes accounted for 92% of rejected AI governance applicants: […]
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique for steering large language models (LLMs) toward desired behaviours. However, relying on simple human feedback doesn’t work for tasks that are too complex for humans to accurately judge at the scale needed to train AI models. Scalable oversight techniques attempt to address this […]
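As a rough illustration of the human-feedback step RLHF relies on, here is a minimal sketch of training a reward model so that human-preferred responses score higher than rejected ones (the standard Bradley–Terry preference loss). The `RewardModel` class and its fixed-size embedding inputs are illustrative stand-ins, not any particular library’s API; in practice the scoring head sits on top of a pretrained LLM:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: scores a response embedding with a single scalar."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        # Stand-in for a pretrained LLM backbone plus a scalar reward head.
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy batch: embeddings of the response a human preferred vs. rejected.
chosen = torch.randn(8, 768)
rejected = torch.randn(8, 768)

# Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected),
# which pushes the chosen response's score above the rejected one's.
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
```

The scaling problem the excerpt points at is that this loop needs humans to reliably pick the better response, which breaks down once tasks exceed what humans can accurately judge; scalable oversight techniques try to relax that requirement.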
This article explains key concepts that come up in the context of AI alignment. The terms are only attempts at gesturing at the underlying ideas, and it is the ideas that matter. There is no strict consensus on which name should correspond to which idea, and different people use the terms differently.[1] This article explains […]
AI systems already pose significant risks, including harmful malfunctions, discrimination, reduced social connection, invasions of privacy and disinformation. Training and deploying AI systems can also involve copyright infringement and worker exploitation. Future AI systems could exacerbate anticipated catastrophic risks, including bioterrorism, misuse of concentrated power, and nuclear and conventional war. We might also gradually […]