I’ve recently spent some more time thinking about speculative issues in AI safety:

- Ideas for building useful agents without goals: approval-directed agents, approval-directed bootstrapping, and optimization and goals. I think this line of reasoning is very promising.
- A formalization of one piece of the AI safety challenge: the steering problem. I am eager to see more precise, high-level discussion […]