Proposals for safe AGI systems are typically made at the level of frameworks, specifying how the components of the proposed system should be trained and interact with each other. In this paper, we model and compare the most promising AGI safety frameworks using causal influence diagrams. The diagrams show the optimization objective and causal assumptions of the framework. The unified representation permits easy comparison of frameworks and their assumptions. We hope that the diagrams will ser...| arXiv.org
How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment? We show that current approaches to penalizing side effects can introduce bad incentives, e.g. to prevent any irreversible changes in the environment, including the actions of other agents. To isolate the source of such undesirable incentives, we break down side effects penalties into two components: a baseline state and a measure of deviation from this baseline state. We argue that so...| arXiv.org
2018 progress Research / AI safety: Wrote a paper on measuring side effects using relative reachability in May, and presented the results at the ICML GoalsRL workshop and the AI safety summer schoo…| Victoria Krakovna
2017 progress Research / career: Coauthored RL with reward corruption paper and presented the results at the U Toronto CS department, Workshop on Reliable AI, and Women in ML workshop. Coauthored AI …| Victoria Krakovna
2016 progress Research / career: Got a job at DeepMind as a research scientist in AI safety. Presented MiniSPN paper at ICLR workshop. Finished RNN interpretability paper and presented at ICML and …| Victoria Krakovna
2015 progress Research: Finished paper on the Selective Bayesian Forest Classifier algorithm. Made an R package for SBFC (beta). Worked at Google on unsupervised learning for the Knowledge Graph with…| Victoria Krakovna
2014 progress If someone told me at the beginning of 2014 that I would co-found an organization to mitigate technological risks to humanity, I might not have believed them. Thanks Max, Meia, Anthon…| Victoria Krakovna