In a recent paper in Annals of Mathematics and Philosophy, Fields medalist Timothy Gowers asks why mathematicians sometimes believe that unproved statements are likely to be true. For example, it is unknown whether \(\pi\) is a normal number (which, roughly speaking, means that every digit appears in \(\pi\) with equal| Alignment Research Center
Over the last few months, ARC has released a number of pieces of research. While some of these can be independently motivated, there is also a more unified research vision behind them. The purpose of this post is to try to convey some of that vision and how our individual| Alignment Research Center
Last week, ARC released a paper called Towards a Law of Iterated Expectations for Heuristic Estimators, which follows up on previous work on formalizing the presumption of independence. Most of the work described here was done in 2023. A brief table of contents for this post: What is a heuristic| Alignment Research Center
ARC's current research focus can be thought of as trying to combine mechanistic interpretability and formal verification. If we had a deep understanding of what was going on inside a neural network, we would hope to be able to use that understanding to verify that the network was not going| Alignment Research Center