Glossary

Basic objects

- Variable: a symbol to be evaluated
- Combination: a pair to be evaluated. Basically (a . b)?
- Operator: the unevaluated car of a combination. So pretty much the function of an s-expr?
- Operand tree: the unevaluated cdr of a combination
- Operands: the operand tree is usually a list, e.g. (+ 1 2 3 4), in which case any element of that list is an operand
- Arguments: the results of evaluating operands - this is the usual case, where the evaluated values are used, rather than the operan…
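A minimal sketch of how these terms relate, written in Python rather than a Lisp (the names and the toy evaluator here are purely illustrative, not from the original glossary):

```python
# Hypothetical sketch: a combination is a (car, cdr) pair; the car is the
# operator, the cdr is the operand tree, and evaluating the operands yields
# the arguments that finally get applied.

def evaluate(expr, env):
    if isinstance(expr, str):          # variable: a symbol to be evaluated
        return env[expr]
    if isinstance(expr, tuple):        # combination: a pair (a . b)
        operator, operand_tree = expr  # car = operator, cdr = operand tree
        operands = list(operand_tree)  # each element of the list is an operand
        arguments = [evaluate(o, env) for o in operands]  # evaluated operands
        return env[operator](*arguments)
    return expr                        # self-evaluating literal

env = {"+": lambda *xs: sum(xs), "x": 3}
print(evaluate(("+", (1, 2, "x")), env))  # -> 6
```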
Agent foundations

Embedded agents, part 1 (Demski and Garrabrant, 2018)
Week 6 of the AI alignment curriculum. Interpretability is the study of ways to, well, interpret AI models, currently mainly NNs.

Mechanistic interpretability

This aims to understand networks on the level of individual neurons.

Zoom In: an introduction to circuits (Olah et al., 2020)

Claims:
- Features are the fundamental unit of neural networks. They correspond to directions. These features can be rigorously studied and understood.
- Features are connected by weights, forming circuits.
Week 4 of the AI alignment curriculum. Scalable oversight refers to methods that enable humans to oversee AI systems that are solving tasks too complicated for a single human to evaluate. Basically divide and conquer.

AI alignment landscape (Christiano, 2020)

- Intent alignment -> getting AIs to want to do what you want them to do
- Paul isn't focused on reliability (how often it makes mistakes); he hopes it'll get better along with capabilities
- Well meaning …
Week 3 of the AI alignment curriculum. Goal misgeneralization refers to scenarios in which agents in new situations generalize to behaving in competent yet undesirable ways, because they learned the wrong goals from previous training.

Goal Misgeneralisation: Why Correct Specifications Aren't Enough For Correct Goals (Shah, 2022)

Blog post. A correct specification is needed for the learner to have the right context (so it doesn't exploit bugs), but doesn't automatically result in correct goals. If …
Week 2 of the AI alignment curriculum. Reward misspecification occurs when RL agents are rewarded for misbehaving.

Specification gaming: the flip side of AI ingenuity (Krakovna et al., 2020)

- Specification gaming is a behaviour that satisfies the literal specification of an objective without achieving the intended outcome.
- Evil genies
- Amounts to both the old and new understanding of hacking
- RLHF can help, but only if the correct reward function is learned
- The map is not the territory; agents l…
Questions

- How long till models start autofeedbacking, like AlphaZero did? This suggests that it's already happening.
- The general timeline seems to be AGI around 2050 - how accurate is that?

Core readings

Four Background Claims (Soares, 2015)

- General intelligence is a thing, and humans have it. The alternative view is that intelligence is just a collection of useful modules (speech, dexterity, etc.) that can be used in different contexts to solve stuff.
How to order stuff

Vanilla NNs with fully connected layers work by each neuron receiving all the outputs of the previous layer (or the inputs, in the case of the first layer) and then doing calculations over them together with the weights and biases of that neuron. This works well in general, but is especially good when each input value has a specific meaning, e.g. […].
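A minimal sketch of one such fully connected layer, assuming numpy (the function and variable names here are illustrative, not from the post): each neuron gets every output of the previous layer and combines it with its own row of weights and its bias.

```python
import numpy as np

def dense_layer(prev_activations, weights, biases):
    """weights has shape (n_neurons, n_inputs); each row belongs to one neuron."""
    z = weights @ prev_activations + biases   # weighted sum per neuron
    return 1.0 / (1.0 + np.exp(-z))           # sigmoid activation

rng = np.random.default_rng(0)
x = rng.normal(size=4)       # 4 inputs, each with its own specific meaning
w = rng.normal(size=(3, 4))  # 3 neurons, each seeing all 4 inputs
b = np.zeros(3)
print(dense_layer(x, w, b))  # 3 activations, one per neuron
```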
Convolution is pretty much just sliding 2 lists along each other and summing their element-wise multiplication. The general equation for the result at $t$ is:

$$(f*g)(t) = \int^{\infty}_{-\infty}f(\tau)g(t-\tau)\,d\tau$$

Properties

Convolution has a couple of interesting properties:

- It is commutative: $f * g = g * f$ - I make use of this in the convolve2() function, as it makes things easier for me to understand
- $\int (f * g) = \int g \cdot \int f$
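For the discrete case, the "slide and sum" description can be written down in a few lines. This is a separate illustrative sketch, not the convolve2() mentioned above:

```python
# Discrete 1D convolution: slide one list along the other and sum the
# element-wise products; the output has len(f) + len(g) - 1 entries.
def convolve(f, g):
    n = len(f) + len(g) - 1
    out = [0.0] * n
    for t in range(n):
        out[t] = sum(f[tau] * g[t - tau]
                     for tau in range(len(f))
                     if 0 <= t - tau < len(g))
    return out

print(convolve([1, 2, 3], [0, 1, 0.5]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
# Commutativity: swapping the arguments gives the same result.
print(convolve([0, 1, 0.5], [1, 2, 3]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```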
Sigmoid ($\sigma(z) = \frac {1}{1 + e^{-z}}$)
- Can saturate when $|z|$ is large
- Has a nice derivative: $\sigma'(z) = \sigma(z)(1 - \sigma(z))$
- Transforms $(-\infty, \infty) \to (0, 1)$

Tanh ($\tanh(z) = \frac {e^z - e^{-z}}{e^z + e^{-z}}$)
- Is a rescaled version of the sigmoid: $\sigma(z)=\frac {1 + \tanh(\frac z 2)}{2}$
- Transforms $(-\infty, \infty) \to (-1, 1)$, so is zero centered
- May require normalization of outputs (or even inputs) to a prob distribution
- $\tanh'(z) = 1 - \tanh^2(z)$
- Is …
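A quick numeric check of the two identities above (an illustrative sketch assuming numpy): the tanh rescaling recovers the sigmoid, and the sigmoid's derivative matches $\sigma(z)(1 - \sigma(z))$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)

# sigma(z) = (1 + tanh(z/2)) / 2
print(np.allclose(sigmoid(z), (1 + np.tanh(z / 2)) / 2))         # True

# sigma'(z) = sigma(z)(1 - sigma(z)), checked against a central difference
eps = 1e-6
numeric_grad = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(numeric_grad, sigmoid(z) * (1 - sigmoid(z))))  # True
```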
Hints when starting on a new problem

- Start by getting better-than-chance results, as a baseline for improvements
- Strip the problem space down to a simpler version, e.g. just learn to classify 0 and 1, rather than all the digits of MNIST
- Focus on getting decent values for hyperparameters one by one (e.g. $\lambda$ or $\eta$), rather than randomly jumping around hyperparameter space
- Start with getting decent learning rates etc. before scaling up the number of neurons
- Initially jump about by largish…
Notes from chapter 3 of Neural Networks and Deep Learning

Measurements of fit

From Neural Networks and Deep Learning, 4 measures of the same data to check how good the fit is: the cost on the training data, the accuracy on the training data, the cost on the test data, and the accuracy on the test data. Here accuracy on the test data plateaus around epoch 280, while the training data cost keeps going down smoothly. On the other hand, the cost for the test data starts going up around epoch 15, which is more or less the same point that the accuracy on the training data stops drastically improving.
Quadratic cost function ($C = \frac {(y-a)^2}{2}$)

- This is nice and simple, with the additional bonus that the derivative is just $\frac{\partial C}{\partial a} = a - y$
- The loss grows quadratically, so a large error is treated a lot more harshly than a small error - this seems a good idea
- It's very xenophobic, in that it will go out of its way to punish outliers
- The square will cause the loss to always be non-negative.
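Spelling out that derivative, using the same $C = \frac{(y-a)^2}{2}$ as above:

$$\frac{\partial C}{\partial a} = \frac{\partial}{\partial a}\,\frac{(y - a)^2}{2} = -(y - a) = a - y$$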
The basic equations

Given the following notation:

- $w_{jk}^l$ is the weight from the $k^{th}$ neuron in the $(l-1)^{th}$ layer to the $j^{th}$ neuron of the $l^{th}$ layer
- $a_j^l$ is the activation of the $j^{th}$ neuron in the $l^{th}$ layer
- $b_j^l$ is the bias of the $j^{th}$ neuron in the $l^{th}$ layer
- $z^l$ is $w^l a^{l-1} + b^l$ for the $l^{th}$ layer
- $z_j^l$ is $\sum_k w_{jk}^l a_k^{l-1} + b_j^l$ for the $j^{th}$ neuron in the $l^{th}$ layer
- $\sigma$ is the activation function used
- $C$ is the cost function
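The notation maps directly onto code. A small sketch of one layer's forward step, assuming numpy (names are illustrative, not from the book): $z^l = w^l a^{l-1} + b^l$ and $a^l = \sigma(z^l)$.

```python
import numpy as np

def sigma(z):
    # sigmoid activation, as used above
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
a_prev = rng.normal(size=5)    # a^{l-1}: activations of layer l-1
w = rng.normal(size=(3, 5))    # w[j, k]: weight from neuron k (layer l-1) to neuron j (layer l)
b = rng.normal(size=3)         # b[j]: bias of neuron j in layer l

z = w @ a_prev + b             # z_j^l = sum_k w_{jk}^l a_k^{l-1} + b_j^l
a = sigma(z)                   # a_j^l = sigma(z_j^l)
print(z, a)
```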
Stuff I think about which I want to have written down for posterity to appreciate the heights of my intellect etc. I started this after getting my own domain, mainly to have ahiru.pl actually resolve to something. There isn't a specific topic that I focus on - I write about whatever I've been thinking about lately, but also use it as a place to collect good blog posts, notes, anki collections and other such stuff.