After five years of writing on this blog, I have decided to move future content to a Substack. If you enjoy my writing, please subscribe there.| jacobbuckman.com
I recently came across this post, which challenges readers to propose a definition for rationality. The post is several years old, so I’m clearly late to the game, and I’m sure that almost everything has been said already, but I figured I’d take a crack at it anyway. In my...
Let me tell you a story. When I was a young man, I lived for a time as an itinerant gambler, wandering the Russian countryside to engage my fellows in games of chance. Although I was known to gamble on cards, dice, and animals, my favorite game was played with...
This post is a follow-up to yesterday’s essay, and relates to an ongoing discussion between Scott Alexander and Gary Marcus on the topic of AI scaling (post1, post2, post3, post4). Specifically, the debate is whether scaled-up language models in the style of GPT-3 will eventually become general intelligences, or whether...
by Jacob Buckman and Carles Gelada
This post is part of a series on bad abstractions in machine learning. For context on why we are writing these, read Abstraction Enables Thought. Bad Abstraction: There are two types of machine learning models. Discriminative models are trained to separate inputs into classes,...
by Jacob Buckman and Carles Gelada
In “Funes the Memorious” by Jorge Luis Borges, a fall from a horse leaves Irineo Funes paralyzed, but grants him the ability to recall everything he has ever experienced in perfect detail. But though it initially seems to be a superpower, it is revealed...
This week, I was thrilled to read about the first well-documented case of explicit academic fraud in the artificial intelligence community. I hope that this is the beginning of a trend, and that other researchers will be inspired by their example and follow up by engaging in even more blatant...
When I was fourteen, my interest in video games began to intersect with my interest in programming. I would go through the files of computer games I played in an idle attempt to gain some insight into how they worked. I mostly looked for plaintext config files in order to...
The replay memory, or replay buffer, has been a staple of deep reinforcement learning algorithms since DQN, where it was first introduced. In brief, a replay memory is a data structure which temporarily saves the agent’s observations, allowing our learning procedure to update on them multiple times. Although it is...
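The replay memory described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration of the data structure, not the DQN implementation; the transition format and capacity here are placeholder assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        # A transition might be a (state, action, reward, next_state) tuple.
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling lets the learner update on each saved
        # observation multiple times before it is evicted.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because the deque has a fixed `maxlen`, appending beyond capacity silently drops the oldest transitions, which is the usual sliding-window behavior.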
=== Introduction
Hello friends. Today we’re going to be learning about the fundamentals of offline reinforcement learning, which is the problem of choosing how to act based on a fixed amount of data about the environment. This post assumes familiarity with the ideas around standard RL. The goal is to...
Many people are becoming interested in “Offline RL” these days. I’m quite interested in it, too; for the past two years, I’ve been thinking hard about this setting, together with my collaborators Carles and Marc. In that time, the community has proposed many cool algorithms and run many insightful experiments....
by Carles Gelada and Jacob Buckman
Proponents of Bayesian neural networks often claim that trained BNNs output distributions which capture epistemic uncertainty. Epistemic uncertainty is incredibly valuable for a wide variety of applications, and we agree with the Bayesian approach in general. However, we argue that BNNs require highly informative...
by Carles Gelada and Jacob Buckman
WARNING: This is an old version of this blogpost, and if you are a Bayesian, it might make you angry. Click here for an updated post with the same content. Context: About a month ago Carles asserted on Twitter that Bayesian Neural Networks make...
by Carles Gelada and Jacob Buckman
Many researchers believe that model-based reinforcement learning (MBRL) is more sample-efficient than model-free reinforcement learning (MFRL). However, at a fundamental level, this claim is false. A more nuanced analysis shows that it can be the case that MBRL approaches are more sample-efficient than MFRL...
The dream of reinforcement learning is that it can one day be used to derive automated solutions to real-world tasks, with little-to-no human effort. Unfortunately, in its current state, RL fails to deliver. There have been basically no real-world problems solved by DRL; even on toy problems, the solutions found...
In light of the recent discussions on the *ACL reviewing process on Twitter, I want to share some thoughts. Do We Need Peer Review? Specifically, do we need double-blind peer review of the sort that conferences provide? I’m in full agreement with Ryan that it is an essential service for...
This post is the second of a series; click here for the previous post. Naming and Scoping Naming Variables and Tensors As we discussed in Part 1, every time you call tf.get_variable(), you need to assign the variable a new, unique name. Actually, it goes deeper than that: every tensor...
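The unique-name requirement this excerpt describes can be illustrated with a small pure-Python sketch. This is not TensorFlow’s actual implementation; it is a toy registry, mimicking the behavior of a tf.get_variable-style lookup: requesting a name that already exists is an error unless you explicitly opt into reuse.

```python
class VariableStore:
    """Toy stand-in for a tf.get_variable-style name registry (illustrative only)."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, shape, reuse=False):
        if name in self._vars:
            if reuse:
                # Explicit reuse hands back the existing variable.
                return self._vars[name]
            raise ValueError(f"Variable '{name}' already exists")
        self._vars[name] = {"name": name, "shape": shape}
        return self._vars[name]
```

The point of the sketch is the contract, not the internals: names are a global namespace, so two accidental requests for the same name fail loudly instead of silently creating a second variable.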
On August 5th, OpenAI successfully defeated top human players in a Dota 2 best-of-three series. Their AI Dota agent, called OpenAI Five, was a deep neural network trained using reinforcement learning. As a researcher studying deep reinforcement learning, as well as a long-time follower of competitive Dota 2, I found...
The past few days have seen a back-and-forth between Scott Alexander and Gary Marcus on the topic of AI scaling (post1, post2, post3, post4). Specifically, the debate is whether scaled-up language models in the style of GPT-3 will eventually become general intelligences, or whether we will hit some fundamental limits....
…and it is called active learning, and it’s not very impressive. The connection is pretty simple to see. Let’s start by outlining what a “recursively self-improving AI” would look like. To start, there’s some code. It gets compiled into an executable function. This is then evaluated, giving some score or...