Published on September 29, 2025 6:31 AM GMT Expanded and generalized version of this shortform. Motivation: Typing Fast. Part of my writing process involves getting words out of my head and into a text editor as quickly as I can manage. Sometimes this involves Loom, but most of the time it is just good old-fashioned babbling and brainstorming. When I'm doing this, I'm trying to use the text editor as extra working memory/RAM/reasoning tokens/etc. so that I can use my actual brain's working memory...| LessWrong
Published on September 29, 2025 4:09 AM GMT A few months ago, I asked Google and Apple for a data takeout. You should too! It’s easy. Google sent me a download link within half an hour; Apple took its time, three days to be exact. I requested roughly a third of 60+ available categories from Google. Past reservations (Google); IP history (Apple); Apple notes with metadata. I spent hours wandering down the memory lanes and found myself repeatedly marveling, “I can’t believe they keep track of t...| LessWrong
Published on September 29, 2025 4:05 AM GMT When I talk to friends, colleagues, and internet strangers about the risk of ASI takeover, I find that many people have misconceptions about where the dangers come from or how they might be mitigated. A lot of these misconceptions are rooted in misunderstanding how today’s AI systems work and are developed. This article is an attempt to explain the risk of ASI misalignment in a way that makes the dangers and difficulties clear, rooted in examples ...| LessWrong
Published on September 29, 2025 4:01 AM GMT Pardon the snazzy picture. Part 1: This is an exploration of game theory mechanics as an alternative alignment approach, looking at current AI alignment methods and informed by the work of legal scholars Goldstein and Salib, Turing Award-winner Yoshua Bengio, and the latest research. --- The September 17, 2025 Report: The latest AI behavior "report card" is here from OpenAI and Apollo Research. Did four modern LLMs get a gold star, or are they going t...| LessWrong
Published on September 28, 2025 9:36 PM GMT Book review: If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All, by Eliezer Yudkowsky and Nate Soares. [This review is written (more than my usual posts) with a Goodreads audience in mind. I will write a more LessWrong-oriented post with a more detailed description of the ways in which the book looks overconfident.] If you're not at least mildly worried about AI, Part 1 of this book is essential reading. Please read If Anyone B...| LessWrong
Published on September 28, 2025 5:34 PM GMT I was hoping to write a full review of "If Anyone Builds It, Everyone Dies" (IABIED, by Yudkowsky and Soares) but realized I won't have time to do it. So here are my quick impressions/responses to IABIED. I am writing this rather quickly and it's not meant to cover all arguments in the book, nor to discuss all my views on AI alignment; see six thoughts on AI safety and Machines of Faithful Obedience for some of the latter. First, I like that the book ...| LessWrong
Published on September 28, 2025 4:54 PM GMT BIRDS! Let’s say you’re a zoo architect tasked with designing an enclosure for ostriches, and let’s also say that you have no idea what an ostrich is (roll with me here). The potentially six-figure question staring you down is whether to install a ceiling. The dumb solution is to ask “Are ostriches birds?” then, surmising that birds typically fly, construct an elaborate aviary complete with netting and elevated perches. The non-dumb solutio...| LessWrong
Published on September 28, 2025 3:34 PM GMT An extended version of this article was given as my keynote speech at the 2025 LessWrong Community Weekend in Berlin. A couple of years ago, I agreed to give a talk on the topic of psychology. I said yes, which of course meant that I now had a problem. Namely, I had promised to give a talk, but I did not have one prepared. (You could also more optimistically call this an opportunity, a challenge, a quest, etc.) So I decided to sit down at my compute...| LessWrong
Published on September 28, 2025 1:48 PM GMT Summary This post describes how we organized the Finnish Alignment Engineering Bootcamp, a 6-week technical AI safety bootcamp for 12 people. The bootcamp was created jointly with the Finnish Center for Safe AI (Tutke) and Effective Altruism (EA) Finland. It was composed of five weeks of remote learning based on the ARENA curriculum and a one-week on-site research sprint. We provide extensive details of our work and lessons learned along the way. ...| LessWrong
When I see the hunger strikes in front of the offices of OpenAI and Anthropic, or the fellowships and think tanks sprouting around the world, all aimed a…| www.lesswrong.com
Related: Book Review: On the Edge: The Gamblers. I have previously been heavily involved in sports betting. That world was very good to me. The times…| www.lesswrong.com
Reply to: Decoupling vs Contextualising Norms …| www.lesswrong.com
People go funny in the head when talking about politics. The evolutionary reasons for this are so obvious as to be worth belaboring: In the ancestral…| www.lesswrong.com
One particularly thorny, but very frequent way for a discussion to become derailed is for participants to have drastically different beliefs as to what the scope of the discussion ought to be, whilst simultaneously being unable or unwilling to compromise...| www.lesswrong.com
“Optical rectennas” (or sometimes “nantennas”) are a technology that is sometimes advertised as a path towards converting solar energy to electricity with higher efficiency than normal solar cells...| www.lesswrong.com
Since free will is about as easy as a philosophical problem in reductionism can get, while still appearing "impossible" to at least some philosophers, it makes a good exercise for aspiring reductionists, which they should try on their own - see the main page on free will. These posts should not be read until you have made a very serious effort on your own. Related Pages: Consciousness, Free Will, Philosophy, Reductionism, Blog posts (Items in italics can be skipped if in a rush.) * How An Alg...| www.lesswrong.com
This is a sequence version of the paper “Risks from Learned Optimization in Advanced Machine Learning Systems” by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Each post in the sequence corresponds to a different section of the paper. Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, and Joar Skalse contributed equally to this sequence. The goal of this sequence is to analyze the type of learned optimization that occurs when a learned model (such...| www.lesswrong.com
TurnTrout's profile on LessWrong — A community blog devoted to refining the art of rationality| www.lesswrong.com
Rohin Shah's profile on LessWrong — A community blog devoted to refining the art of rationality| www.lesswrong.com
Buck's profile on LessWrong — A community blog devoted to refining the art of rationality| www.lesswrong.com
"I think of my life now as two states of being: before reading your doc and after." - A message I got after sharing this article at work. …| www.lesswrong.com
> I remember this paper I wrote on existentialism. My teacher gave it back with an F. She’d underlined true and truth wherever it appeared in the ess…| www.lesswrong.com
Introduction The purpose of this essay is to describe the improvements I've made in my ability to develop an intuitive understanding of scientific an…| www.lesswrong.com
Quick note about a thing I didn't properly realize until recently. I don't know how important it is in practice. …| www.lesswrong.com
Say you’re Robyn Denholm, chair of Tesla’s board. And say you’re thinking about firing Elon Musk. One way to make up your mind would be to have peopl…| www.lesswrong.com
(I'm currently participating in MATS 8.0, but this post is unrelated to my project.) …| www.lesswrong.com
“In America, we believe in driving on the right hand side of the road.” …| www.lesswrong.com
You know how most people, probably including you, have stuff about themselves which they keep hidden from the world, because they worry that others w…| www.lesswrong.com
1. Back when COVID vaccines were still a recent thing, I witnessed a debate in which something like the following seemed to be happening: …| www.lesswrong.com
Introduction: The scaling laws for neural language models showed that cross-entropy loss follows a power law in three factors: …| www.lesswrong.com
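The excerpt cuts off before naming the three factors. Assuming it refers to the standard Kaplan et al. (2020) formulation, where the factors are parameter count N, dataset size D, and training compute C, each single-factor law has roughly the shape sketched below; the constants and exponents are empirically fitted and are not taken from the linked post.

```latex
% Hedged sketch of the usual single-factor scaling-law form, assuming the three
% factors are parameters N, dataset tokens D, and compute C; N_c, D_c, C_c and
% the alpha exponents are fitted constants, not values from the post.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```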
A fiction-writing trick I find particularly compelling is open loops. …| www.lesswrong.com
My previous post discussed some of my experiences with LLM-assisted creative writing, and the basics of prompting for stories if you want the LLM to…| www.lesswrong.com
Author's note: These days, my thoughts go onto my substack by default, instead of onto LessWrong. Everything I write becomes free after a week or so,…| www.lesswrong.com
Reply to: Meta-Honesty: Firming Up Honesty Around Its Edge-Cases …| www.lesswrong.com
Recently I wrote an essay about Scaffolding Skills. The short explanation is that some skills aren’t the thing you’re actually trying to get good at,…| www.lesswrong.com
Achieving high-assurance alignment will require formal guarantees in complexity theory.| www.lesswrong.com
In this post I prove a variant of Gödel's completeness theorem. My intention has been to really understand the theorem, so that I am not simply shuff…| www.lesswrong.com
One strategy we often find helpful with our kids is the "do over": something didn't go well, let's try again. Two examples: • …| www.lesswrong.com
CW: Digital necromancy, the cognitohazard of summoning spectres from the road not taken …| www.lesswrong.com
Your "zombie", in the philosophical usage of the term, is putatively a being that is exactly like you in every respect—identical behavior, identical…| www.lesswrong.com
When it comes to coordinating people around a goal, you don't get limitless communication bandwidth for conveying arbitrarily nuanced messages. Inste…| www.lesswrong.com
Vibe Coding Isn’t Just a Vibe > Shimmering Substance - Jackson Pollock • …| www.lesswrong.com
To many people, the land value tax (LVT) has earned the reputation of being the "perfect tax." In theory, it achieves a rare trifecta: generating gov…| www.lesswrong.com
There's a concept (inspired by a Metafilter blog post) of ask culture vs. guess culture. In "ask culture," it's socially acceptable to ask for a fav…| www.lesswrong.com
Introduction The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal. The strong form of the Orthogonality Thesis says that there's no extra difficulty or complication in the existence of an intelligent agent that pursues a goal, above and beyond the computational tractability of that goal. Suppose some strange alien came to Earth and credibly offered to pay us one million dollars' worth of new wealth every time we created a paperclip. We...| www.lesswrong.com
There have been several studies to estimate the timelines for artificial general intelligence (aka AGI). Ajeya Cotra wrote a report in 2020 (see also…| www.lesswrong.com
I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios. …| www.lesswrong.com
This post is the second in a four-part series, explaining why I think that one prominent approach to anthropic reasoning (the “Self-Indication Assump…| www.lesswrong.com
LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reporting that this is dramatically increasing their prod…| www.lesswrong.com
johnswentworth's profile on LessWrong — A community blog devoted to refining the art of rationality| www.lesswrong.com
> Clutching a bottle of whiskey in one hand and a shotgun in the other, John scoured the research literature for ideas... He discovered several paper…| www.lesswrong.com
The Skill of Noticing Emotions (Thanks to Eli Tyre and Luke Raskopf for helping teach me the technique. And thanks to Nora Ammann, Fin Moorhouse, Ben…| www.lesswrong.com
I sent a short survey to ~117 people working on long-term AI issues, asking about the level of existential risk from AI; 44 responded. …| www.lesswrong.com
Related to: Utilons vs. Hedons, Would Your Real Preferences Please Stand Up …| www.lesswrong.com
Eliezer Yudkowsky’s book Inadequate Equilibria is excellent. I recommend reading it, if you haven’t done so. Three recent reviews are Scott Aaronson’s…| www.lesswrong.com
Daniel Kokotajlo's profile on LessWrong — A community blog devoted to refining the art of rationality| www.lesswrong.com
Value of Information (VoI) is a concept from decision analysis: how much answering a question allows a decision-maker to improve its decision. Like o…| www.lesswrong.com
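To make the definition concrete, here is a minimal Python sketch (not from the linked post; the two actions, payoffs, and prior are assumed purely for illustration) computing the expected value of perfect information for a toy decision:

```python
# Toy value-of-information calculation for a two-action decision under a
# binary uncertainty. All numbers are illustrative assumptions.

p_good = 0.3  # prior probability that the uncertain state is "good"

# Payoff of each action in each state.
payoff = {
    "launch": {"good": 100.0, "bad": -50.0},
    "hold":   {"good": 0.0,   "bad": 0.0},
}

def expected_value(action: str) -> float:
    """Expected payoff of an action under the prior."""
    return p_good * payoff[action]["good"] + (1 - p_good) * payoff[action]["bad"]

# Best achievable expected value acting on the prior alone.
ev_without_info = max(expected_value(a) for a in payoff)

# If the state were revealed before acting, we'd pick the best action per state.
ev_with_info = (
    p_good * max(payoff[a]["good"] for a in payoff)
    + (1 - p_good) * max(payoff[a]["bad"] for a in payoff)
)

print("EV without info:", ev_without_info)          # 0.0 (hold by default)
print("EV with perfect info:", ev_with_info)        # 30.0
print("Value of perfect information:", ev_with_info - ev_without_info)  # 30.0
```

In this toy setup the decision-maker would hold by default (expected value 0), but perfect information is worth 30 because it lets them launch only in the good state.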
This post tells a few different stories in which humanity dies out as a result of AI technology, but where no single source of human or automated age…| www.lesswrong.com
The following is an edited transcript of a talk I gave. I have given this talk at multiple places, including first at Anthropic and then for ELK winn…| www.lesswrong.com
Derek Powell, Kara Weisman, and Ellen M. Markman's "Articulating Lay Theories Through Graphical Models: A Study of Beliefs Surrounding Vaccination De…| www.lesswrong.com
Inner alignment and objective robustness have been frequently discussed in the alignment community since the publication of “Risks from Learned Optim…| www.lesswrong.com
Currently, we do not have a good theoretical understanding of how or why neural networks actually work. For example, we know that large neural networ…| www.lesswrong.com
We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even…| www.lesswrong.com
The Great Firewall of China. A massive system of centralized censorship purging the Chinese version of the Internet of all potentially subversive con…| www.lesswrong.com
(Btw everything I write here about orcas also applies, to a slightly lesser extent, to pilot whales (especially long-finned ones)[1].) …| www.lesswrong.com
Midjourney, “metastatic cancer” • Metastatic Cancer Is Usually Deadly: When my mom was diagnosed with cancer, it was already metastatic; her lymph n…| www.lesswrong.com
ARC has released a paper on Backdoor defense, learnability and obfuscation in which we study a formal notion of backdoors in ML models. Part of our m…| www.lesswrong.com
> "Would you kill babies if it was the right thing to do? If no, under what circumstances would you not do the right thing to do? If yes, how right…| www.lesswrong.com
Why do I believe that the Sun will rise tomorrow? • Because I've seen the Sun rise on thousands of previous days. …| www.lesswrong.com
In this post, we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that the safety measures they appl…| www.lesswrong.com
Everyone who starts thinking about AI starts thinking big. Alan Turing predicted that machine intelligence would ma…| www.lesswrong.com
[This is the text of the sermon given by Pastor James Windrow on Sunday, July 14, 2024.] …| www.lesswrong.com
Things went very wrong on Friday. • A bugged CrowdStrike update temporarily bricked quite a lot of computers, bringing down such fun things as airlin…| www.lesswrong.com
This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from…| www.lesswrong.com
Nate Soares argues that one of the core problems with AI alignment is that an AI system's capabilities will likely generalize to new domains much fas…| www.lesswrong.com
(Disclaimers: I work in the financial industry, though not in a way related to prediction markets. Anything I write here is my opinion and not that o…| www.lesswrong.com
There are two sealed boxes up for auction, box A and box B. One and only one of these boxes contains a valuable diamond. There are all manner of sign…| www.lesswrong.com
This post relates an observation I've made in my work with GPT-2, which I have not seen made elsewhere. …| www.lesswrong.com
Douglas Hubbard’s How to Measure Anything is one of my favorite how-to books. I hope this summary inspires you to buy the book; it’s worth it. …| www.lesswrong.com
Gradient hacking is when a deceptively aligned AI deliberately acts to influence how the training process updates it. For example, it might try to be…| www.lesswrong.com
> When you surround the enemy > > Always allow them an escape route. > > They must see that there is > > An alternative to death. > > —Sun Tzu, T…| www.lesswrong.com
TL;DR: We give a threat model literature review, propose a categorization and describe a consensus threat model from some of DeepMind's AGI safety te…| www.lesswrong.com
Complexity of Value is the thesis that human values have high Kolmogorov complexity; that our preferences, the things we care about, cannot be summed up by a few simple rules, or compressed. Fragility of value is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider unacceptable (just like dialing nine out of ten phone digits correctly does not connect you to a person 90% similar to your friend). For example, all...| www.lesswrong.com
Roko’s basilisk is a thought experiment proposed in 2010 by the user Roko on the Less Wrong community blog. Roko used ideas in decision theory to argue that a sufficiently powerful AI agent would have an incentive to torture anyone who imagined the agent but didn't work to bring the agent into existence. The argument was called a "basilisk" (named after the legendary reptile that can cause death with a single glance) because merely hearing the argument would supposedly put you at risk of to...| www.lesswrong.com
Sorry if this isn't the kind of content people want to see here. It's my regular blogging platform, so it's where I go by default when I have somethi…| www.lesswrong.com
Thanks to Ian McKenzie and Nicholas Dupuis, collaborators on a related project, for contributing to the ideas and experiments discussed in this post…| www.lesswrong.com
Coherent Extrapolated Volition was a term developed by Eliezer Yudkowsky while discussing Friendly AI development. It’s meant as an argument that it would not be sufficient to explicitly program what we think our desires and motivations are into an AI; instead, we should find a way to program it so that it acts in our best interests – what we want it to do and not what we tell it to do. Related: Friendly AI, Metaethics Sequence, Complexity of Value > In calculating CEV, an AI woul...| www.lesswrong.com
We are no longer accepting submissions. We'll get in touch with winners and make a post about winning proposals sometime in the next month. …| www.lesswrong.com
I sent a two-question survey to ~117 people working on long-term AI risk, asking about the level of existential risk from "humanity not doing enough…| www.lesswrong.com
If I had to pick a single statement that relies on more Overcoming Bias content I've written than any other, that statement would be: …| www.lesswrong.com