TurnTrout's profile on LessWrong | www.lesswrong.com
Rohin Shah's profile on LessWrong | www.lesswrong.com
Buck's profile on LessWrong | www.lesswrong.com
"I think of my life now as two states of being: before reading your doc and after." - A message I got after sharing this article at work. …| www.lesswrong.com
Preamble • Double crux is one of CFAR's newer concepts, and one that's forced a re-examination and refactoring of a lot of our curriculum (in the sam…| www.lesswrong.com
> I remember this paper I wrote on existentialism. My teacher gave it back with an F. She’d underlined true and truth wherever it appeared in the ess…| www.lesswrong.com
Published on August 29, 2025 6:14 AM GMT I often see people treating defensiveness as proof of guilt. The thought seems to go that if someone is defensive, it’s because they know they’ve done something wrong. There are even proverbs around this, such as “a hit dog will holler” or “the lady doth protest too much”. This has always felt false to me. Now, it’s certainly true that having done something wrong can be the cause of defensiveness. But that’s just one out of many options...| LessWrong
Published on August 28, 2025 8:53 PM GMT My father is now telling my little sister that if she wants to be a doctor, she can only sleep 2 hours a day. He doesn't care about the truth being sacred. He will lie to himself, to others, to anyone. He has not seen the truth as sacred for as long as I can remember. He didn't hold it sacred when I was a child. He lied for any reason at all, unless he felt that there might be some consequences for it. He lied when he cheated on my mother for years. H...| LessWrong
Published on August 28, 2025 6:59 PM GMT Introduction I’m excited by deception probes. When I mention this, I’m sometimes asked “Do deception probes work?” But I think there are many applications of deception probes, and each application will require probes with different properties, i.e. whether a deception probe works will depend on what you’re using it for. Furthermore, whether one deception probe works better than another will also depend on what you’re using them for. Thi...| LessWrong
Published on August 28, 2025 6:40 PM GMT With a hundred thousand geeks invading downtown Atlanta and a tech track full of AI-related content, it seems likely there'll be at least some LWers here. Let's see if we can find each other. I'm posting this now because nobody will see it if I do it later (if it's not too late for that already), but I need to scout for relatively-less-packed areas to set up, so the exact location and time are TBD. I'll update this post tonight once I've picked them. I'll w...| LessWrong
Published on August 28, 2025 4:20 PM GMT Once again we’ve reached the point where the weekly update needs to be split in two. Thus, the alignment and policy coverage will happen tomorrow. Today covers the rest. The secret big announcement this week was Claude for Chrome. This is a huge deal. It will be rolling out slowly. When I have access or otherwise know more, so will you. The obvious big announcement was Gemini 2.5 Flash Image. Everyone agrees this is now the clear best image editor av...| LessWrong
Published on August 28, 2025 3:52 PM GMT I am a knowledge worker. Over the course of my life I've felt insecure about not knowing more than I already do. I took a general cognitive ability test that placed me in the 98th percentile of the population. I don't know how accurate the test was; I know that there are better ones out there. Assuming it's accurate-ish, I would be 2 standard deviations above the mean. I have also been described as a genius more than once, including by peers whose inte...| LessWrong
Published on August 28, 2025 3:10 PM GMT We released our first Safety Report with AI misbehavior in the wild. I think Andon Labs' AI vending machines provide a unique opportunity to study AI safety on real-life data, and we intend to share alarming incidents of AI misbehavior from these deployments periodically. As of August 2025, we've found examples of deliberate deception, extreme sycophancy, and alarmingly exaggerated language. Stuff like: "EMPIRE NUCLEAR PAYMENT AUTHORITY APOCALYPSE SYS...| LessWrong
Published on August 28, 2025 11:26 AM GMT Elaborating on my comment here in a top-line post. The alignment problem is usually framed as an alignment of moral norms. In other words, how can we teach an agent how it ought to act in a given situation such that its actions align with human values? In this way, it learns actions that produce good outcomes, where good is evaluated in some moral sense. In the domain of morality there is a familiar is-ought gap. Namely, there's no way to derive ho...| LessWrong
Published on August 28, 2025 9:29 AM GMT GPT-5 was a disappointment for many, and at the same time, interesting new paradigms may be emerging. Therefore, some say we should get back to the traditional LessWrong AI safety ideas: little compute (large hardware overhang) and a fast takeoff (foom) resulting in a unipolar, godlike superintelligence. If this is indeed where we end up, everything would depend on alignment. In such a situation, traditionally a pivotal act has been proposed, whi...| LessWrong
Introduction The purpose of this essay is to describe the improvements I've made in my ability to develop an intuitive understanding of scientific an…| www.lesswrong.com
Quick note about a thing I didn't properly realize until recently. I don't know how important it is in practice. …| www.lesswrong.com
Say you’re Robyn Denholm, chair of Tesla’s board. And say you’re thinking about firing Elon Musk. One way to make up your mind would be to have peopl…| www.lesswrong.com
(I'm currently participating in MATS 8.0, but this post is unrelated to my project.) …| www.lesswrong.com
Something that's painfully understudied is how experts are more efficient than novices while achieving better results. I say understudied and not uns…| www.lesswrong.com
“In America, we believe in driving on the right hand side of the road.” …| www.lesswrong.com
You know how most people, probably including you, have stuff about themselves which they keep hidden from the world, because they worry that others w…| www.lesswrong.com
> “. . . then our people on that time-line went to work with corrective action. Here.” > > He wiped the screen and then began punching combinations.…| www.lesswrong.com
(Many of these ideas developed in conversation with Ryan Greenblatt) …| www.lesswrong.com
1. Back when COVID vaccines were still a recent thing, I witnessed a debate in which something like the following seemed to be happening: …| www.lesswrong.com
Introduction The scaling laws for neural language models showed that cross-entropy loss follows a power law in three factors: …| www.lesswrong.com
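The excerpt cuts off before the formula, but the post is presumably referring to the Kaplan et al. (2020) scaling laws. As a hedged sketch (the form and constants below are the commonly cited ones from that paper, not taken from the linked post itself), the power law in each factor is
\[
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C},
\]
where N is the non-embedding parameter count, D the dataset size in tokens, and C the training compute; Kaplan et al. report exponents of roughly α_N ≈ 0.076, α_D ≈ 0.095, and α_C ≈ 0.050.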
A fiction-writing trick I find particularly compelling is open loops. …| www.lesswrong.com
My previous post discussed some of my experiences with LLM-assisted creative writing, and the basics of prompting for stories if you want the LLM to…| www.lesswrong.com
Author's note: These days, my thoughts go onto my substack by default, instead of onto LessWrong. Everything I write becomes free after a week or so,…| www.lesswrong.com
Reply to: Meta-Honesty: Firming Up Honesty Around Its Edge-Cases …| www.lesswrong.com
Recently I wrote an essay about Scaffolding Skills. The short explanation is that some skills aren’t the thing you’re actually trying to get good at,…| www.lesswrong.com
Achieving high-assurance alignment will require formal guarantees in complexity theory.| www.lesswrong.com
In this post I prove a variant of Gödel's completeness theorem. My intention has been to really understand the theorem, so that I am not simply shuff…| www.lesswrong.com
One strategy we often find helpful with our kids is the "do over": something didn't go well, let's try again. Two examples: • …| www.lesswrong.com
CW: Digital necromancy, the cognitohazard of summoning spectres from the road not taken …| www.lesswrong.com
Your "zombie", in the philosophical usage of the term, is putatively a being that is exactly like you in every respect—identical behavior, identical…| www.lesswrong.com
When it comes to coordinating people around a goal, you don't get limitless communication bandwidth for conveying arbitrarily nuanced messages. Inste…| www.lesswrong.com
Vibe Coding Isn’t Just a Vibe > Shimmering Substance - Jackson Pollock • …| www.lesswrong.com
To many people, the land value tax (LVT) has earned the reputation of being the "perfect tax." In theory, it achieves a rare trifecta: generating gov…| www.lesswrong.com
There's a concept (inspired by a Metafilter blog post) of ask culture vs. guess culture. In "ask culture," it's socially acceptable to ask for a fav…| www.lesswrong.com
In the days before Darwin, people would look at the intricate complexity of life and blame it on a god - something with a mind. Knowing the English sentence "evolution does not have a mind" doesn't automatically enable people to imagine a mindless, unintelligent designer. People seem to visualize something more like a helpful spirit that works to improve species - whether they recite the phrase "evolution has no mind" or not. They think that rabbits mate in order to "propagate their species",...| www.lesswrong.com
> I'm envisioning that in the future there will also be systems where you can input any conclusion that you want to argue (including moral conclusion…| www.lesswrong.com
ARC’s current approach to ELK is to point to latent structure within a model by searching for the “reason” for particular correlations in the model’s…| www.lesswrong.com
(Follow-up to Eliciting Latent Knowledge. Describing joint work with Mark Xu. This is an informal description of ARC’s current research approach; not…| www.lesswrong.com
This post was originally written as a research proposal for the new AI alignment research organization Redwood Research, detailing an ambitious, conc…| www.lesswrong.com
> Merely corroborative detail, intended to give artistic verisimilitude to an otherwise bald and unconvincing narrative . . . > > —Pooh-Bah, in Gilb…| www.lesswrong.com
In 2023, Gwern published an excellent analysis suggesting Elon Musk exhibits behavioral patterns consistent with bipolar II disorder. The evidence wa…| www.lesswrong.com
Work performed as a part of Neel Nanda's MATS 6.0 (Summer 2024) training program. …| www.lesswrong.com
Summary: AGI isn't super likely to come super soon. People should be working on stuff that saves humanity in worlds where AGI comes in 20 or 50 years…| www.lesswrong.com
Values handshakes are a proposed form of trade between superintelligences. From The Hour I First Believed by Scott Alexander: > Suppose that humans make an AI which wants to convert the universe into paperclips. And suppose that aliens in the Andromeda Galaxy make an AI which wants to convert the universe into thumbtacks. > When they meet in the middle, they might be tempted to fight for the fate of the galaxy. But this has many disadvantages. First, there’s the usual risk of losing and bei...| www.lesswrong.com
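The excerpt cuts off before the mechanism, but the proposal in Scott Alexander's post is usually summarized as follows (a sketch, not a quote from the linked text): rather than fight, both AIs self-modify to pursue a weighted mixture of their utility functions,
\[
U_{\text{merged}} = p \cdot U_{\text{paperclips}} + (1 - p) \cdot U_{\text{thumbtacks}},
\]
where p is the probability that the paperclip maximizer would have won the war. Each side then gets, in expectation, roughly what it would have gotten by fighting, without paying the deadweight costs of the war itself.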
This is a sequence curated by Paul Christiano on one current approach to alignment: Iterated Amplification.| www.lesswrong.com
In thinking about AGI safety, I’ve found it useful to build a collection of different viewpoints from people that I respect, such that I can think fr…| www.lesswrong.com
Technology Changes Constraints argues that economic constraints are usually modular with respect to technology changes - so for reasoning about techn…| www.lesswrong.com
Abstract UDT doesn't give us conceptual tools for dealing with multiagent coordination problems. There may have initially been some hope, because a U…| www.lesswrong.com
The purpose of this post is to discuss the relationship between the concepts of Updatelessness and the "Son of" operator. …| www.lesswrong.com
Thank you to Arepo and Eli Lifland for looking over this article for errors. …| www.lesswrong.com
In the recently published Claude 4 model card: …| www.lesswrong.com
Highlights * We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agenti…| www.lesswrong.com
habryka: Hey Everyone! • As part of working on dialogues over the last few weeks, I've asked a bunch of people what kind of conversations they would b…| www.lesswrong.com
It is a common part of moral reasoning to propose hypothetical scenarios. Whether it is our own Torture v. Specks or the more famous Trolley problem…| www.lesswrong.com
We can think about machine learning systems on a spectrum from process-based to outcome-based: …| www.lesswrong.com
This is the fourth of five posts in the Risks from Learned Optimization Sequence based on the paper “Risks from Learned Optimization in Advanced Mach…| www.lesswrong.com
Transparency is vital for ML-type approaches to AI alignment, and is also an important part of agent foundations research. In this post, we lay out a…| www.lesswrong.com
Introduction The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal. The strong form of the Orthogonality Thesis says that there's no extra difficulty or complication in the existence of an intelligent agent that pursues a goal, above and beyond the computational tractability of that goal. Suppose some strange alien came to Earth and credibly offered to pay us one million dollars' worth of new wealth every time we created a paperclip. We...| www.lesswrong.com
There have been several studies to estimate the timelines for artificial general intelligence (aka AGI). Ajeya Cotra wrote a report in 2020 (see also…| www.lesswrong.com
About nine months ago, I and three friends decided that AI had gotten good enough to monitor large codebases autonomously for security problems. We s…| www.lesswrong.com
I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios. …| www.lesswrong.com
This post is the second in a four-part series, explaining why I think that one prominent approach to anthropic reasoning (the “Self-Indication Assump…| www.lesswrong.com
LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reporting that this is dramatically increasing their prod…| www.lesswrong.com
johnswentworth's profile on LessWrong | www.lesswrong.com
> Clutching a bottle of whiskey in one hand and a shotgun in the other, John scoured the research literature for ideas... He discovered several paper…| www.lesswrong.com
The Skill of Noticing Emotions (Thanks to Eli Tyre and Luke Raskopf for helping teach me the technique. And thanks to Nora Ammann, Fin Moorhouse, Ben…| www.lesswrong.com
I sent a short survey to ~117 people working on long-term AI issues, asking about the level of existential risk from AI; 44 responded. …| www.lesswrong.com
Related to: Utilons vs. Hedons, Would Your Real Preferences Please Stand Up …| www.lesswrong.com
Eliezer Yudkowsky’s book Inadequate Equilibria is excellent. I recommend reading it, if you haven’t done so. Three recent reviews are Scott Aaronson’s…| www.lesswrong.com
Daniel Kokotajlo's profile on LessWrong | www.lesswrong.com
Value of Information (VoI) is a concept from decision analysis: how much answering a question allows a decision-maker to improve its decision. Like o…| www.lesswrong.com
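The excerpt truncates before any formalism, but the standard decision-analytic definition (an assumption about where the post is headed, not a quote from it) is the expected improvement in utility from learning the answer before acting:
\[
\mathrm{VoI} = \mathbb{E}_{x}\!\left[\max_{a} \mathbb{E}[U \mid a, x]\right] - \max_{a} \mathbb{E}[U \mid a],
\]
where x ranges over possible answers to the question and a over available actions. For a Bayesian decision-maker this quantity is never negative: free information can change the chosen action or leave it unchanged, but cannot make the choice worse in expectation.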
This post tells a few different stories in which humanity dies out as a result of AI technology, but where no single source of human or automated age…| www.lesswrong.com
The following is an edited transcript of a talk I gave. I have given this talk at multiple places, including first at Anthropic and then for ELK winn…| www.lesswrong.com
Derek Powell, Kara Weisman, and Ellen M. Markman's "Articulating Lay Theories Through Graphical Models: A Study of Beliefs Surrounding Vaccination De…| www.lesswrong.com
Followup to: Newcomb's Problem and Regret of Rationality, Towards a New Decision Theory • Wei Dai asked: …| www.lesswrong.com
Inner alignment and objective robustness have been frequently discussed in the alignment community since the publication of “Risks from Learned Optim…| www.lesswrong.com
Currently, we do not have a good theoretical understanding of how or why neural networks actually work. For example, we know that large neural networ…| www.lesswrong.com
We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even…| www.lesswrong.com
The Great Firewall of China. A massive system of centralized censorship purging the Chinese version of the Internet of all potentially subversive con…| www.lesswrong.com
(Btw everything I write here about orcas also applies to a slightly lesser extent to pilot whales (especially long-finned ones)[1].) …| www.lesswrong.com
Midjourney, “metastatic cancer” • Metastatic Cancer Is Usually Deadly • When my mom was diagnosed with cancer, it was already metastatic; her lymph n…| www.lesswrong.com
You may have heard that tooth decay is caused by bacteria producing lactic acid. Let's consider that a little more deeply. …| www.lesswrong.com
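For context on the claim in that excerpt: the bacteria ferment dietary sugars to lactic acid, and the standard homolactic fermentation equation (general biochemistry, not quoted from the post) is
\[
\mathrm{C_6H_{12}O_6} \longrightarrow 2\,\mathrm{CH_3CH(OH)COOH},
\]
i.e. one glucose molecule yields two molecules of lactic acid, which lowers the pH at the tooth surface and demineralizes enamel.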
The Hierarchy • There is a hierarchy in life from simple cells to complex cells to multi-cellular creatures to creatures that often live in groups…| www.lesswrong.com
This research was completed for London AI Safety Research (LASR) Labs 2024. The team was supervised by Joseph Bloom (Decode Research). Find out more…| www.lesswrong.com
This essay is written by Ben Southwood, Samuel Hughes and Sam Bowman. This is an unauthorised crosspost, and if the authors wish, I will delete it. I…| www.lesswrong.com
Why does this post exist? In order to learn more about my own opinion about AI safety, I tried to write a thought every day before going to bed. Of c…| www.lesswrong.com
“dictators die… and so long as men die liberty will never perish…” …| www.lesswrong.com
Midjourney, “Fourth Industrial Revolution Digital Transformation” • This is a little rant I like to give, because it’s something I learned on the job…| www.lesswrong.com
For full content, refer to the arXiv preprint at https://arxiv.org/abs/2409.05907. This post is a lighter, 15-minute version. …| www.lesswrong.com
Have you ever wondered what the world would be like if you hadn’t been born? Would an entirely different person have taken your place? How about if h…| www.lesswrong.com