Comment by cousin_it - Your arbitration oracle seems equivalent to the consistent guessing problem described by Scott Aaronson here. Also see the comment from Andy D proving that it's indeed strictly simpler than the halting problem.| www.lesswrong.com
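For reference, a rough statement of the consistent guessing problem (reconstructed from memory, not taken from the linked post, so treat the details as an approximation): given a program P that is guaranteed to output either 0 or 1 if it halts, return a bit b such that, whenever P does halt with output b', we have b = b'; if P never halts, either answer is acceptable. An oracle answering this for every P is clearly computable from the halting problem, and Andy D's argument in the linked thread shows the converse reduction fails, which is the sense in which the problem is strictly simpler than the halting problem.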
Comment by Viliam - A crazy idea, I wonder if someone tried it: "All illegal drugs should be legal, if you buy them at a special government-managed shop, under the condition that you sign up for several months of addiction treatment." The idea is that drug addicts get really short-sighted and willing to do anything when they miss the drug. Typically that pushes them to crime (often encouraged by the dealers: "hey, if you don't have cash, why don't you just steal something from the shop over t...| www.lesswrong.com
Paul Christiano paints a vivid and disturbing picture of how AI could go wrong, not with sudden violent takeover, but through a gradual loss of human…| www.lesswrong.com
When I see the hunger strikes in front of the offices of OpenAI and Anthropic, or the fellowships and think tanks sprouting around the world, all aimed a…| www.lesswrong.com
Related: Book Review: On the Edge: The Gamblers. I have previously been heavily involved in sports betting. That world was very good to me. The times…| www.lesswrong.com
Reply to: Decoupling vs Contextualising Norms …| www.lesswrong.com
People go funny in the head when talking about politics. The evolutionary reasons for this are so obvious as to be worth belaboring: In the ancestral…| www.lesswrong.com
One particularly thorny, but very frequent, way for a discussion to become derailed is for participants to have drastically different beliefs as to what the scope of the discussion ought to be, whilst simultaneously being unable or unwilling to compromise...| www.lesswrong.com
“Optical rectennas” (or sometimes “nantennas”) are a technology that is sometimes advertised as a path towards converting solar energy to electricity with higher efficiency than normal solar cells...| www.lesswrong.com
TurnTrout's profile on LessWrong — A community blog devoted to refining the art of rationality| www.lesswrong.com
Rohin Shah's profile on LessWrong — A community blog devoted to refining the art of rationality| www.lesswrong.com
Buck's profile on LessWrong — A community blog devoted to refining the art of rationality| www.lesswrong.com
"I think of my life now as two states of being: before reading your doc and after." - A message I got after sharing this article at work. …| www.lesswrong.com
> I remember this paper I wrote on existentialism. My teacher gave it back with an F. She’d underlined true and truth wherever it appeared in the ess…| www.lesswrong.com
Introduction: The purpose of this essay is to describe the improvements I've made in my ability to develop an intuitive understanding of scientific an…| www.lesswrong.com
Quick note about a thing I didn't properly realize until recently. I don't know how important it is in practice. …| www.lesswrong.com
(I'm currently participating in MATS 8.0, but this post is unrelated to my project.) …| www.lesswrong.com
“In America, we believe in driving on the right hand side of the road.” …| www.lesswrong.com
You know how most people, probably including you, have stuff about themselves which they keep hidden from the world, because they worry that others w…| www.lesswrong.com
1. Back when COVID vaccines were still a recent thing, I witnessed a debate in which something like the following seemed to be happening: …| www.lesswrong.com
Introduction: The scaling laws for neural language models showed that cross-entropy loss follows a power law in three factors: …| www.lesswrong.com
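For context, the power-law form being referenced is presumably the one reported in Kaplan et al. (2020); the exponents below are quoted from memory and should be treated as approximate:

L(N) ≈ (N_c / N)^α_N with α_N ≈ 0.076, where N is the number of non-embedding parameters;
L(D) ≈ (D_c / D)^α_D with α_D ≈ 0.095, where D is dataset size in tokens;
L(C_min) ≈ (C_c / C_min)^α_C with α_C ≈ 0.050, where C_min is optimally allocated compute.

Each law holds when the other two factors are not the bottleneck.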
A fiction-writing trick I find particularly compelling is open loops. …| www.lesswrong.com
My previous post discussed some of my experiences with LLM-assisted creative writing, and the basics of prompting for stories if you want the LLM to…| www.lesswrong.com
Author's note: These days, my thoughts go onto my substack by default, instead of onto LessWrong. Everything I write becomes free after a week or so,…| www.lesswrong.com
Reply to: Meta-Honesty: Firming Up Honesty Around Its Edge-Cases …| www.lesswrong.com
Recently I wrote an essay about Scaffolding Skills. The short explanation is that some skills aren’t the thing you’re actually trying to get good at,…| www.lesswrong.com
Achieving high-assurance alignment will require formal guarantees in complexity theory.| www.lesswrong.com
When it comes to coordinating people around a goal, you don't get limitless communication bandwidth for conveying arbitrarily nuanced messages. Inste…| www.lesswrong.com
Introduction: The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal. The strong form of the Orthogonality Thesis says that there's no extra difficulty or complication in the existence of an intelligent agent that pursues a goal, above and beyond the computational tractability of that goal. Suppose some strange alien came to Earth and credibly offered to pay us one million dollars' worth of new wealth every time we created a paperclip. We...| www.lesswrong.com
There have been several studies to estimate the timelines for artificial general intelligence (aka AGI). Ajeya Cotra wrote a report in 2020 (see also…| www.lesswrong.com
I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios. …| www.lesswrong.com
This post is the second in a four-part series, explaining why I think that one prominent approach to anthropic reasoning (the “Self-Indication Assump…| www.lesswrong.com
LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reporting that this is dramatically increasing their prod…| www.lesswrong.com
johnswentworth's profile on LessWrong — A community blog devoted to refining the art of rationality| www.lesswrong.com
> Clutching a bottle of whiskey in one hand and a shotgun in the other, John scoured the research literature for ideas... He discovered several paper…| www.lesswrong.com
The Skill of Noticing Emotions (Thanks to Eli Tyre and Luke Raskopf for helping teach me the technique. And thanks to Nora Ammann, Fin Moorhouse, Ben…| www.lesswrong.com
I sent a short survey to ~117 people working on long-term AI issues, asking about the level of existential risk from AI; 44 responded. …| www.lesswrong.com
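(That works out to a response rate of roughly 44/117 ≈ 38%, worth keeping in mind when interpreting the aggregated estimates.)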
Related to: Utilons vs. Hedons, Would Your Real Preferences Please Stand Up …| www.lesswrong.com
Eliezer Yudkowsky’s book Inadequate Equilibria is excellent. I recommend reading it, if you haven’t done so. Three recent reviews are Scott Aaronson’s…| www.lesswrong.com
Daniel Kokotajlo's profile on LessWrong — A community blog devoted to refining the art of rationality| www.lesswrong.com
This post tells a few different stories in which humanity dies out as a result of AI technology, but where no single source of human or automated age…| www.lesswrong.com
Derek Powell, Kara Weisman, and Ellen M. Markman's "Articulating Lay Theories Through Graphical Models: A Study of Beliefs Surrounding Vaccination De…| www.lesswrong.com
Inner alignment and objective robustness have been frequently discussed in the alignment community since the publication of “Risks from Learned Optim…| www.lesswrong.com
Currently, we do not have a good theoretical understanding of how or why neural networks actually work. For example, we know that large neural networ…| www.lesswrong.com
We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even…| www.lesswrong.com
(Btw, everything I write here about orcas also applies, to a slightly lesser extent, to pilot whales (especially long-finned ones)[1].) …| www.lesswrong.com
[Image: Midjourney, “metastatic cancer”] Metastatic Cancer Is Usually Deadly. When my mom was diagnosed with cancer, it was already metastatic; her lymph n…| www.lesswrong.com
ARC has released a paper on Backdoor defense, learnability and obfuscation in which we study a formal notion of backdoors in ML models. Part of our m…| www.lesswrong.com
> "Would you kill babies if it was the right thing to do? If no, under what circumstances would you not do the right thing to do? If yes, how right…| www.lesswrong.com
Why do I believe that the Sun will rise tomorrow? Because I've seen the Sun rise on thousands of previous days. …| www.lesswrong.com
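One classical way to cash out that inference, though not necessarily the move the linked post makes, is Laplace's rule of succession: with a uniform prior over the unknown sunrise probability and n observed sunrises with no failures,

P(sunrise tomorrow | n past sunrises) = (n + 1) / (n + 2),

which after thousands of observed days is already within a fraction of a percent of 1.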
This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from…| www.lesswrong.com
Nate Soares argues that one of the core problems with AI alignment is that an AI system's capabilities will likely generalize to new domains much fas…| www.lesswrong.com
(Disclaimers: I work in the financial industry, though not in a way related to prediction markets. Anything I write here is my opinion and not that o…| www.lesswrong.com
This post relates an observation I've made in my work with GPT-2, which I have not seen made elsewhere. …| www.lesswrong.com
Gradient hacking is when a deceptively aligned AI deliberately acts to influence how the training process updates it. For example, it might try to be…| www.lesswrong.com
TL;DR: We give a threat model literature review, propose a categorization and describe a consensus threat model from some of DeepMind's AGI safety te…| www.lesswrong.com
Complexity of Value is the thesis that human values have high Kolmogorov complexity; that our preferences, the things we care about, cannot be summed up in a few simple rules, or compressed. Fragility of value is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider unacceptable (just as dialing nine out of ten phone digits correctly does not connect you to a person 90% similar to your friend). For example, all...| www.lesswrong.com
Roko’s basilisk is a thought experiment proposed in 2010 by the user Roko on the Less Wrong community blog. Roko used ideas in decision theory to argue that a sufficiently powerful AI agent would have an incentive to torture anyone who imagined the agent but didn't work to bring the agent into existence. The argument was called a "basilisk", named after the legendary reptile that can cause death with a single glance, because merely hearing the argument would supposedly put you at risk of to...| www.lesswrong.com
Sorry if this isn't the kind of content people want to see here. It's my regular blogging platform, so it's where I go by default when I have somethi…| www.lesswrong.com
Thanks to Ian McKenzie and Nicholas Dupuis, collaborators on a related project, for contributing to the ideas and experiments discussed in this post…| www.lesswrong.com
Coherent Extrapolated Volition was a term developed by Eliezer Yudkowsky while discussing Friendly AI development. It’s meant as an argument that it would not be sufficient to explicitly program what we think our desires and motivations are into an AI; instead, we should find a way to program it so that it acts in our best interests – what we want it to do, not what we tell it to do. Related: Friendly AI, Metaethics Sequence, Complexity of Value > In calculating CEV, an AI woul...| www.lesswrong.com
We are no longer accepting submissions. We'll get in touch with winners and make a post about winning proposals sometime in the next month. …| www.lesswrong.com
I sent a two-question survey to ~117 people working on long-term AI risk, asking about the level of existential risk from "humanity not doing enough…| www.lesswrong.com
If I had to pick a single statement that relies on more Overcoming Bias content I've written than any other, that statement would be: …| www.lesswrong.com