“Legible vs. Illegible AI Safety Problems” by Wei Dai

5.11.2025

LessWrong (Curated & Popular)

Some AI safety problems are legible (obvious or understandable) to company leaders and government policymakers, implying they are unlikely to deploy or allow deployment of an AI while those problems remain open (i.e., appear unsolved according to the information they have access to). But some problems are illegible (obscure or hard to understand, or in a common cognitive blind spot), meaning there is a high risk that leaders and policymakers will decide to deploy or allow deployment even if they are not solved. (Of course, this is a spectrum, but I am simplifying it to a binary for ease of exposition.)

From an x-risk perspective, working on highly legible safety problems has low or even negative expected value. Similar to working on AI capabilities, it brings forward the date by which AGI/ASI will be deployed, leaving less time to solve the illegible x-safety problems. In contrast, working on the illegible problems (including by trying to make them more legible) does not have this issue and therefore has a much higher expected value (all else being equal, such as tractability). Note that according to this logic, success in making an illegible problem highly legible is almost as good as solving [...]

The original text contained 2 footnotes which were omitted from this narration.

---

First published:
November 4th, 2025

Source:
https://www.lesswrong.com/posts/PMc65HgRFvBimEpmJ/legible-vs-illegible-ai-safety-problems

---

Narrated by TYPE III AUDIO.

More episodes from "LessWrong (Curated & Popular)"

“Sonnet 4.5’s eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals” by Alexa Pan, ryan_greenblatt

“Publishing academic papers on transformative AI is a nightmare” by Jakub Growiec

“The Unreasonable Effectiveness of Fiction” by Raelifin

“Legible vs. Illegible AI Safety Problems” by Wei Dai

“Lack of Social Grace is a Lack of Skill” by Screwtape

[Linkpost] “I ate bear fat with honey and salt flakes, to prove a point” by aggliu

“What’s up with Anthropic predicting AGI by early 2027?” by ryan_greenblatt

[Linkpost] “Emergent Introspective Awareness in Large Language Models” by Drake Thomas

[Linkpost] “You’re always stressed, your mind is always busy, you never have enough time” by mingyuan

“LLM-generated text is not testimony” by TsviBT