“‘The Solomonoff Prior is Malign’ is a special case of a simpler argument” by David Matolcsi

November 25th, 2024

LessWrong (Curated & Popular)

[Warning: This post is probably only worth reading if you already have opinions on Solomonoff induction being malign, or have at least heard of the concept and want to understand it better.]

Introduction

I recently reread the classic argument from Paul Christiano about the Solomonoff prior being malign, and Mark Xu's write-up on it. I believe that the part of the argument about Solomonoff induction is not particularly load-bearing, and can be replaced by a more general argument that I think is easier to understand. So I will present the general argument first, and only explain in the last section how the Solomonoff prior can come into the picture.

I don't claim that anything I write here is particularly new. I think you can piece together this picture from various scattered comments on the topic, but I think it's good to have it written up in one place.

[...]

---

Outline:

(00:17) Introduction

(00:56) How an Oracle gets manipulated

(05:25) What went wrong?

(05:28) The AI had different probability estimates than the humans for anthropic reasons

(07:01) The AI was thinking in terms of probabilities and not expected values

(08:40) Probabilities are cursed in general, only expected values are real

(09:19) What about me?

(13:00) Should this change any of my actions?

(16:25) How does the Solomonoff prior come into the picture?

(20:10) Conclusion

The original text contained 14 footnotes which were omitted from this narration.

---

First published:
November 17th, 2024

Source:
https://www.lesswrong.com/posts/KSdqxrrEootGSpKKE/the-solomonoff-prior-is-malign-is-a-special-case-of-a

---

Narrated by TYPE III AUDIO.
