LessWrong (Curated & Popular) podcast

“Towards a Typology of Strange LLM Chains-of-Thought” by 1a3orn

Intro

LLMs being trained with RLVR (Reinforcement Learning from Verifiable Rewards) start off with a 'chain-of-thought' (CoT) in whatever language the LLM was originally trained on. But after a long period of training, the CoT sometimes starts to look very weird: it resembles no human language, or even grows completely unintelligible.

Why might this happen?

I've seen a lot of speculation about why. But much of this speculation narrows too quickly to just one or two hypotheses. My intent is also to speculate, but more broadly.

Specifically, I want to outline six nonexclusive possible causes for the weird tokens: new better language, spandrels, context refresh, deliberate obfuscation, natural drift, and conflicting shards.

I also want to sketch, very roughly, ideas for experiments and evidence that could help us distinguish these causes.

I'm sure I'm not enumerating the full space of [...]

---

Outline:

(00:11) Intro

(01:34) 1. New Better Language

(04:06) 2. Spandrels

(06:42) 3. Context Refresh

(10:48) 4. Deliberate Obfuscation

(12:36) 5. Natural Drift

(13:42) 6. Conflicting Shards

(15:24) Conclusion

---

First published:
October 9th, 2025

Source:
https://www.lesswrong.com/posts/qgvSMwRrdqoDMJJnD/towards-a-typology-of-strange-llm-chains-of-thought

---



Narrated by TYPE III AUDIO.

---

