
State of LLMs 2026: RLVR, GRPO, Inference Scaling — Sebastian Raschka
Sebastian Raschka joins the MAD Podcast for a deep, educational tour of what actually changed in LLMs in 2025 — and what matters heading into 2026.
We start with the big architecture question: are transformers still the winning design, and what should we make of world models, small “recursive” reasoning models, and text diffusion approaches? Then we get into the real story of the last 12 months: post-training and reasoning. Sebastian breaks down RLVR (reinforcement learning with verifiable rewards) and GRPO (Group Relative Policy Optimization), why they pair so well, what makes them cheaper to scale than classic RLHF, and how they “unlock” reasoning already latent in base models.
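For readers who want the gist in code: below is a minimal sketch (not from the episode or any particular library; the function names are illustrative) of the group-relative advantage idea that lets GRPO drop the separate learned value model that classic RLHF-style PPO needs.

```python
# Minimal sketch of GRPO's core trick: sample a group of responses per prompt,
# score each with a verifiable reward (e.g. does the final math answer check
# out), and baseline each reward against the group itself instead of a
# learned value model (critic).
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Advantage of each sample = (reward - group mean) / group std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 sampled answers to one prompt, reward 1.0 if the answer verifies.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# -> correct samples get positive advantages, incorrect ones negative,
#    with no critic network needed to compute the baseline.
```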
We also cover why “benchmaxxing” is warping evaluation, why Sebastian increasingly trusts real usage over benchmark scores, and why inference-time scaling and tool use may be the underappreciated drivers of progress. Finally, we zoom out: where moats live now (hint: private data), why more large companies may train models in-house, and why continual learning is still so hard.
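As a concrete, deliberately simplified illustration of inference-time scaling: the sketch below shows self-consistency-style majority voting over multiple sampled answers. The `generate` and `extract_answer` functions are placeholder stand-ins, not a real model API; the episode discusses the idea, not this code.

```python
# Minimal sketch of one inference-time scaling recipe (self-consistency):
# spend more compute per query by sampling several reasoning traces and
# keeping the answer the samples agree on most often.
from collections import Counter
import random

def generate(prompt: str) -> str:
    # Stand-in for sampling one reasoning trace from an LLM.
    return f"...reasoning... final answer: {random.choice(['42', '42', '41'])}"

def extract_answer(completion: str) -> str:
    return completion.rsplit("final answer:", 1)[-1].strip()

def self_consistency(prompt: str, n_samples: int = 8) -> str:
    answers = [extract_answer(generate(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))
```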
If you want the 2025–2026 LLM landscape explained like a masterclass — this is it.
Sources:
The State Of LLMs 2025: Progress, Problems, and Predictions - https://x.com/rasbt/status/2006015301717028989?s=20
The Big LLM Architecture Comparison - https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison
Sebastian Raschka
Website - https://sebastianraschka.com
Blog - https://magazine.sebastianraschka.com
LinkedIn - https://www.linkedin.com/in/sebastianraschka/
X/Twitter - https://x.com/rasbt
FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
Blog - https://mattturck.com
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) - Intro
(01:05) - Are the days of Transformers numbered?
(14:05) - World models: what they are and why people care
(06:01) - Small “recursive” reasoning models (ARC, iterative refinement)
(09:45) - What is a diffusion model (for text)?
(13:24) - Are we seeing real architecture breakthroughs — or just polishing?
(14:04) - MoE + “efficiency tweaks” that actually move the needle
(17:26) - “Pre-training isn’t dead… it’s just boring”
(18:03) - 2025’s headline shift: RLVR + GRPO (post-training for reasoning)
(20:58) - Why RLHF is expensive (reward model + value model)
(21:43) - Why GRPO makes RLVR cheaper and more scalable
(24:54) - Process Reward Models (PRMs): why grading the steps is hard
(28:20) - Can RLVR expand beyond math & coding?
(30:27) - Why RL feels “finicky” at scale
(32:34) - The practical “tips & tricks” that make GRPO more stable
(35:29) - The meta-lesson of 2025: progress = lots of small improvements
(38:41) - “Benchmaxxing”: why benchmarks are getting less trustworthy
(43:10) - The other big lever: inference-time scaling
(47:36) - Tool use: reducing hallucinations by calling external tools
(49:57) - The “private data edge” + in-house model training
(55:14) - Continual learning: why it’s hard (and why it isn’t coming in 2026)
(59:28) - How Sebastian works: reading, coding, learning “from scratch”
(01:04:55) - LLM burnout + how he uses models (without replacing himself)