How GPT-5 Thinks — OpenAI VP of Research Jerry Tworek

16.10.2025

The MAD Podcast with Matt Turck

Duration: 1:16:04

What does it really mean when GPT-5 “thinks”? In this conversation, OpenAI’s VP of Research Jerry Tworek explains how modern reasoning models work in practice: why pretraining and reinforcement learning (RL/RLHF) are both essential, what that on-screen “thinking” actually does, and when extra test-time compute helps (or doesn’t). We trace the evolution from O1 (a tech demo good at puzzles) to O3 (the tool-use shift) to GPT-5 (Jerry calls it “O3.1-ish”), and talk through verifiers, reward design, and the real trade-offs behind “auto” reasoning modes.

We also go inside OpenAI: how research is organized, why collaboration is unusually transparent, and how the company ships fast without losing rigor. Jerry shares the backstory on competitive-programming results like ICPC, what they signal (and what they don’t), and where agents and tool use are genuinely useful today. Finally, we zoom out: could pretraining + RL be the path to AGI?

This is the MAD Podcast: AI for the 99%. If you’re curious about how these systems actually work (without needing a PhD), this episode is your map to the current AI frontier.

OpenAI

Website - https://openai.com

X/Twitter - https://x.com/OpenAI

Jerry Tworek

LinkedIn - https://www.linkedin.com/in/jerry-tworek-b5b9aa56

X/Twitter - https://x.com/millionint

FIRSTMARK

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck

(00:00) Intro

(01:01) What Reasoning Actually Means in AI

(02:32) Chain of Thought: Models Thinking in Words

(05:25) How Models Decide Thinking Time

(07:24) Evolution from O1 to O3 to GPT-5

(11:00) Before OpenAI: Growing up in Poland, Dropping out of School, Trading

(20:32) Working on Robotics and Rubik's Cube Solving

(23:02) A Day in the Life: Talking to Researchers

(24:06) How Research Priorities Are Determined

(26:53) Collaboration vs IP Protection at OpenAI

(29:32) Shipping Fast While Doing Deep Research

(31:52) Using OpenAI's Own Tools Daily

(32:43) Pre-Training Plus RL: The Modern AI Stack

(35:10) Reinforcement Learning 101: Training Dogs

(40:17) The Evolution of Deep Reinforcement Learning

(42:09) When GPT-4 Seemed Underwhelming at First

(45:39) How RLHF Made GPT-4 Actually Useful

(48:02) Unsupervised vs Supervised Learning

(49:59) GRPO and How DeepSeek Accelerated US Research

(53:05) What It Takes to Scale Reinforcement Learning

(55:36) Agentic AI and Long-Horizon Thinking

(59:19) Alignment as an RL Problem

(1:01:11) Winning ICPC World Finals Without Specific Training

(1:05:53) Applying RL Beyond Math and Coding

(1:09:15) The Path from Here to AGI

(1:12:23) Pure RL vs Language Models
