Voice AI’s Big Moment: Why Everything Is Changing Now (ft. Neil Zeghidour, Gradium AI)

19/2/2026

The MAD Podcast with Matt Turck

Voice used to be AI’s forgotten modality — awkward, slow, and fragile. Now it’s everywhere. In this reference episode on all things Voice AI, Matt Turck sits down with Neil Zeghidour, a top AI researcher and CEO of Gradium AI (ex-DeepMind/Google, Meta, Kyutai), to cover voice agents, speech-to-speech models, full-duplex conversation, on-device voice, and voice cloning.

We unpack what actually changed under the hood — why voice is finally starting to feel natural, and why it may become the default interface for a new generation of AI assistants and devices.

Neil breaks down today’s dominant “cascaded” voice stack — speech recognition into a text model, then text-to-speech back out — and why it’s popular: it’s modular and easy to customize. But he argues it has two key downsides: chaining models adds latency, and forcing everything through text strips out paralinguistic signals like tone, stress, and emotion. The next wave, he suggests, is combining cascade-like flexibility with the more natural feel of speech-to-speech and full-duplex conversation.
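
As a rough illustration of the cascaded stack described above, here is a minimal Python sketch. It is not Gradium's or any vendor's implementation: the three stage functions are hypothetical stubs, and the latency figures are invented placeholders chosen only to show that per-stage delays add up and that the text bottleneck drops tone.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    words: str
    tone: str  # stand-in for paralinguistic signals (tone, stress, emotion)


# Each stub returns (output, latency_in_seconds). The latencies are
# made-up placeholders, not measurements of any real system.
def speech_to_text(utterance: Utterance) -> tuple[str, float]:
    # Only the words survive; the tone is discarded at this step.
    return utterance.words, 0.20


def text_model(text: str) -> tuple[str, float]:
    # Hypothetical text model: it never sees how the words were spoken.
    return f"You said: {text}", 0.35


def text_to_speech(text: str) -> tuple[Utterance, float]:
    # With no tone information left, the reply defaults to a neutral voice.
    return Utterance(words=text, tone="neutral"), 0.15


def cascade(utterance: Utterance) -> tuple[Utterance, float]:
    text, t_asr = speech_to_text(utterance)
    reply, t_llm = text_model(text)
    spoken, t_tts = text_to_speech(reply)
    # Stages run one after another, so their latencies sum.
    return spoken, t_asr + t_llm + t_tts


if __name__ == "__main__":
    reply, latency = cascade(Utterance("where is my order", tone="frustrated"))
    print(reply, f"total latency: {latency:.2f}s")
```

The modularity is visible in the sketch (any stage can be swapped independently), as are both downsides: the "frustrated" tone never reaches the text model, and the total delay is the sum of all three stages.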

We go deep on full-duplex interaction (ending awkward turn-taking), the hardest unsolved problems (noisy real-world environments and multi-speaker chaos), and the realities of deploying voice at scale — including why models must be compact and when on-device voice is the right approach.

Finally, we tackle voice cloning: where it’s genuinely useful, what it means for deepfakes and privacy, and why watermarking isn’t a silver bullet.

If you care about voice agents, real-time AI, and the next generation of human-computer interaction, this is the episode to bookmark.

Neil Zeghidour

LinkedIn - https://www.linkedin.com/in/neil-zeghidour-a838aaa7/

X/Twitter - https://x.com/neilzegh

Gradium

Website - https://gradium.ai

X/Twitter - https://x.com/GradiumAI

Matt Turck (Managing Director)

Blog - https://mattturck.com

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck

FirstMark

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap

(00:00) Intro

(01:21) Voice AI’s big moment — and why we’re still early

(03:34) Why voice lagged behind text/image/video

(06:06) The convergence era: transformers for every modality

(07:40) Beyond Her: always-on assistants, wake words, voice-first devices

(11:01) Voice vs text: where voice fits (even for coding)

(12:56) Neil’s origin story: from finance to machine learning

(18:35) Neural codecs (SoundStream): compression as the unlock

(22:30) Kyutai: open research, small elite teams, moving fast

(31:32) Why big labs haven’t “won” voice AI

(34:01) On-device voice: where it works, why compact models matter

(36:37) The last mile: real-world robustness, pronunciation, uptime

(41:35) Benchmarking voice: why metrics fail, how they actually test

(47:03) Cascades vs speech-to-speech: trade-offs + what’s next

(54:05) Hardest frontier: noisy rooms, factories, multi-speaker chaos

(1:00:50) New languages + dialects: what transfers, what doesn’t

(1:02:54) Hardware & compute: why voice isn’t a 10,000-GPU game

(1:07:27) What data do you need to train voice models?

(1:09:02) Deepfakes + privacy: why watermarking isn’t a solution

(1:12:30) Voice + vision: multimodality, screen awareness, video+audio

(1:14:43) Voice cloning vs voice design: where the market goes

(1:16:32) Paris/Europe AI: talent density, underdog energy, what’s next

More episodes of "The MAD Podcast with Matt Turck"

Anthropic’s Felix Rieseberg: Claude Cowork, Mythos, and the SaaS Extinction

AI is Already Building AI | Google DeepMind’s Mostafa Dehghani

Benedict Evans: OpenAI’s Moat Problem & the Future of Software

Everything Gets Rebuilt: The New AI Agent Stack | Harrison Chase, LangChain

AI That Can Prove It’s Right: Verification as the Missing Layer in AI — Carina Hong

Mistral AI vs. Silicon Valley: The Rise of Sovereign AI

Dylan Patel: NVIDIA's New Moat & Why China is "Semiconductor Pilled”

State of LLMs 2026: RLVR, GRPO, Inference Scaling — Sebastian Raschka

The End of GPU Scaling? Compute & The Agent Era — Tim Dettmers (Ai2) & Dan Fu (Together AI)