Cisco Podcast Network podcast

AI Insights – EP.2: Unlocking Cost-Effective AI with Small Language Models

In the latest episode of the Cisco AI Insights Podcast, hosts Rafael Herrera and Sónia Marques welcome Cisco AI operations engineer James Tidd for a discussion on the world of small language models (SLMs) and the evolution of efficient AI inference. Together, they unravel the complexities behind “Fast Inference from Transformers via Speculative Decoding,” a groundbreaking paper from Google that explores how smaller draft models can speed up large language model predictions while maintaining accuracy. James shares his hands-on experience experimenting with the technique, leveraging knowledge distillation and speculative execution. The trio also discusses the potential of this approach to optimize AI, reduce power consumption and costs, and help businesses of all sizes get more out of existing hardware. A special thank you to Google’s AI team for developing this month's paper. If you are interested in reading the paper yourself, please visit this link: https://research.google/blog/looking-back-at-speculative-decoding/.
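The core idea the episode discusses — a small draft model proposing tokens that the large target model verifies in one batched pass — can be sketched in a few lines. The following is a minimal toy, not the paper's algorithm: it uses hypothetical deterministic stand-ins for the draft and target models and a greedy accept/reject rule, whereas the real method samples from probability distributions and applies a statistical acceptance correction to preserve the target model's output distribution.

```python
# Toy sketch of speculative decoding (greedy variant). The "models"
# below are hypothetical deterministic functions standing in for
# real transformers; the structure of the loop is what matters.

def target_model(prefix):
    # Expensive "large" model: next token = sum of prefix mod 10.
    return sum(prefix) % 10

def draft_model(prefix):
    # Cheap "small" model: approximates the target but only looks at
    # the last token, so it sometimes disagrees.
    return (prefix[-1] * 2) % 10

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens after prompt, drafting k tokens per round."""
    out = list(prompt)
    while len(out) < len(prompt) + n_tokens:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target model checks every drafted position (in a real
        #    transformer this is a single batched forward pass).
        accepted = 0
        for i, t in enumerate(draft):
            if target_model(out + draft[:i]) == t:
                accepted += 1
            else:
                break
        out += draft[:accepted]
        # 3. Emit one token straight from the target model, so every
        #    round makes progress even if nothing was accepted.
        if len(out) < len(prompt) + n_tokens:
            out.append(target_model(out))
    return out[len(prompt):len(prompt) + n_tokens]
```

Because every accepted or fallback token matches what the target model would have produced on the same prefix, the output is identical to decoding with the large model alone — the speedup comes from verifying several drafted tokens per expensive forward pass instead of generating them one at a time.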
