KV Cache Steering for Inducing Reasoning in Small Language Models

15/07/2025

Daily Paper Cast

🤗 Upvotes: 26 | cs.CL, cs.AI

Authors:
Max Belitsky, Dawid J. Kopiczko, Michael Dorkenwald, M. Jehanzeb Mirza, Cees G. M. Snoek, Yuki M. Asano

Title:
KV Cache Steering for Inducing Reasoning in Small Language Models

Arxiv:
http://arxiv.org/abs/2507.08799v1

Abstract:
We propose cache steering, a lightweight method for implicit steering of language models via a one-shot intervention applied directly to the key-value cache. To validate its effectiveness, we apply cache steering to induce chain-of-thought reasoning in small language models. Our approach leverages GPT-4o-generated reasoning traces to construct steering vectors that shift model behavior toward more explicit, multi-step reasoning without fine-tuning or prompt modifications. Experimental evaluations on diverse reasoning benchmarks demonstrate that cache steering improves both the qualitative structure of model reasoning and quantitative task performance. Compared to prior activation steering techniques that require continuous interventions, our one-shot cache steering offers substantial advantages in terms of hyperparameter stability, inference-time efficiency, and ease of integration, making it a more robust and practical solution for controlled generation.
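To make the one-shot idea concrete, here is a minimal toy sketch. The abstract does not say how the steering vectors are built or where they are applied, so everything here is an assumption for illustration: the vector is taken as a mean difference between cached vectors from prompts with and without reasoning traces (a common choice in activation steering), and it is added once at a single cache position. The names `mean_difference`, `steer_cache_once`, `position`, and `strength` are hypothetical, and plain Python lists stand in for a real model's key or value tensors.

```python
from typing import List

Vector = List[float]


def mean_difference(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Assumed construction: mean of positive examples minus mean of negative ones."""
    if not positive or not negative:
        raise ValueError("need at least one example on each side")
    dim = len(positive[0])
    pos_mean = [sum(v[i] for v in positive) / len(positive) for i in range(dim)]
    neg_mean = [sum(v[i] for v in negative) / len(negative) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def steer_cache_once(
    cache: List[Vector],
    steering: Vector,
    position: int = -1,
    strength: float = 1.0,
) -> List[Vector]:
    """Return a copy of the cache with the steering vector added at one position.

    The edit happens a single time, before generation starts; nothing is
    re-applied at later decoding steps.
    """
    steered = [row[:] for row in cache]  # copy so the original cache is untouched
    steered[position] = [
        x + strength * s for x, s in zip(steered[position], steering)
    ]
    return steered


# Toy data: cached vectors from prompts with (positive) and without (negative)
# reasoning traces. Real caches hold one such tensor per layer and head.
positive = [[1.0, 2.0], [3.0, 4.0]]
negative = [[0.0, 1.0], [2.0, 1.0]]
steering = mean_difference(positive, negative)

cache = [[0.0, 0.0], [1.0, 1.0]]  # two cached token positions
steered = steer_cache_once(cache, steering, position=-1, strength=0.5)
```

In a real model the same one-time addition would presumably be applied to both the key and value tensors of the prompt's cache, after which generation proceeds as usual. That is the contrast the abstract draws with activation steering methods that intervene continuously during decoding.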

More episodes of "Daily Paper Cast"

Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models

EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

EmbRACE-3K: Embodied Reasoning and Action in Complex Environments

REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once

Test-Time Scaling with Reflective Generative Model

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

NeuralOS: Towards Simulating Operating Systems via Neural Generative Models