
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
🤗 Upvotes: 76 | cs.CL, cs.LG
Authors:
Pengyi Li, Matvey Skripkin, Alexander Zubrey, Andrey Kuznetsov, Ivan Oseledets
Title:
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
arXiv:
http://arxiv.org/abs/2506.06395v3
Abstract:
Large language models (LLMs) excel at reasoning, yet post-training remains critical for aligning their behavior with task goals. Existing reinforcement learning (RL) methods often depend on costly human annotations or external reward models. We propose Reinforcement Learning via Self-Confidence (RLSC), which uses the model's own confidence as its reward signal, eliminating the need for labels, preference models, or reward engineering. Applied to Qwen2.5-Math-7B with only 16 samples per question and 10 or 20 training steps, RLSC improves accuracy by +13.4% on AIME2024, +21.2% on MATH500, +21.7% on Minerva Math, +20.8% on OlympiadBench, and +9.7% on AMC23. RLSC provides a simple, scalable post-training method for inference models, requiring only a small number of samples and unlabelled supervision.
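
The abstract only sketches the recipe at a high level. Below is a minimal Python sketch of one plausible way the label-free "self-confidence" reward could be operationalized: each of the 16 sampled answers is scored by how often it agrees with the other samples. The agreement-frequency reward and the sample_answers helper are assumptions for illustration, not the paper's confirmed implementation.

    from collections import Counter
    from typing import List

    def self_confidence_rewards(answers: List[str]) -> List[float]:
        """Score each sampled answer by the fraction of samples that agree with it.

        This turns the model's own consistency into a reward: no gold labels,
        preference data, or external reward model are needed.
        """
        counts = Counter(answers)
        n = len(answers)
        return [counts[a] / n for a in answers]

    # Hypothetical usage with 16 samples per question, matching the paper's setup:
    # answers = sample_answers(model, question, n=16)  # decode 16 completions (stub)
    answers = ["42", "42", "41", "42", "40", "42", "42", "42",
               "42", "41", "42", "42", "42", "42", "39", "42"]
    rewards = self_confidence_rewards(answers)
    # Samples agreeing with the majority answer "42" get reward 12/16 = 0.75 and
    # would be up-weighted in a policy-gradient update over 10-20 training steps.
    print(rewards[:4])

Because the reward is computed entirely from the model's own samples, this fits the abstract's claim of requiring only a small number of samples and no labeled supervision.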