Test-Time Scaling with Reflective Generative Model

15/07/2025

Daily Paper Cast

🤗 Upvotes: 68 | cs.LG, cs.CL

Authors:
Zixiao Wang, Yuxin Wang, Xiaorui Wang, Mengting Xing, Jie Gao, Jianjun Xu, Guangcan Liu, Chenhui Jin, Zhuo Wang, Shengzhuo Zhang, Hongtao Xie

Title:
Test-Time Scaling with Reflective Generative Model

Arxiv:
http://arxiv.org/abs/2507.01951v2

Abstract:
We introduce our first reflective generative model MetaStone-S1, which obtains OpenAI o3-mini's performance via the new Reflective Generative Form. The new form focuses on high-quality reasoning trajectory selection and contains two novelties: 1) A unified interface for policy and process reward model: we share the backbone network and use task-specific heads for reasoning trajectory predicting and scoring respectively, introducing only 53M extra parameters for trajectory scoring. 2) Eliminating the reliance on process-level annotation: we provide a self-supervised process reward model, which can directly learn the high-quality reasoning trajectory selection from the outcome reward. Equipped with the reflective generative form, MetaStone-S1 is naturally suitable for test-time scaling, and we provide three reasoning effort modes (low, medium, and high) based on the controllable thinking length. Experiments demonstrate that our MetaStone-S1 achieves comparable performance to OpenAI o3-mini's series with only 32B parameter size. To support the research community, we have open-sourced MetaStone-S1 at https://github.com/MetaStone-AI/MetaStone-S1.
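The trajectory selection the abstract describes can be illustrated with a minimal best-of-N sketch. It assumes several candidate reasoning trajectories have already been sampled and that a scoring head has assigned each reasoning step a score in (0, 1]. The names `trajectory_score` and `select_best` are hypothetical, and the geometric-mean aggregation is an assumption made for illustration, not a rule stated in the abstract.

```python
import math
from typing import List, Sequence


def trajectory_score(step_scores: Sequence[float]) -> float:
    """Aggregate per-step scores into one trajectory score.

    Assumption: scores lie in (0, 1] and are combined with a geometric
    mean, so a single weak step pulls the whole trajectory down.
    """
    if not step_scores:
        raise ValueError("a trajectory needs at least one step score")
    return math.exp(sum(math.log(s) for s in step_scores) / len(step_scores))


def select_best(candidates: List[str], step_scores: List[Sequence[float]]) -> str:
    """Return the candidate whose aggregated score is highest."""
    if len(candidates) != len(step_scores):
        raise ValueError("need one list of step scores per candidate")
    scores = [trajectory_score(s) for s in step_scores]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]


# Hypothetical candidates and step scores, for illustration only.
candidates = ["trajectory A", "trajectory B", "trajectory C"]
step_scores = [[0.9, 0.2], [0.7, 0.7], [0.5, 0.9]]
best = select_best(candidates, step_scores)
```

In this toy input, trajectory B is selected because its steps are consistently scored, while A contains one weak step. Sampling more candidates before selection is one way to spend more test-time compute, though how the three effort modes map to sampling budgets is not specified in the abstract.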
