Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

11/09/2025

Daily Paper Cast

🤗 Upvotes: 66 | cs.CL

Authors:
Tong Zheng, Hongming Zhang, Wenhao Yu, Xiaoyang Wang, Xinyu Yang, Runpeng Dai, Rui Liu, Huiwen Bao, Chengsong Huang, Heng Huang, Dong Yu

Title:
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Arxiv:
http://arxiv.org/abs/2509.07980v1

Abstract:
Parallel thinking has emerged as a novel approach for enhancing the reasoning capabilities of large language models (LLMs) by exploring multiple reasoning paths concurrently. However, activating such capabilities through training remains challenging, as existing methods predominantly rely on supervised fine-tuning (SFT) over synthetic data, which encourages teacher-forced imitation rather than exploration and generalization. Unlike these methods, we propose Parallel-R1, the first reinforcement learning (RL) framework that enables parallel thinking behaviors for complex real-world reasoning tasks. Our framework employs a progressive curriculum that explicitly addresses the cold-start problem in training parallel thinking with RL. We first apply SFT to prompt-generated trajectories from easier tasks to instill the parallel thinking ability, then transition to RL to explore and generalize this skill on harder problems. Experiments on various math benchmarks, including MATH, AMC23, and AIME, show that Parallel-R1 successfully instills parallel thinking, yielding an 8.4% accuracy improvement over the sequential thinking model trained directly on challenging tasks with RL. Further analysis reveals a clear shift in the model's thinking behavior: at an early stage, it uses parallel thinking as an exploration strategy, while at a later stage, it uses the same capability for multi-perspective verification. Most significantly, we validate parallel thinking as a mid-training exploration scaffold, where this temporary exploratory phase unlocks a higher performance ceiling after RL, yielding a 42.9% improvement over the baseline on AIME25. Our model, data, and code will be open-sourced at https://github.com/zhengkid/Parallel-R1.
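The progressive curriculum described in the abstract (an SFT cold-start stage on easier problems, then RL on harder ones) can be pictured as a simple two-stage schedule. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the `Problem` and `Stage` types, the numeric `difficulty` score, and the 0.5 split threshold are all hypothetical, and the abstract does not describe how the actual training data is partitioned.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass
class Problem:
    prompt: str
    difficulty: float  # hypothetical score in [0, 1]; higher means harder


@dataclass
class Stage:
    method: Literal["sft", "rl"]  # training method used in this stage
    problems: List[Problem]


def build_curriculum(problems: List[Problem], threshold: float = 0.5) -> List[Stage]:
    """Order training into an SFT cold-start stage on easier problems,
    followed by an RL stage on harder ones (hypothetical split rule)."""
    easy = [p for p in problems if p.difficulty < threshold]
    hard = [p for p in problems if p.difficulty >= threshold]
    # Stage order matters: SFT first instills the behavior, RL then generalizes it.
    return [Stage("sft", easy), Stage("rl", hard)]


if __name__ == "__main__":
    pool = [Problem("easy arithmetic", 0.1), Problem("competition-level", 0.9), Problem("grade-school", 0.3)]
    stages = build_curriculum(pool)
    print([(s.method, len(s.problems)) for s in stages])  # [('sft', 2), ('rl', 1)]
```

Keeping the schedule as plain data makes the ordering explicit; a real pipeline would attach prompt-generated trajectories to the SFT stage and a reward function to the RL stage, neither of which is specified here.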
