Cluster-R1: Large Reasoning Models Are Instruction-following Clustering Agents

Peijun Qing; Puneet Mathur; Nedim Lipka; Varun Manjunatha; Ryan Rossi; Franck Dernoncourt; Saeed Hassanpour; Soroush Vosoughi

Abstract

General-purpose embedding models excel at recognizing semantic similarities but fail to capture the characteristics of texts specified by user instructions. In contrast, instruction-tuned embedders can align embeddings with textual instructions yet cannot autonomously infer latent corpus structures, such as determining the optimal number of clusters. To address both limitations, we reframe instruction-following clustering as a generative task and train large reasoning models (LRMs) as autonomous clustering agents. Our reasoning-driven training pipeline enables LRMs to interpret high-level clustering instructions and then infer the corresponding latent groupings. To evaluate this paradigm, we introduce ReasonCluster, a comprehensive benchmark comprising 28 diverse tasks spanning daily dialogue, legal cases, and financial reports. Experiments across diverse datasets and clustering scenarios show that our approach consistently outperforms strong embedding-based methods and LRM baselines, demonstrating that explicit reasoning fosters more faithful and interpretable instruction-based clustering.

Paper Structure (59 sections, 12 equations, 18 figures, 5 tables)

Introduction
Related Work
Problem Definition
Benchmark Construction
Multi-Agent Data Synthesis
Cluster Generation.
Cluster Refinement.
Consensus-Based Multi-Agent Labeling.
Datasets
Benchmark Details
Cluster-R1
Reasoning Distillation for Instruction-Following Clustering
Reinforcement Learning with Multiplicative Hybrid Rewards
Format reward.
Clustering reward.
...and 44 more sections

Figures (18)

Figure 1: Overview of embedding-based vs. reasoning-driven clustering. LRMs follow diverse user instructions and adaptively infer latent group structures.
Figure 2: The system prompt for clustering.
Figure 3: The overview of our multi-agent data synthesis pipeline.
Figure 4: Overview of the benchmark evaluation splits: (a) dataset statistics by source and split, where text and input lengths are measured in tokens, (b) distribution of the number of clusters per data example, and (c) distribution of cluster size, indicating the number of text instances per cluster for each dataset.
Figure 5: Training dynamics of different training recipes for the Qwen-7B model. Models initialized with distilled reasoning traces converge faster, exhibit more stable response lengths, and achieve higher rewards.
...and 13 more figures
