Specificity-aware reinforcement learning for fine-grained open-world classification

Samuele Angheben; Davide Berasi; Alessandro Conti; Elisa Ricci; Yiming Wang

TL;DR

This work proposes a novel specificity-aware reinforcement learning framework, SpeciaRL, to fine-tune reasoning LMMs on fine-grained image classification under the open-world setting, and introduces a dynamic, verifier-based reward signal anchored to the best predictions within online rollouts, promoting specificity while respecting the model's capabilities to prevent incorrect predictions.

Abstract

Classifying fine-grained visual concepts under open-world settings, i.e., without a predefined label set, requires models to be both accurate and specific. Recent reasoning Large Multimodal Models (LMMs) exhibit strong visual understanding but tend to produce overly generic predictions when performing fine-grained image classification. Our preliminary analysis reveals that these models do possess the underlying fine-grained domain knowledge. However, promoting more specific predictions (specificity) without compromising correct ones (correctness) remains a non-trivial and understudied challenge. In this work, we investigate how to steer reasoning LMMs toward predictions that are both correct and specific. We propose a novel specificity-aware reinforcement learning framework, SpeciaRL, to fine-tune reasoning LMMs on fine-grained image classification under the open-world setting. SpeciaRL introduces a dynamic, verifier-based reward signal anchored to the best predictions within online rollouts, promoting specificity while respecting the model's capabilities to prevent incorrect predictions. Our out-of-domain experiments show that SpeciaRL delivers the best trade-off between correctness and specificity across extensive fine-grained benchmarks, surpassing existing methods and advancing open-world fine-grained image classification. Code and model are publicly available at https://github.com/s-angheben/SpeciaRL.

Paper Structure (25 sections, 11 equations, 18 figures, 15 tables)

Introduction
Related Works
Method
Problem formulation
Prediction Evaluation
On LMMs being overly generic
Specificity-aware Reinforcement Learning
Experiments
Main comparison
Ablation studies
Conclusion
Acknowledgements
Additional implementation details
Prompts
LMM prompts
...and 10 more sections

Figures (18)

Figure 1: In open-world image classification, improving prediction specificity without compromising correctness remains challenging. Existing techniques, such as prompting the model to be specific, supervised fine-tuning (SFT), or reinforcement fine-tuning (RFT), promote specificity but reduce correctness. In contrast, our proposed method (SpeciaRL) significantly improves the specificity of the base Qwen2.5VL-7B model without compromising correctness. Gray arrows indicate that training is performed on a single-domain (bird) dataset that is disjoint from the domains in the test set, thereby illustrating cross-domain generalization.
Figure 2: Distribution of predictions over categories for Qwen2.5VL-7B (Bai et al., 2025) and its Best-of-N (BoN) version with $N=64$ inference runs. The right side shows specificity, correctness, and their harmonic mean (HM). The BoN-64 result serves as an indication of the model's potential capability.
Figure 3: Overview of SpeciaRL. Given an input image $I$, the policy model generates $N$ open-ended predictions $\{p_1, \dots, p_N\}$. Each prediction is categorized by a judge model (LLM verifier) as wrong or correct at different levels of specificity with respect to the ground truth. A verifiable reward $r_i^*$ is then assigned according to whether the prediction's category $c_i$ meets the adaptive reference level $c^*$, which is defined based on the best prediction within the $N$ rollouts. The resulting graded rewards are aggregated through a Group Relative Policy Optimization (GRPO) update to reinforce policies that are maximally specific while remaining correct.
Figure 4: Qualitative examples of the think-answer output from the base model Qwen2.5VL-7B and our SpeciaRL, which steers the reasoning traces toward more specific predictions.
Figure 5: LMM default prompt for prediction.
...and 13 more figures
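
The reward scheme described in the Figure 3 caption can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the integer encoding of judge categories (0 for wrong, larger values for correct and more specific), the binary reward values, the zero reward when every rollout is wrong, and the helper names rollout_rewards and group_relative_advantages are all hypothetical. The caption speaks of graded rewards, so the actual reward values may differ. The advantage step uses the group-normalized form commonly associated with GRPO, with the population standard deviation chosen here for simplicity.

```python
from statistics import mean, pstdev

# Hypothetical category encoding (an assumption, not taken from the paper):
# 0 means the judge marked the prediction as wrong; larger integers mean
# the prediction is correct at a higher level of specificity.
WRONG = 0


def rollout_rewards(categories):
    """Assign a reward to each rollout in one group.

    The adaptive reference level c* is the best category reached by any
    rollout in the group. A rollout is rewarded only if it is correct and
    meets that level. If every rollout is wrong, nothing is rewarded.
    """
    reference = max(categories)  # adaptive reference level c*
    if reference == WRONG:
        return [0.0] * len(categories)
    return [1.0 if c >= reference else 0.0 for c in categories]


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within the group (mean 0, roughly unit scale)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Six rollouts for one image: two wrong, two generic, two specific.
categories = [0, 1, 2, 2, 1, 0]
rewards = rollout_rewards(categories)
advantages = group_relative_advantages(rewards)
```

Because the reference level is taken from the group itself, a group whose best rollout is only generically correct still rewards that rollout, which is one way to read the caption's point about respecting the model's capabilities. When all rollouts receive the same reward, the normalized advantages are all zero and the group contributes no learning signal in this sketch.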
