RealBirdID: Benchmarking Bird Species Identification in the Era of MLLMs

Logan Lawrence, Mustafa Chasmai, Rangel Daroya, Wuao Liu, Seoyun Jeong, Aaron Sun, Max Hamilton, Fabien Delattre, Oindrila Saha, Subhransu Maji, Grant Van Horn

Abstract

Fine-grained bird species identification in the wild is frequently unanswerable from a single image: key cues may be non-visual (e.g. vocalization), or obscured due to occlusion, camera angle, or low resolution. Yet today's multimodal systems are typically judged on answerable, in-schema cases, encouraging confident guesses rather than principled abstention. We propose the RealBirdID benchmark: given an image of a bird, a system should either answer with a species or abstain with a concrete, evidence-based rationale: "requires vocalization," "low quality image," or "view obstructed". For each genus, the dataset includes a validation split composed of curated unanswerable examples with labeled rationales, paired with a companion set of clearly answerable instances. We find that (1) species identification on the answerable set is challenging for a variety of open-source and proprietary models (less than 13% accuracy for MLLMs including GPT-5 and Gemini-2.5 Pro), (2) models with greater classification ability are not necessarily better calibrated to abstain on unanswerable examples, and (3) MLLMs generally fail to provide correct reasons even when they do abstain. RealBirdID establishes a focused target for abstention-aware fine-grained recognition and a recipe for measuring progress.

Paper Structure

This paper contains 36 sections, 25 figures, and 5 tables.

Figures (25)

  • Figure 1: Preview of RealBirdID. In contrast to previous species identification datasets, each genus in RealBirdID has a corresponding set of unanswerable (UA) examples. The proposed summary metric gauges both (1) a classifier's ability to disambiguate between confusing classes and (2) its ability to abstain from predicting on unanswerable examples. Incorrect abstention reasoning is shown in red, whereas correct reasoning is shown in green.
  • Figure 2: Peek into the dataset. A few examples of the answerable and unanswerable images in RealBirdID. Unanswerable examples are grouped by possible unanswerability reasons. Images in "Sound needed" may be harder to abstain on than those in "Quality", for example, as more detailed knowledge about the particular genus is needed to recognize these reasons.
  • Figure 3: Distribution of images across the answerable (A) and unanswerable (UA) subsets in RealBirdID. The genera within the unanswerable (UA) subset exhibit a highly imbalanced, long-tailed distribution. For example, 85% of UA images originate from just 61 genera that each contain more than five unanswerable samples. For details on the distribution of the 3,442 species, see the appendix.
  • Figure 4: Multiple choice question (MCQ) formatting is a problem for encoder models. A straightforward implementation of an abstention class with CLIP models is to simply expose the genus level as a text prompt and treat its prediction as abstention. However, this approach greatly underperforms a modification of a previous hierarchical method, TreeGT (Deng et al., 2012). "HM" refers to the harmonic mean of the "Answerable" and "Unanswerable" accuracies. Bird photographed by Tom Murray on iNaturalist (https://www.inaturalist.org/photos/155708903).
  • Figure 5: Visualization of sweeping parameters for classification and abstention metrics on popular CLIP-based models and MLLMs. To summarize how deep in the hierarchy classifiers can go while staying accurate, we use Information Gain vs. Accuracy (a). Each classifier admits a tradeoff curve when predicting species (b) and genus (d). To measure separation between the unanswerable and answerable sets, we measure the AUC of the model entropy (c).
  • ...and 20 more figures
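The "HM" summary metric mentioned in the Figure 4 caption can be sketched in a few lines. This is an illustrative implementation assuming the standard definition of the harmonic mean applied to the two per-subset accuracies; the function name and exact handling of zero accuracies are our assumptions, not the paper's released code.

```python
def harmonic_mean(answerable_acc: float, unanswerable_acc: float) -> float:
    """Harmonic mean (HM) of answerable and unanswerable accuracies.

    Returns 0.0 if either accuracy is 0, following the usual convention:
    a model that only answers (or only abstains) gets no credit.
    """
    if answerable_acc == 0 or unanswerable_acc == 0:
        return 0.0
    return 2 * answerable_acc * unanswerable_acc / (answerable_acc + unanswerable_acc)

# A model that always answers scores 0 despite high answerable accuracy,
# while a balanced model keeps a score close to its individual accuracies.
print(harmonic_mean(0.8, 0.0))
print(harmonic_mean(0.6, 0.6))
```

The harmonic mean is a natural choice here because it is dominated by the weaker of the two abilities, so a classifier cannot inflate its summary score by answering (or abstaining) indiscriminately.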