GREC: Generalized Referring Expression Comprehension
Shuting He, Henghui Ding, Chang Liu, Xudong Jiang
TL;DR
This work exposes the limitations of Classic Referring Expression Comprehension (REC) in handling no-target and multi-target expressions. It introduces Generalized Referring Expression Comprehension (GREC) and the gRefCOCO dataset, along with new evaluation metrics that account for multiple bounding boxes and no-target cases. Empirical results show existing REC methods underperform on GREC, and a threshold-based, dynamic box-selection strategy offers the most effective grounding performance. The contributions provide a more realistic, versatile grounding framework with practical implications for multi-object grounding and image retrieval tasks. The work also provides benchmark resources and baseline implementations to accelerate future research.
Abstract
The objective of Classic Referring Expression Comprehension (REC) is to produce a bounding box corresponding to the object mentioned in a given textual description. Existing datasets and methods for classic REC are typically tailored to single-target expressions, in which each expression is linked to exactly one object. Expressions that refer to multiple targets or to no target at all have not been taken into account, which limits the practical applicability of REC. This study introduces a new benchmark termed Generalized Referring Expression Comprehension (GREC), which extends classic REC by permitting expressions to describe any number of target objects. To this end, we have built the first large-scale GREC dataset, named gRefCOCO. The dataset contains multi-target, no-target, and single-target expressions. The design of GREC and gRefCOCO ensures smooth compatibility with classic REC. The gRefCOCO dataset, a GREC method implementation, and the GREC evaluation code are available at https://github.com/henghuiding/gRefCOCO.
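
To make the threshold-based, dynamic box-selection idea from the TL;DR concrete, the following is a minimal sketch and not the authors' implementation. It assumes a model that emits candidate boxes with confidence scores; the function name `select_boxes`, the box layout `(x1, y1, x2, y2)`, and the default threshold of 0.5 are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def select_boxes(
    boxes: List[Box],
    scores: List[float],
    threshold: float = 0.5,  # illustrative value, not taken from the paper
) -> List[Box]:
    """Keep every candidate box whose confidence score is at least `threshold`.

    The number of returned boxes is not fixed in advance: an empty list
    corresponds to a no-target prediction, one box to a single-target
    prediction, and several boxes to a multi-target prediction.
    """
    if len(boxes) != len(scores):
        raise ValueError("boxes and scores must have the same length")
    return [box for box, score in zip(boxes, scores) if score >= threshold]


# Three candidates, two of which clear the threshold (multi-target case).
candidates = [(10, 10, 50, 50), (60, 20, 120, 90), (0, 0, 5, 5)]
multi = select_boxes(candidates, [0.9, 0.7, 0.2])

# No candidate clears the threshold (no-target case).
none = select_boxes(candidates, [0.3, 0.1, 0.2])
```

Unlike a fixed top-1 or top-k selection, this rule lets the output size follow the expression, which is what allows a single model to cover no-target, single-target, and multi-target expressions.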
