Parsing Objects at a Finer Granularity: A Survey

Yifan Zhao, Jia Li, Yonghong Tian

TL;DR

This survey reframes fine-grained visual parsing through the lens of part relationship learning, connecting semantic part segmentation and fine-grained recognition as a unified family of tasks. It introduces a new taxonomy and compiles representative benchmarks, contrasting non-deep and deep learning approaches, including pose-aided, multi-scale, and graph-based strategies. By analyzing universal challenges and proposing a relational learning framework, the paper highlights how object-part, intra-object, and cross-image relationships can guide robust local feature learning. The discussion points toward practical research directions such as dynamic relationships, few-shot and hierarchical structures, and 3D-aware methods with potential for broad impact in fine-grained visual understanding.

Abstract

Fine-grained visual parsing, including fine-grained part segmentation and fine-grained object recognition, has attracted considerable attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies. Predominant research efforts tackle these fine-grained sub-tasks under different paradigms, while the inherent relations between the tasks are neglected. Moreover, given that most of the research remains fragmented, we conduct an in-depth study of the advanced work from a new perspective of learning the part relationship. From this perspective, we first consolidate recent research and benchmark syntheses with new taxonomies. Based on this consolidation, we revisit the universal challenges in fine-grained part segmentation and recognition tasks and propose new solutions to these challenges through part relationship learning. Finally, we outline several promising directions for future research in fine-grained visual parsing.

Paper Structure (21 sections, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Comparison of coarse-grained and fine-grained learning. Representative fine-grained learning tasks, i.e., semantic part segmentation and fine-grained recognition, rely on part relationship learning to build robust local features, while coarse-grained tasks can be solved with image-level global features.
  • Figure 2: The landscape of semantic part segmentation tasks in our taxonomy. We summarize the recent advances from two different aspects: problem setting and learning strategy.
  • Figure 3: Task settings of single-class and multi-class part segmentation. Single-class part segmentation focuses only on segmenting objects of one specific class, while multi-class part segmentation aims to segment multiple objects that occur in one scene.
  • Figure 4: Three typical challenges in fine-grained recognition tasks (images from the CUB dataset [wah2011caltech]). 1) Heterogeneous semantic spaces: the semantic definitions of fine-grained text labels are usually distributed in clusters. 2) Near-duplicated inter-class appearances: objects of different categories present visually similar appearances. 3) Intra-class shape variances: the shape structures of objects in the same category can be inconsistent.
  • Figure 5: Three typical high-order relations as in [zhao2021graph]. Vanilla classification: encoded features are pooled into vectors for classification, as used in most works. Second-order relationship [lin2015bilinear, gao2016compact, li2017factorized, wei2018grassmann, zhao2021graph]: learning rich second-order features by keeping the spatial dimension. Trilinear attention [zheng2019looking, gao2020channel, wang2018non]: preserving the same size as the input features for learning a spatial-wise or channel-wise attention matrix.
  • ...and 2 more figures
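
The three feature-aggregation schemes contrasted in the Figure 5 caption can be illustrated with a minimal NumPy sketch. This is not the survey's or the cited papers' implementation: the function names, the random feature map, and the particular softmax-normalized channel-relation form of the trilinear step are assumptions chosen only to show how the output shapes differ (a vector, a C×C matrix, and a map the same size as the input).

```python
import numpy as np


def global_average_pool(features):
    """Vanilla scheme: pool a (C, H, W) feature map into a C-dim vector."""
    return features.mean(axis=(1, 2))


def bilinear_pool(features):
    """Second-order scheme: average the outer products of the local
    descriptors over all spatial positions, giving a (C, C) matrix."""
    c, h, w = features.shape
    x = features.reshape(c, h * w)          # one column per spatial position
    return x @ x.T / (h * w)


def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def trilinear_attention(features):
    """Trilinear-style scheme (assumed form): a channel-to-channel relation
    matrix re-weights the features, so the output keeps the input size."""
    c, h, w = features.shape
    x = features.reshape(c, h * w)
    channel_relation = softmax(x @ x.T, axis=1)   # (C, C), rows sum to 1
    return (channel_relation @ x).reshape(c, h, w)


# Hypothetical backbone output: 8 channels on a 4x4 spatial grid.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))

print(global_average_pool(feats).shape)   # first-order vector
print(bilinear_pool(feats).shape)         # second-order matrix
print(trilinear_attention(feats).shape)   # same size as the input
```

The point of the comparison is only the shape of what each scheme hands to the next stage: global pooling discards spatial layout, bilinear pooling keeps pairwise channel statistics, and the trilinear form returns a re-weighted map that can still be read position by position.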