Parsing Objects at a Finer Granularity: A Survey
Yifan Zhao, Jia Li, Yonghong Tian
TL;DR
This survey reframes fine-grained visual parsing through the lens of part relationship learning, connecting semantic part segmentation and fine-grained recognition as a unified family of tasks. It introduces a new taxonomy and compiles representative benchmarks, contrasting non-deep and deep learning approaches, including pose-aided, multi-scale, and graph-based strategies. By analyzing universal challenges and proposing a relational learning framework, the paper highlights how object-part, intra-object, and cross-image relationships can guide robust local feature learning. The discussion points toward practical research directions such as dynamic relationships, few-shot and hierarchical structures, and 3D-aware methods with potential for broad impact in fine-grained visual understanding.
Abstract
Fine-grained visual parsing, which includes fine-grained part segmentation and fine-grained object recognition, has attracted considerable attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies. Predominant research efforts tackle these fine-grained sub-tasks under different paradigms, while the inherent relations between the tasks are neglected. Because most of this research remains fragmented, we conduct an in-depth study of advanced work from the new perspective of learning part relationships. From this perspective, we first consolidate recent research and benchmarks under new taxonomies. Based on this consolidation, we revisit the universal challenges in fine-grained part segmentation and recognition and propose new solutions to these challenges through part relationship learning. Finally, we identify several promising directions for future research in fine-grained visual parsing.
