Cross-Hierarchical Bidirectional Consistency Learning for Fine-Grained Visual Classification

Pengxiang Gao, Yihao Liang, Yanzhi Song, Zhouwang Yang

TL;DR

This work tackles fine-grained visual classification by exploiting the Tree Hierarchy inherent in the labels, without requiring extra annotations. It introduces CHBC, a framework that combines a trunk net with multiple MGE modules (one per hierarchy) and CAM-based attention, together with a Cross-hierarchical Bidirectional Consistency (CBC) module that enforces consistency between coarse-to-fine and fine-to-coarse predictions. The approach uses matrix orthogonal decomposition to separate level-specific information and a Jensen-Shannon divergence-based loss to align distributions across all hierarchical levels, improving wa_acc and TCR on three FGVC benchmarks. Overall, CHBC demonstrates that bidirectional hierarchical regularization combined with multi-granularity feature enhancement yields more accurate and more consistent fine-grained predictions, which matters for applications that need labels at multiple levels without additional annotations.

Abstract

Fine-Grained Visual Classification (FGVC) aims to categorize closely related subclasses, a task complicated by minimal inter-class differences and significant intra-class variance. Existing methods often rely on additional annotations for image classification, overlooking the valuable information embedded in Tree Hierarchies that depict hierarchical label relationships. To leverage this knowledge to improve classification accuracy and consistency, we propose a novel Cross-Hierarchical Bidirectional Consistency Learning (CHBC) framework. The CHBC framework extracts discriminative features across various hierarchies using a specially designed module to decompose and enhance attention masks and features. We employ bidirectional consistency loss to regulate the classification outcomes across different hierarchies, ensuring label prediction consistency and reducing misclassification. Experiments on three widely used FGVC datasets validate the effectiveness of the CHBC framework. Ablation studies further investigate the application strategies of feature enhancement and consistency constraints, underscoring the significant contributions of the proposed modules.
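
To make the consistency idea concrete, the sketch below shows one plausible way to compare predictions at two hierarchy levels: fine-level probabilities are summed over the children of each coarse class, and the result is compared with the coarse prediction using the Jensen-Shannon divergence. This is an illustration under stated assumptions, not the authors' implementation: the summation-based projection, the function names (`aggregate_to_coarse`, `js_divergence`), and the toy hierarchy and probabilities are all hypothetical, and only the fine-to-coarse direction is shown.

```python
import math

def aggregate_to_coarse(fine_probs, parent_of):
    """Sum fine-level probabilities over the children of each coarse class.

    parent_of[k] is the coarse-class index of fine class k (assumed encoding).
    """
    coarse = [0.0] * (max(parent_of) + 1)
    for prob, parent in zip(fine_probs, parent_of):
        coarse[parent] += prob
    return coarse

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical hierarchy: fine classes 0-2 belong to coarse class 0,
# fine classes 3-4 belong to coarse class 1.
parent_of = [0, 0, 0, 1, 1]
fine_probs = [0.5, 0.2, 0.1, 0.1, 0.1]   # made-up fine-level prediction
coarse_probs = [0.6, 0.4]                # made-up coarse-level prediction

projected = aggregate_to_coarse(fine_probs, parent_of)
consistency = js_divergence(coarse_probs, projected)
print(projected, consistency)
```

The divergence is zero when the projected fine prediction and the coarse prediction agree exactly and grows as they disagree, which is the property a consistency penalty needs.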

Paper Structure

This paper contains 14 sections, 22 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: The form of a Tree Hierarchy, with the root node hidden. The inter-class differences and intra-class variance among leaf nodes may be limited to subtle aspects, as in the example on the left. Different users can select the hierarchy they need. Green arrows denote consistent classifications and red arrows denote inconsistent predictions.
  • Figure 2: An overview of CHBC. The trunk net extracts shared features. MGE modules in the branch net refine features and masks and let them interact between hierarchies; the bottom left shows the inside of an MGE. Classifiers predict hierarchical labels. The right part illustrates the two loss functions in CHBC: the consistency loss calculated by CBC and the cross-entropy loss.
  • Figure 3: (a) demonstrates the details of feature enhancement in MGE. (b) shows how CBC unifies the distributions of different dimensions into the same dimension, where $sub^a_{[1,2,3]}$ and $sub^b_{[1,2]}$ are subclasses of $super_a$ and $super_b$, respectively.
  • Figure 4: Three interaction strategies for the consistency loss. All-to-all means each level interacts with all other levels; all-to-finest means each level interacts only with the finest level; between-neighbor means each level interacts only with its neighbor in the Tree Hierarchy.
  • Figure 5: (a) is the visualization of CAM, and (b) is the visualization of probability distributions on CUB.
  • ...and 2 more figures
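
The three interaction strategies named in the Figure 4 caption can be read as three ways of choosing which pairs of hierarchy levels receive a consistency term. The sketch below enumerates those pairs. It is an illustration only: the function name `level_pairs`, the strategy strings, and the convention that level 0 is the coarsest and the last level is the finest are assumptions, not details taken from the paper.

```python
def level_pairs(num_levels, strategy):
    """Return the (coarser, finer) level pairs that share a consistency term.

    Levels are indexed 0 (coarsest) to num_levels - 1 (finest); this ordering
    is an assumed convention for the sketch.
    """
    finest = num_levels - 1
    if strategy == "all-to-all":
        # Every level interacts with every other level.
        return [(i, j) for i in range(num_levels) for j in range(i + 1, num_levels)]
    if strategy == "all-to-finest":
        # Every level interacts only with the finest level.
        return [(i, finest) for i in range(finest)]
    if strategy == "between-neighbor":
        # Every level interacts only with the adjacent level.
        return [(i, i + 1) for i in range(finest)]
    raise ValueError(f"unknown strategy: {strategy}")

for name in ("all-to-all", "all-to-finest", "between-neighbor"):
    print(name, level_pairs(3, name))
```

With three levels, all-to-all yields three pairs while the other two strategies yield two each. The number of pairs grows quadratically with depth for all-to-all and linearly for the other two.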