Towards Privacy-Preserving Fine-Grained Visual Classification via Hierarchical Learning from Label Proportions

Jinyi Chang, Dongliang Chang, Lei Chen, Bingyao Yu, Zhanyu Ma

TL;DR

This work tackles privacy-preserving fine-grained visual classification (FGVC) by removing the need for instance-level labels and relying on bag-level label proportions instead. It introduces Learning from Hierarchical Fine-Grained Label Proportions (LHFGLP), which combines Learning from Label Proportions (LLP) with an Unrolled Hierarchical Fine-Grained Sparse Dictionary Learning module and a Hierarchical Proportion Loss to enable progressive, hierarchical feature refinement under LLP supervision. Key innovations include a learnable dictionary with sparse representations, category-aware masking across hierarchical levels, Sparsemax-driven masking, and a multi-level bag-proportion loss that guides refinement from coarse to fine granularity. Experiments on CUB, Aircraft, and Cars show that LHFGLP consistently surpasses existing LLP baselines and remains competitive with instance-level methods, demonstrating the practicality of privacy-preserving FGVC and offering a plug-and-play framework for integration into existing pipelines. Code and datasets are slated for public release.
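
Sparsemax, named above as the driver of the masking step, is a projection onto the probability simplex that can return exact zeros, unlike softmax. The following is a minimal NumPy sketch of the standard Sparsemax function only; how LHFGLP applies it to build hierarchical masks is not described in this summary, so the function name `sparsemax` and the example input are illustrative assumptions.

```python
import numpy as np

def sparsemax(z):
    """Standard Sparsemax: Euclidean projection of z onto the probability simplex.

    Illustrative helper, not the authors' code.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores in descending order
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, z.size + 1)
    # Entries that stay in the support satisfy 1 + k * z_(k) > sum of top-k scores.
    support = 1.0 + ks * z_sorted > cumsum
    k = ks[support][-1]                      # size of the support
    tau = (cumsum[k - 1] - 1.0) / k          # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

# Hypothetical scores: the lowest one is pushed to exactly zero.
weights = sparsemax([1.0, 0.5, -1.0])
```

Because low-scoring entries become exactly zero, the output can act as a soft selection mask over dictionary atoms or categories while still summing to one.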

Abstract

In recent years, Fine-Grained Visual Classification (FGVC) has achieved impressive recognition accuracy despite minimal inter-class variations. However, existing methods rely heavily on instance-level labels, making them impractical in privacy-sensitive scenarios such as medical image analysis. This paper aims to enable accurate fine-grained recognition without direct access to instance labels. To achieve this, we leverage the Learning from Label Proportions (LLP) paradigm, which requires only bag-level label proportions for training. Unlike existing LLP-based methods, our framework explicitly exploits the hierarchical nature of fine-grained datasets, enabling progressive refinement of feature granularity and improving classification accuracy. We propose Learning from Hierarchical Fine-Grained Label Proportions (LHFGLP), a framework that incorporates Unrolled Hierarchical Fine-Grained Sparse Dictionary Learning, transforming handcrafted iterative approximation into learnable network optimization. Additionally, our proposed Hierarchical Proportion Loss provides hierarchical supervision, further enhancing classification performance. Experiments on three widely used fine-grained datasets, structured in a bag-based manner, demonstrate that our framework consistently outperforms existing LLP-based methods. We will release our code and datasets to foster further research in privacy-preserving fine-grained classification.
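
To make the bag-level supervision concrete, here is a minimal NumPy sketch of a proportion loss applied at two hierarchy levels. It assumes the common LLP formulation (cross-entropy between the known bag proportions and the mean predicted class distribution over the bag) and a weighted sum across levels. The exact form of the paper's Hierarchical Proportion Loss is not given in this summary, so the function names, the level weights, and the toy fine-to-coarse mapping are all hypothetical.

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def proportion_loss(logits, bag_props, eps=1e-8):
    """Cross-entropy between true bag proportions and the mean prediction."""
    pred_props = softmax(logits).mean(axis=0)   # predicted proportions for the bag
    return float(-(bag_props * np.log(pred_props + eps)).sum())

def coarsen(fine_props, fine_to_coarse, n_coarse):
    """Sum fine-level proportions over the children of each coarse class."""
    return np.bincount(fine_to_coarse, weights=fine_props, minlength=n_coarse)

def hierarchical_proportion_loss(logits_per_level, props_per_level, weights):
    # Assumed combination rule: weighted sum of per-level proportion losses.
    return sum(w * proportion_loss(l, p)
               for l, p, w in zip(logits_per_level, props_per_level, weights))

# Toy bag of 4 images: 4 fine classes grouped into 2 coarse classes (made up).
fine_to_coarse = np.array([0, 0, 1, 1])
fine_props = np.array([0.5, 0.25, 0.25, 0.0])
coarse_props = coarsen(fine_props, fine_to_coarse, n_coarse=2)

# Confident per-image logits whose averages match the bag proportions.
fine_logits = 50.0 * np.eye(4)[[0, 0, 1, 2]]
coarse_logits = 50.0 * np.eye(2)[[0, 0, 0, 1]]

loss = hierarchical_proportion_loss(
    [coarse_logits, fine_logits], [coarse_props, fine_props], weights=[0.5, 1.0]
)
```

Only the per-bag proportions enter the loss; no individual image label is ever read, which is the property that makes the paradigm attractive in privacy-sensitive settings.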

Paper Structure

This paper contains 26 sections, 7 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Comparison of general image classification and Learning from Label Proportions (LLP). Individual images are shown as dots, colored by their ground-truth categories. Under the LLP paradigm, images are aggregated into bags with only bag-level label proportions; these images are drawn in grey to indicate that their categories are unknown. The red dashed lines reflect the efforts required for image classification.
  • Figure 2: Overview of our proposed LHFGLP framework, which integrates our Unrolled Hierarchical Fine-Grained Sparse Dictionary Learning into the fundamental FGVC pipeline in a plug-and-play manner.
  • Figure 3: Implementation details of the Unrolled Hierarchical Fine-Grained Sparse Dictionary Learning strategy with Hierarchical Category-aware Masking.
  • Figure 4: t-SNE visualization of three example bags of fine-grained images (one per row). Each row shows the feature spaces obtained from Vanilla ResNet-50 (left), baseline LLP (middle), and our LHFGLP (right). Colored points are the feature representations of individual images. The red dashed lines indicate the efforts of classifiers to make the fine-grained distinctions.
  • Figure 5: Activation visualization of three fine-grained images from the Aircraft dataset, derived from Vanilla ResNet-50 (left), baseline LLP (middle), and our LHFGLP (right). The highlighted parts are the supporting visual regions on which the model focuses its attention for fine-grained image classification.
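
The "unrolled" sparse dictionary learning named in Figures 2 and 3 can be illustrated with a generic example. The sketch below runs a fixed number of ISTA (iterative shrinkage-thresholding) steps for sparse coding, with each iteration playing the role of one network layer. In a learned, unrolled variant, the dictionary `D`, the step size, and the threshold would become trainable parameters. This is an assumption-laden illustration of the general technique, not the paper's module: the hierarchical structure and category-aware masking are omitted, and all names and sizes are hypothetical.

```python
import numpy as np

def soft_threshold(v, thr):
    # Proximal operator of the L1 norm: shrink every entry toward zero by thr.
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def unrolled_ista(x, D, lam=0.1, n_layers=20):
    """Run n_layers ISTA steps for min_a 0.5 * ||x - D a||^2 + lam * ||a||_1."""
    step = 1.0 / np.linalg.norm(D, ord=2) ** 2   # 1 / L, L = squared spectral norm
    a = np.zeros(D.shape[1])
    for _ in range(n_layers):                    # each iteration acts as one "layer"
        residual = x - D @ a
        a = soft_threshold(a + step * (D.T @ residual), step * lam)
    return a

def objective(x, D, a, lam=0.1):
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.abs(a).sum()

# Toy data: an 8-dimensional feature built from 2 of 16 dictionary atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 16))
true_code = np.zeros(16)
true_code[[3, 11]] = [1.5, -2.0]
x = D @ true_code

code = unrolled_ista(x, D)
```

Fixing the number of iterations is what turns the handcrafted solver into a feed-forward computation that can be trained end to end alongside a feature backbone.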