SAU: A Dual-Branch Network to Enhance Long-Tailed Recognition via Generative Models

Guangxi Li, Yinsheng Song, Mingkai Zheng

TL;DR

This work tackles long-tailed image recognition by leveraging synthetic data generated by large generative models. It introduces SAU, a dual-branch network with a synthetic-unaware path that trains on mixed real-synthetic data and a synthetic-aware path that learns the disparities between real and synthetic samples through supervised contrastive learning, label correction, and a prototype-guided mechanism. Training is further supported by MixUp and CutMix augmentations and a noise-dropping strategy. The authors report state-of-the-art Top-1 accuracy on CIFAR-10-LT and CIFAR-100-LT and strong performance on ImageNet-LT, validating the approach across varying imbalance factors and shot regimes. The work provides a practical, end-to-end pipeline for incorporating synthetic data into long-tailed recognition, including prompt-based synthetic data generation, quality filtering, and robust training objectives, with publicly available code.

Abstract

Long-tailed distributions in image recognition pose a considerable challenge due to the severe imbalance between a few dominant classes with numerous examples and many minority classes with few samples. Recently, large generative models have been used to create synthetic data for image classification, but using synthetic data to address long-tailed recognition remains relatively unexplored. In this work, we propose using synthetic data as a complement to long-tailed datasets to eliminate the impact of data imbalance. To handle the resulting real-synthetic mixed dataset, we design a two-branch model that contains Synthetic-Aware and Unaware branches (SAU). The core ideas are (1) a synthetic-unaware branch for classification that mixes real and synthetic data and treats all data equally without distinguishing between them, and (2) a synthetic-aware branch that improves the robustness of the feature extractor by distinguishing between real and synthetic data and learning their discrepancies. Extensive experimental results demonstrate that our method improves the accuracy of long-tailed image recognition. Notably, our approach achieves state-of-the-art Top-1 accuracy and significantly surpasses other methods on the CIFAR-10-LT and CIFAR-100-LT datasets across various imbalance factors. Our code is available at https://github.com/lgX1123/gm4lt.
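
To make the notion of an imbalance factor concrete, the sketch below builds a long-tailed class profile and computes how many synthetic images each class would need to reach a balanced set. It assumes the exponential-decay construction commonly used for CIFAR-LT benchmarks, which the abstract does not spell out; the function names `long_tailed_counts` and `synthetic_budget`, and the choice to fill every class up to the head-class size, are illustrative assumptions rather than the authors' code.

```python
def long_tailed_counts(n_max, num_classes, imbalance_factor):
    """Per-class sample counts under an exponential long-tailed profile.

    Assumption: class i keeps n_max * (1 / imbalance_factor) ** (i / (C - 1))
    samples, so the head class has n_max and the tail class has roughly
    n_max / imbalance_factor.
    """
    counts = []
    for i in range(num_classes):
        frac = i / (num_classes - 1)
        counts.append(int(n_max * (1.0 / imbalance_factor) ** frac))
    return counts


def synthetic_budget(counts):
    """Synthetic images needed per class to match the largest class."""
    target = max(counts)
    return [target - c for c in counts]


# Hypothetical CIFAR-10-LT-style setting: 5000 head-class images, factor 100.
counts = long_tailed_counts(n_max=5000, num_classes=10, imbalance_factor=100)
budget = synthetic_budget(counts)

for cls, (real, synth) in enumerate(zip(counts, budget)):
    print(f"class {cls}: real={real:5d} synthetic={synth:5d}")
```

Under this assumption the tail classes end up dominated by synthetic images, which is one way to see why a model trained on the mixture may need to account for the real-synthetic gap.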

Paper Structure

This paper contains 31 sections, 14 equations, 2 figures, 4 tables, 1 algorithm.

Figures (2)

  • Figure 1: The overall framework of our proposed method. By leveraging a large language model (LLM) and text-to-image (T2I) models, we generate synthetic data to complement a long-tailed dataset and obtain a balanced one. The synthetic-unaware branch takes the same view $v_1$ of two samples to calculate the mixing loss. The synthetic-aware branch takes two different views $v_2$ and $v_3$ of one sample to calculate the supervised contrastive loss.
  • Figure 2: An example of the label correction process.
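
The two losses named in the Figure 1 caption can be illustrated with a small, dependency-free sketch. It assumes the standard MixUp formulation for the mixing loss and the standard supervised contrastive (SupCon) formulation for the contrastive loss; this page does not give the exact equations SAU uses, nor how positives are defined in the synthetic-aware branch, so every function name, the temperature value, and the toy data below are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def mixup(x_a, x_b, lam):
    """Convex combination of two inputs (the same view of two samples)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def mixing_loss(logits_mixed, y_a, y_b, lam):
    """MixUp-style loss: cross-entropy against both labels, weighted by lam."""
    return (lam * cross_entropy(logits_mixed, y_a)
            + (1.0 - lam) * cross_entropy(logits_mixed, y_b))


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have positives."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Scaled cosine similarity between anchor i and every other embedding.
        sims = {a: sum(p * q for p, q in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        positives = [a for a in sims if labels[a] == labels[i]]
        if not positives:
            continue
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy usage: mix two inputs, then score hypothetical logits for the mixture.
x_mixed = mixup([0.0, 2.0], [2.0, 0.0], lam=0.5)
loss_mix = mixing_loss([2.0, 0.5, -1.0], y_a=0, y_b=1, lam=0.5)

# Toy embeddings: the first two and the last two point in similar directions.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supcon_loss(emb, [0, 0, 1, 1])
loss_misaligned = supcon_loss(emb, [0, 1, 0, 1])
```

In this toy setup the contrastive loss is lower when the labels agree with the geometry of the embeddings (`loss_aligned`) than when they do not (`loss_misaligned`), which is the behavior a contrastive branch relies on when it pulls positives together.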