DUEL: Duplicate Elimination on Active Memory for Self-Supervised Class-Imbalanced Learning

Won-Seok Choi; Hyundo Lee; Dong-Sig Han; Junseok Park; Heeyeon Koo; Byoung-Tak Zhang

TL;DR

The paper tackles the poor generalization of self-supervised learning (SSL) under long-tailed class distributions by introducing DUEL, an active memory framework. It combines memory-inspired Hebbian Metric Learning (HML) with a distinctiveness objective to selectively replace duplicated items, enriching memory diversity without relying on per-sample labels. Theoretical results connect memory-augmented objectives to the canonical HML loss and yield a practical, GPU-friendly DUEL policy that improves downstream robustness on CIFAR-10, STL-10, and ImageNet-LT while preserving intra-class structure. Empirically, DUEL increases the entropy of the class distribution in memory and promotes better inter-class separation, which matters for SSL in real-world imbalanced settings.
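The entropy of the class distribution in memory can be measured with a few lines of code. The sketch below is illustrative only: `class_entropy` is a hypothetical helper name, and class labels are assumed to be available purely for evaluation, since the DUEL policy itself does not use them.

```python
import math
from collections import Counter

def class_entropy(labels):
    """Shannon entropy (in nats) of the empirical class distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# A memory with four equally represented classes versus one dominated by class "a".
balanced = ["a", "b", "c", "d"] * 2
imbalanced = ["a"] * 7 + ["b"]

print(class_entropy(balanced))    # equals log(4), the maximum for four classes
print(class_entropy(imbalanced))  # lower, because one class dominates
```

A higher value means the memory covers classes more evenly, which is the sense in which a more diverse memory is reported above.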

Abstract

Recent machine learning algorithms have been developed using well-curated datasets, which often require substantial cost and resources. On the other hand, the direct use of raw data often leads to overfitting towards frequently occurring class information. To address class imbalances cost-efficiently, we propose an active data filtering process during self-supervised pre-training in our novel framework, Duplicate Elimination (DUEL). This framework integrates an active memory inspired by human working memory and introduces distinctiveness information, which measures the diversity of the data in the memory, to optimize both the feature extractor and the memory. The DUEL policy, which replaces the most duplicated data with new samples, aims to enhance the distinctiveness information in the memory and thereby mitigate class imbalances. We validate the effectiveness of the DUEL framework in class-imbalanced environments, demonstrating its robustness and providing reliable results in downstream tasks. We also analyze the role of the DUEL policy in the training process through various metrics and visualizations.
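The replacement step described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the duplication score used here (a softmax over cosine similarities, summed per memory slot) is a stand-in for the paper's mutual duplication probability, and the names `cosine`, `duplication_scores`, `duel_replace`, and the temperature `tau` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def duplication_scores(memory, tau=0.5):
    """Score each slot by how strongly the other slots mark it as a duplicate.

    Assumption: row i is a softmax over the similarities between slot i and
    every other slot; the score of slot j is the sum of column j.
    """
    n = len(memory)
    scores = [0.0] * n
    for i in range(n):
        weights = {j: math.exp(cosine(memory[i], memory[j]) / tau)
                   for j in range(n) if j != i}
        z = sum(weights.values())
        for j, w in weights.items():
            scores[j] += w / z
    return scores

def duel_replace(memory, new_item, tau=0.5):
    """Overwrite the most duplicated slot with the new item (size stays fixed)."""
    scores = duplication_scores(memory, tau)
    victim = max(range(len(memory)), key=lambda k: scores[k])
    updated = list(memory)
    updated[victim] = new_item
    return updated, victim

# Slots 0 and 1 are exact duplicates; slots 2 and 3 are distinct directions.
memory = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
updated, victim = duel_replace(memory, [0.0, -1.0])
print(victim, updated)
```

In this toy memory one of the two identical vectors is evicted, so the distinct directions survive and the new sample takes the freed slot. A practical version would operate on batched feature tensors on the GPU rather than Python lists.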

Paper Structure (40 sections, 9 theorems, 55 equations, 10 figures, 5 tables, 4 algorithms)


Introduction
Revisiting Metric Learning from a Hebbian-based Perspective
Problem Definition
Hebbian Metric Learning
Memory-integrated HML for Class-imbalanced Environment
Proof Sketch.
Duplicate Elimination on Active Memory with Hebbian Metric Learning
Memory Management Policy
Duplicate Elimination Policy on Active Memory
DUEL Framework
Experiments
Experiment setting
Class-imbalanced environment
Class-imbalanced learning with SSL frameworks
Analysis of the robustness of representation
...and 25 more sections

Key Result

Proposition 1

Minimizing $D_{\text{KL}}(p(x,c)\,\|\,q(x,c;f))$ is equivalent to minimizing $\mathcal{L}_{\text{HML}}(f;\mathcal{D})$, which is expressed in terms of $\mathcal{I}_h(f;\mathcal{D})$ and $\mathcal{I}_d(f;\mathcal{D})$, the Hebbian information and the distinctiveness information, respectively.
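For reference, the KL divergence in the proposition expands in the standard way. This is a general identity for the KL divergence, not the paper's own decomposition into $\mathcal{I}_h$ and $\mathcal{I}_d$:

$$
D_{\text{KL}}\big(p(x,c)\,\|\,q(x,c;f)\big)
= \mathbb{E}_{p(x,c)}\big[\log p(x,c)\big] - \mathbb{E}_{p(x,c)}\big[\log q(x,c;f)\big].
$$

The first term does not depend on $f$, so minimizing the divergence over $f$ is the same as minimizing the cross-entropy $-\mathbb{E}_{p(x,c)}[\log q(x,c;f)]$.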

Figures (10)

Figure 1: Visualizations of the concept of working memory and our proposed DUEL framework. (A) A real-world agent perceives data from the environment and maps the representation to solve the task. Working memory finds semantically duplicated signals and reduces them to maximize the total amount of information. (B) Inspired by this cognitive process, we design the Duplicate Elimination (DUEL) framework. With the mutual duplication probability, the representations form a graph structure (center) and are filtered out (right) to gradually maximize the distinctiveness information.
Figure 2: Conceptual Visualization of Hebbian Metric Learning. HML minimizes the Hebbian information while maximizing the distinctiveness information.
Figure 3: Visualization of the general DUEL framework. Our method uses Duplicate Elimination to store diverse data as negative samples. The DUEL policy selects the most duplicated sample in memory (green) and replaces it with the current data (purple).
Figure 4: Visualization of the performance enhancement in the linear probing task. In both D-MoCo and D-SimCLR, accuracy gradually improves over the training steps. In D-MoCo especially, the DUEL process prevents the dramatic performance degradation seen at high $\rho_{\max}$.
Figure 5: t-SNE visualization of the active data filtering process with DUEL policy. (a) The representations extracted by the trained model along with their corresponding class. (b) The agent faces a dominant class (pink) that occurs more frequently than others. (c) The DUEL policy $\pi_{\text{DUEL}}$ replaces duplicated data with newer data and maximizes the distinctiveness information.
...and 5 more figures

Theorems & Definitions (19)

Definition 1: Mutual duplication probability
Proposition 1: Hebbian Metric Learning
Proposition 2: HML Bound
Theorem 1: Optimality of M-HML
Definition 2: Duplicate Elimination
Definition 3: Message passing
Lemma 1: Joint distribution with density function
proof
Proposition 1: Hebbian Metric Learning
proof
...and 9 more
