Debiased Orthogonal Boundary-Driven Efficient Noise Mitigation

Hao Li; Jiayang Gu; Jingkuan Song; An Zhang; Lianli Gao

TL;DR

This work tackles noisy labels in large-scale pre-training by introducing One-Step Anti-noise (OSA), a model-agnostic, estimator-driven paradigm that uses a high-dimensional cosine boundary in cone space to separate clean from noisy samples. The authors establish a boundary principle in which the observed intersection boundary corresponds to a shifted orthogonal boundary, supported by theoretical analysis of propagation and contrastive learning. OSA uses a zero-shot estimator (e.g., CLIP) to score sample cleanliness in one forward pass, applies spatial debiasing to correct for the boundary shift, and reweights losses to mitigate noise with minimal computational overhead. Across multimodal and single-modality tasks, the authors report state-of-the-art robustness and transferability while maintaining efficiency, which suggests practical potential for large-scale, noise-prone pre-training. The approach is architecture- and estimator-agnostic and scalable, and code is released for adoption and experimentation in other domains.
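
The score, debias, and reweight steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the clamp-at-zero scoring function, the mean-based loss aggregation, and the `boundary_shift` parameter are all assumptions made for illustration, and the embeddings stand in for the output of an estimator such as CLIP.

```python
import numpy as np

def cosine_scores(img_emb, txt_emb):
    """Row-wise cosine similarity between paired embeddings of shape (batch, dim)."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return np.sum(img * txt, axis=1)

def osa_style_weights(scores, boundary_shift):
    """Hypothetical scoring function (an assumption, not the paper's).

    Subtract the estimated boundary shift, then clamp at zero so that
    pairs falling on the noisy side of the boundary get zero weight.
    """
    debiased = scores - boundary_shift
    return np.clip(debiased, 0.0, None)

def reweighted_loss(per_sample_loss, weights):
    """Mean of the weighted per-sample losses (aggregation is an assumption)."""
    return float(np.mean(weights * per_sample_loss))

# Toy batch: the first pair is aligned, the second pair is orthogonal.
img = np.array([[1.0, 0.0], [0.0, 1.0]])
txt = np.array([[2.0, 0.0], [3.0, 0.0]])
scores = cosine_scores(img, txt)
weights = osa_style_weights(scores, boundary_shift=0.1)
loss = reweighted_loss(np.array([1.0, 5.0]), weights)
```

In this toy batch the orthogonal pair receives zero weight, so its large loss does not contribute to the update. How the boundary shift is estimated in practice is not covered by this summary, so it is passed in as a plain number here.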

Abstract

Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical, as obtaining entirely clean or human-annotated samples for large-scale pre-training tasks is often impractical. Nonetheless, existing noise mitigation methods often encounter limitations in practical applications due to their task-specific design, model dependency, and significant computational overhead. In this work, we exploit the properties of high-dimensional orthogonality to identify a robust and effective boundary in cone space for separating clean and noisy samples. Building on this, we propose One-Step Anti-noise (OSA), a model-agnostic noisy label mitigation paradigm that employs an estimator model and a scoring function to assess the noise level of input pairs through just one-step inference. We empirically validate the superiority of OSA, demonstrating its enhanced training robustness, improved task transferability, streamlined deployment, and reduced computational overhead across diverse benchmarks, models, and tasks. Our code is released at https://github.com/leolee99/OSA.
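
The high-dimensional orthogonality that the abstract relies on can be checked numerically. The sketch below draws independent isotropic Gaussian vectors (an assumption chosen for simplicity, not the feature distribution of any real encoder) and measures how close their cosine similarity is to zero as the dimension grows. The function name and its parameters are hypothetical.

```python
import numpy as np

def mean_abs_cosine(dim, n_pairs=2000, seed=0):
    """Average |cos(u, v)| over independent Gaussian vector pairs of size `dim`."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_pairs, dim))
    v = rng.standard_normal((n_pairs, dim))
    dots = np.sum(u * v, axis=1)
    norms = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    return float(np.mean(np.abs(dots / norms)))

# Random pairs become closer to orthogonal as the dimension increases.
for d in (2, 8, 512):
    print(d, mean_abs_cosine(d))
```

For isotropic vectors the average magnitude shrinks roughly like one over the square root of the dimension. Real embeddings are not isotropic, which is where the cone effect and the shifted boundary discussed in the paper come in.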

Paper Structure (54 sections, 3 theorems, 34 equations, 4 figures, 12 tables)

Introduction
Boundary Principle Analysis
Hypothesis: Intersection Boundary is Shifted from Orthogonal Boundary
The orthogonal boundary clearly partitions the positive and negative regions.
Cone effect may induce orthogonal boundary shift.
Theoretical Verification of Intersection Origin
Relative relationship unchanged in propagation process.
Contrastive learning empowers the separation of clean and noisy samples.
Discussion on Applicability
How about the boundary robustness even in unfamiliar domains?
How to handle the overlaps through imbalanced probability?
Method
Task Definition
One-step Anti-Noise
Estimator Model
...and 39 more sections

Key Result

Theorem 1

Let $\mathbbm{R}^{d_{in}}$ be the original space before being transmitted in a neural network. Suppose $u,v \in \mathbbm{R}^{d_{in}}$ are any two random vectors with $\cos(u,v) \approx 0$. $u_c,v_c \in \mathbbm{R}^{d_{in}}$ is a pair of clean vectors with $\cos(u_c,v_c) > 0$, while $u_n,v_n \in \mat

Figures (4)

Figure 1: (a) The current anti-noise paradigm, which requires multiple backward passes, significantly increases the training overhead. (b) CLIP unifies the framework of image-text matching and image classification through a shared space. (c-f) Cosine similarity distributions of noisy and clean data with 50% noise.
Figure 2: OSA workflow. In the scoring phase, the estimator model calculates the semantic similarity score for input data pairs. Then the score is calibrated by debiasing the effect from the shifted orthogonal boundary. By passing through a scoring function, the in-batch loss is reweighted by the weight $w$.
Figure 3: The illustrations of several distributions on CC120K. (a) The parameter distribution. (b-d) The distribution of image features for the 128th, 256th, and 512th dimensions. (e-g) The distribution of text features for the 128th, 256th, and 512th dimensions.
Figure 4: Examples from the generated SDM dataset. The first row is in sketch style, while the second row is in cartoon style.

Theorems & Definitions (5)

Theorem 1: Proportional shift of boundary
Lemma 1
Proof of Theorem 1
Theorem 2: Output features tend to Gaussian
Proof of Theorem 2
