Bridging Generative and Discriminative Noisy-Label Learning via Direction-Agnostic EM Formulation

Fengbei Liu; Chong Wang; Yuanhong Chen; Yuyuan Liu; Gustavo Carneiro

TL;DR

This paper tackles the challenge of noisy-label learning by proposing a direction-agnostic EM framework that bridges generative and discriminative approaches without training heavy image generators. It replaces intractable generative terms with a discriminative proxy for $p(\mathbf{x}|\mathbf{y})$ and introduces Partial-Label Supervision to form an instance-specific prior $p(\mathbf{y})$, balancing coverage and uncertainty. The method yields state-of-the-art accuracy and lower transition-matrix estimation error across vision and NLP benchmarks, while demanding substantially less compute than prior generative models. Overall, the approach offers a scalable, robust solution for noisy-label problems with practical impact across multimodal datasets.

Abstract

Although noisy-label learning is often approached with discriminative methods for simplicity and speed, generative modeling offers a principled alternative by capturing the joint mechanism that produces features, clean labels, and corrupted observations. However, prior work typically (i) introduces extra latent variables and heavy image generators that bias training toward reconstruction, (ii) fixes a single data-generating direction ($Y\rightarrow\!X$ or $X\rightarrow\!Y$), limiting adaptability, and (iii) assumes a uniform prior over clean labels, ignoring instance-level uncertainty. We propose a single-stage, EM-style framework for generative noisy-label learning that is \emph{direction-agnostic} and avoids explicit image synthesis. First, we derive a single Expectation-Maximization (EM) objective whose E-step specializes to either causal orientation without changing the overall optimization. Second, we replace the intractable $p(X\mid Y)$ with a dataset-normalized discriminative proxy computed using a discriminative classifier on the finite training set, retaining the structural benefits of generative modeling at much lower cost. Third, we introduce \emph{Partial-Label Supervision} (PLS), an instance-specific prior over clean labels that balances coverage and uncertainty, improving data-dependent regularization. Across standard vision and natural language processing (NLP) noisy-label benchmarks, our method achieves state-of-the-art accuracy, lower transition-matrix estimation error, and substantially less training compute than current generative and discriminative baselines. Code: https://github.com/lfb-1/GNL
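The abstract does not spell out the dataset-normalized proxy, so the following is only a minimal sketch of one plausible reading. It assumes the training set carries a uniform empirical $p(\mathbf{x})$, so that Bayes' rule gives $p(\mathbf{x}_i \mid y) \approx q(y \mid \mathbf{x}_i) / \sum_j q(y \mid \mathbf{x}_j)$, with the sum running over the finite training set. The function name `dataset_normalized_proxy` and the toy logits are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def dataset_normalized_proxy(q_y_given_x: np.ndarray) -> np.ndarray:
    """Turn classifier posteriors into a per-class distribution over samples.

    q_y_given_x: array of shape (N, K); row i holds q(y | x_i).
    Returns an (N, K) array whose column k sums to 1 over the N training
    samples and stands in for p(x_i | y = k).
    Assumption (not from the source): uniform empirical p(x) on the training set.
    """
    column_sums = q_y_given_x.sum(axis=0, keepdims=True)
    return q_y_given_x / np.clip(column_sums, 1e-12, None)


# Toy example: 6 training samples, 3 classes, random classifier logits.
rng = np.random.default_rng(0)
q = softmax(rng.normal(size=(6, 3)))
proxy = dataset_normalized_proxy(q)
```

Each column of `proxy` is a distribution over training samples rather than over classes, which is what lets a discriminative classifier play the role of a generative term without synthesizing images.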

Paper Structure (25 sections, 3 theorems, 32 equations, 5 figures, 10 tables, 1 algorithm)

Introduction
Related Work
Discriminative noisy-label learning
Generative modeling in noisy-label learning
Weak supervision in noisy-label learning
Method
Formulation of the joint likelihood
Formulation for the $Y \rightarrow X$ relationship
Formulation for the $X \rightarrow Y$ relationship
Approximation of $p(\mathbf{x}|\mathbf{y})$ and $p(\mathbf{x})$
Approximation of $p(\mathbf{y})$ by partial label supervision
Optimization process
Why partial labels yield better pseudo labels
Experiments
Causal relationships in each dataset
...and 10 more sections

Key Result

Lemma 1

Figures (5)

Figure 1: Generative noisy-label learning models and their corresponding probability functions, where the red arrow indicates the causal direction between $X$ and $Y$. (a) CausalNL \cite{yao2021instance} and InstanceGM \cite{garg2022instance} assume that $Y \rightarrow X$ and optimize the joint likelihood $p(X,\tilde{Y})$, which requires an additional latent variable $Z$ for generation. (b) NPC \cite{bae2022noisy} and DyGEN \cite{zhuang2023dygen} assume that $X \rightarrow Y$ and optimize $p(Y|\tilde{Y},X)$ as a post-processing step; they do not require $Z$. (c) Our proposed method optimizes $p(X,\tilde{Y})$ and accommodates both causal directions across datasets, without the need to model $Z$.
Figure 2: Description of the proposed framework. (a) shows the input image $\mathbf{x}$ and the model components. The term $q(\mathbf{y}|\mathbf{x})$ is the variational posterior in Eq. \ref{eq:optim_joint_p_x_tildey2}, i.e., the clean-label classifier, and $p(\tilde{\mathbf{y}}| \mathbf{y}, \mathbf{x})$ is the noise-transition module in Eq. \ref{eq:two_expectation}. (b) presents the Expectation--Maximization (EM) objective introduced in Eq. \ref{eq:optim_goal} and Eq. \ref{eq:pyx_optim_goal} and the corresponding implementation losses. In the E-step, we minimize one of the $\mathsf{KL}$ divergences (Eq. \ref{eq:KL_divergence_pxy} or Eq. \ref{eq:KL_divergence_pyx}), depending on whether the data-generating process is causal or anti-causal. In the M-step, we maximize the ELBO and the associated expectations by minimizing the losses in Eq. \ref{eq:ce} and Eq. \ref{eq:partial_loss}. (c) illustrates the labels used for supervision. $\tilde{\mathbf{y}}$ denotes the original noisy label. $p(\mathbf{y})$ is the constructed clean-label prior. "PLS" denotes the proposed Partial-Label Supervision introduced in Sec. \ref{sec:construct_partial}. (d) shows the pipeline that constructs $p(\mathbf{y})$. "Coverage" and "Uncertainty" correspond to the selections defined in Eq. \ref{eq:moving_average} and Eq. \ref{eq:min_ineff}.
Figure 3: Examples of CIFAR-10 images with 40% IDN noise and PLS-constructed partial labels from Eq. \ref{eq:true_label_prior}. Here, $\mathbf{y}$ is the clean label; $\tilde{\mathbf{y}}$ is the noisy label; $\mathbf{c}$ is the "Coverage" label from Eq. \ref{eq:moving_average}; $\mathbf{u}$ is the "Uncertainty" label from Eq. \ref{eq:min_ineff}; and $\mathbf{w}$ is the noisy-label probability from Eq. \ref{eq:loss_gmm}.
Figure 4: Transition-matrix MSE and classification accuracy on CIFAR-10 IDN at 20% and 40% noise rates. Baseline results are taken from NPC \cite{bae2022noisy} and kMEIDTM \cite{cheng2022instance}. The left y-axis shows transition-matrix MSE ($\times$100); the right y-axis shows accuracy.
Figure 5: Coverage (Cov) and Uncertainty (Unc), as defined in Eq. \ref{eq:metrics}, for (a) CIFAR-10-IDN (20% and 50%), (b) CIFAR-100-IDN (20% and 50%), and (c) CIFAR-10N ("Worse" and "Aggre"). The left-column graphs show clean-label coverage. The middle-column graphs show clean/noisy sample uncertainty under the proposed PLS-trained posterior $q(\mathbf{y}|\mathbf{x})$. The right-column graphs show clean/noisy sample uncertainty when training $q(\mathbf{y}|\mathbf{x})$ without our proposed PLS. The dotted vertical line indicates the end of warm-up training.
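The metric definitions behind the Figure 5 caption are not reproduced in this summary, so the sketch below rests on assumptions rather than on the paper's equation. It treats each partial label as a multi-hot candidate set, takes coverage to be the fraction of samples whose candidate set contains the clean label, and takes uncertainty to be the average candidate-set size, reported separately for clean and noisy samples. The function names and the toy arrays are hypothetical.

```python
import numpy as np


def coverage(candidates: np.ndarray, clean_labels: np.ndarray) -> float:
    """Fraction of samples whose candidate set contains the clean label.

    candidates: (N, K) multi-hot array; candidates[i, k] == 1 means class k
    is in the partial label of sample i. clean_labels: (N,) integer classes.
    """
    n = len(clean_labels)
    return float(candidates[np.arange(n), clean_labels].mean())


def uncertainty(candidates: np.ndarray) -> float:
    """Average number of candidate classes per sample (assumed definition)."""
    return float(candidates.sum(axis=1).mean())


# Toy data: 4 samples, 3 classes.
candidates = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
])
clean = np.array([0, 1, 0, 2])
noisy = np.array([0, 0, 2, 2])
is_noisy = clean != noisy  # samples 1 and 2 carry a corrupted label

cov = coverage(candidates, clean)                # 0.75: sample 2 misses its clean label
unc_clean = uncertainty(candidates[~is_noisy])   # 1.0: one candidate per clean sample
unc_noisy = uncertainty(candidates[is_noisy])    # 2.0: two candidates per noisy sample
```

Under these assumed definitions, a useful prior keeps coverage high while letting the candidate set grow mainly for samples whose observed label is likely wrong, which is the trade-off the caption's "clean/noisy sample uncertainty" panels appear to track.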

Theorems & Definitions (6)

Lemma 1: Hard re-labeling error
proof
Lemma 2: Uniform partial-label error
proof
Proposition 1: Sensitivity of the partial-label error
Remark 1: Design implication
