Learning Causal Transition Matrix for Instance-dependent Label Noise

Jiahui Li, Tai-Wei Chang, Kun Kuang, Ximing Li, Long Chen, Jun Zhou

TL;DR

The paper tackles learning with noisy labels by addressing the unidentifiability of the transition matrix under real-world conditions. It introduces a causal graph with a two-component input $X=(X_1,X_2)$ and defines a causal transition matrix $P(\hat{Y}|do(Y),X)$ that is identifiable when $X_2$ is recovered, enabling robust inference of the clean label $Y$. A trainable framework combines a separation model, a transition model, co-teaching for partial supervision of $Y$, a policy-gradient estimator for $X_2$, and a decorrelation regularizer for $X_1$, forming an end-to-end denoising pipeline. Across synthetic and real-world datasets with symmetric, asymmetric, and instance-dependent noise, the method achieves state-of-the-art performance, particularly under challenging instance-dependent noise, demonstrating the practical value of causal reasoning for label-noise mitigation.

Abstract

Noisy labels are both inevitable and problematic in machine learning, as they harm models' generalization ability by causing overfitting. In the context of learning with noise, the transition matrix plays a crucial role in the design of statistically consistent algorithms. However, the transition matrix is often considered unidentifiable. One strand of methods typically addresses this problem by assuming that the transition matrix is instance-independent; that is, the probability of mislabeling a particular instance is not influenced by its characteristics or attributes. This assumption is clearly invalid in complex real-world scenarios. To better understand the transition relationship and relax this assumption, we propose to study the data generation process of noisy labels from a causal perspective. We discover that an unobservable latent variable can affect either the instance itself, the label annotation procedure, or both, which complicates the identification of the transition matrix. To address various scenarios, we have unified these observations within a new causal graph. In this graph, the input instance is divided into a noise-resistant component and a noise-sensitive component based on whether each is affected by the latent variable. These two components contribute to identifying the "causal transition matrix", which approximates the true transition matrix with a theoretical guarantee. In line with this, we have designed a novel training framework that explicitly models this causal relationship and, as a result, achieves a more accurate model for inferring the clean label.
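
To make the role of the transition matrix concrete, the following is a minimal NumPy sketch of the conventional (non-causal) setting the abstract contrasts against. It is not the paper's method. The function names, the three-class example, and the noise rates are illustrative assumptions. It shows how a transition matrix maps a clean-label posterior to a noisy-label posterior, first with one matrix shared by all instances (the instance-independent assumption) and then with one matrix per instance (the instance-dependent case).

```python
import numpy as np

def symmetric_transition_matrix(num_classes: int, noise_rate: float) -> np.ndarray:
    """Build T with T[i, j] = P(noisy = j | clean = i) under symmetric noise.

    Illustrative helper (an assumption, not from the paper): each clean label
    is kept with probability 1 - noise_rate and otherwise flipped uniformly
    to one of the other classes.
    """
    off_diag = noise_rate / (num_classes - 1)
    T = np.full((num_classes, num_classes), off_diag)
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T

def noisy_posterior(clean_posterior: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Map P(Y | x) to P(Y_hat | x) via the law of total probability.

    clean_posterior has shape (n, C). T has shape (C, C) when shared by all
    instances, or (n, C, C) when each instance has its own matrix.
    """
    if T.ndim == 2:
        return clean_posterior @ T
    return np.einsum("ni,nij->nj", clean_posterior, T)

# Two hypothetical instances with three classes.
clean = np.array([[0.9, 0.1, 0.0],
                  [0.2, 0.3, 0.5]])

# Instance-independent: one matrix for every instance.
T_shared = symmetric_transition_matrix(3, 0.2)
noisy_shared = noisy_posterior(clean, T_shared)

# Instance-dependent: the second instance is assumed harder to label.
T_per_instance = np.stack([
    symmetric_transition_matrix(3, 0.1),
    symmetric_transition_matrix(3, 0.4),
])
noisy_instance = noisy_posterior(clean, T_per_instance)

print(noisy_shared)
print(noisy_instance)
```

In the instance-dependent case there is one unknown matrix per instance but only one observed noisy label per instance, which gives an intuition for why the matrix is hard to identify without further structure.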

Paper Structure

This paper contains 22 sections, 2 theorems, 9 equations, 8 figures, 6 tables.

Key Result

Theorem 1

The instance-dependent causal transition matrix $P(\hat{Y} | do(Y), X)$ is identifiable if we recover the noise predictive factor $X_2$.
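
For context, the conventional instance-dependent transition matrix conditions on the clean label rather than intervening on it. The block below is a sketch using standard notation. The symbols $T_{ij}(x)$ and $T^{c}_{ij}(x)$ are introduced here for illustration and may differ from the paper's notation.

$$
T_{ij}(x) = P(\hat{Y} = j \mid Y = i, X = x),
\qquad
P(\hat{Y} = j \mid X = x) = \sum_{i} T_{ij}(x)\, P(Y = i \mid X = x).
$$

The causal transition matrix in Theorem 1 replaces the conditioning on $Y$ with an intervention:

$$
T^{c}_{ij}(x) = P(\hat{Y} = j \mid do(Y = i), X = x).
$$

In general the two quantities differ when a latent variable influences both the instance and the annotation. The theorem states that recovering $X_2$ is sufficient to identify the interventional quantity.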

Figures (8)

  • Figure 1: Examples of three causal graphs illustrating the mislabeling of "5" as "6" in MNIST, where $X$ denotes the instance (image), $Y$ denotes the ground-truth label, $\hat{Y}$ denotes the noisy label, and $Z$ denotes the latent variable. The dashed circles represent the unobservable variable. (a) The instance is perturbed by noise, making "5" look like "6". (b) The instance is clean, but it is mislabeled by an annotator. (c) The instance exhibits a mixture of (a) and (b).
  • Figure 2: (a) The proposed causal graph for learning with noisy labels. (b) The training framework of our method.
  • Figure 3: The design of the transition model.
  • Figure 4: Model performance on the CIFAR10 Dataset.
  • Figure 5: Causal graphs for data generation process.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2