Learning Causal Transition Matrix for Instance-dependent Label Noise
Jiahui Li, Tai-Wei Chang, Kun Kuang, Ximing Li, Long Chen, Jun Zhou
TL;DR
The paper tackles learning with noisy labels by addressing the unidentifiability of the transition matrix under real-world conditions. It introduces a causal graph with a two-component input $X=(X_1,X_2)$ and defines a causal transition matrix $P(\hat{Y}|do(Y),X)$ that is identifiable when $X_2$ is recovered, enabling robust inference of the clean label $Y$. A trainable framework combines a separation model, a transition model, co-teaching for partial supervision of $Y$, a policy-gradient estimator for $X_2$, and a decorrelation regularizer for $X_1$, forming an end-to-end denoising pipeline. Across synthetic and real-world datasets with symmetric, asymmetric, and instance-dependent noise, the method achieves state-of-the-art performance, particularly under challenging instance-dependent noise, demonstrating the practical value of causal reasoning for label-noise mitigation.
Abstract
Noisy labels are both inevitable and problematic in machine learning, as they harm a model's generalization ability by causing overfitting. In the context of learning with noise, the transition matrix plays a crucial role in the design of statistically consistent algorithms. However, the transition matrix is often considered unidentifiable. One strand of methods addresses this problem by assuming that the transition matrix is instance-independent; that is, the probability of mislabeling a particular instance is not influenced by its characteristics or attributes. This assumption is clearly invalid in complex real-world scenarios. To better understand the transition relationship and relax this assumption, we propose to study the data generation process of noisy labels from a causal perspective. We discover that an unobservable latent variable can affect the instance itself, the label annotation procedure, or both, which complicates the identification of the transition matrix. To address these scenarios, we unify the observations within a new causal graph. In this graph, the input instance is divided into a noise-resistant component and a noise-sensitive component, depending on whether each is affected by the latent variable. These two components contribute to identifying the "causal transition matrix", which approximates the true transition matrix with a theoretical guarantee. In line with this, we design a novel training framework that explicitly models this causal relationship and, as a result, achieves a more accurate model for inferring the clean label.
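To make the instance-independent assumption concrete, the following is a minimal NumPy sketch, not the paper's method. It assumes the standard convention $T_{ij} = P(\hat{Y}=j \mid Y=i)$, under which the noisy posterior is the clean posterior multiplied by $T$. The function names, the symmetric noise model, and the toy rule that makes the flip rate depend on the features `x` through a hypothetical weight vector `w` are all illustrative assumptions.

```python
import numpy as np


def symmetric_transition_matrix(num_classes, noise_rate):
    """Return T with T[i, j] = P(noisy label = j | clean label = i)."""
    off_diagonal = noise_rate / (num_classes - 1)
    T = np.full((num_classes, num_classes), off_diagonal)
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T


def noisy_posterior(clean_posterior, T):
    """P(noisy = j | x) = sum_i P(clean = i | x) * T[i, j]."""
    return clean_posterior @ T


def toy_instance_dependent_matrix(x, w, num_classes, max_rate=0.4):
    """Hypothetical example: the flip rate is a sigmoid of w . x,
    so two instances of the same class can have different matrices."""
    rate = max_rate / (1.0 + np.exp(-np.dot(w, x)))
    return symmetric_transition_matrix(num_classes, rate)


# Instance-independent case: one matrix shared by every instance.
T_shared = symmetric_transition_matrix(num_classes=3, noise_rate=0.2)
clean = np.array([0.9, 0.05, 0.05])
noisy = noisy_posterior(clean, T_shared)

# Instance-dependent case: the matrix changes with the features.
w = np.array([1.0, -1.0])
T_a = toy_instance_dependent_matrix(np.array([2.0, 0.0]), w, num_classes=3)
T_b = toy_instance_dependent_matrix(np.array([0.0, 2.0]), w, num_classes=3)
```

In the shared case a single matrix has to be estimated from the whole dataset. In the instance-dependent case there is in principle one matrix per input, which gives a sense of why identification becomes hard without additional structure such as the causal graph proposed here.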
