Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds

Shikun Li; Xiaobo Xia; Jiankang Deng; Shiming Ge; Tongliang Liu

TL;DR

This work tackles learning from crowds under annotator- and instance-dependent label noise by proposing TAIDTM, a deep-learning framework that simultaneously models a global noise-transition network and per-annotator networks. The approach uses knowledge transfer: first from a global mixture of noise patterns to individual annotators, then via a graph-convolutional network that calibrates neighboring annotators to mitigate negative transfer. Theoretical analysis provides generalization bounds showing when global-to-individual and neighbor transfers improve performance under annotation sparsity. Empirically, TAIDTM consistently outperforms baselines on both synthetic and real-world crowdsourcing datasets, with ablations confirming the importance of both transfer steps and robust graph-based conditioning against noise.

Abstract

Learning from crowds refers to the setting in which training-data annotations are obtained through crowd-sourcing services. Multiple annotators each complete a small part of the annotations, and labeling mistakes that depend on the annotator occur frequently. Modeling the label-noise generation process with a noise transition matrix is a powerful tool for tackling label noise. In real-world crowd-sourcing scenarios, noise transition matrices are both annotator- and instance-dependent. However, because annotator- and instance-dependent transition matrices (AIDTM) are highly complex, annotation sparsity, meaning that each annotator labels only a small fraction of the instances, makes modeling AIDTM very challenging. Prior works simplify the problem by assuming the transition matrix is instance-independent or by using simple parametric forms, which loses modeling generality. Motivated by this, we target a more realistic problem: estimating general AIDTM in practice. Without losing modeling generality, we parameterize AIDTM with deep neural networks. To alleviate the modeling challenge, we assume that every annotator shares its noise pattern with similar annotators, and we estimate AIDTM via knowledge transfer. We first model the mixture of noise patterns across all annotators, and then transfer this model to individual annotators. Furthermore, because the transfer from the mixture of noise patterns to individuals may cause two annotators with highly different noise generation to perturb each other, we employ knowledge transfer between identified neighboring annotators to calibrate the modeling. Theoretical analyses demonstrate that both the knowledge transfer from global to individuals and the knowledge transfer between neighboring individuals can help model general AIDTM. Experiments confirm the superiority of the proposed approach on synthetic and real-world crowd-sourcing data.
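To make the role of a noise transition matrix concrete, the following is a minimal sketch of forward loss correction for a single instance and a single annotator. It is not the paper's implementation: the function name `forward_corrected_loss`, the 3-class setup, and the numbers in `T` are hypothetical, and the row-stochastic convention `T[i, j] = P(noisy label = j | clean label = i, instance, annotator)` is the assumed definition. In the approach described above, such a matrix would be produced by a neural network per annotator and instance; here it is written out by hand.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward_corrected_loss(clean_probs, transition, noisy_label):
    """Cross-entropy on a noisy label after forward correction.

    clean_probs: shape (C,), classifier estimate of P(clean label | x).
    transition:  shape (C, C), assumed row-stochastic, with
                 transition[i, j] = P(noisy = j | clean = i, x, annotator).
    noisy_label: int, the label provided by one annotator.
    """
    # Push clean-class probabilities through the transition matrix to get
    # the predicted distribution over this annotator's noisy labels.
    noisy_probs = clean_probs @ transition
    return -np.log(noisy_probs[noisy_label] + 1e-12), noisy_probs

# Hypothetical classifier output for one instance (3 classes).
clean_probs = softmax(np.array([2.0, 0.5, -1.0]))

# Hypothetical annotator who often labels class 0 as class 1 on this instance.
T = np.array([
    [0.6, 0.4, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
])

# The annotator reports class 1 although the classifier favors class 0.
loss_uncorrected, _ = forward_corrected_loss(clean_probs, np.eye(3), 1)
loss_corrected, noisy_probs = forward_corrected_loss(clean_probs, T, 1)
```

With the identity matrix the loss treats the annotator's label as clean; with `T` the same label is partly explained by the annotator's confusion pattern, so the classifier is penalized less for predicting class 0.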

Paper Structure (30 sections, 10 theorems, 35 equations, 4 figures, 8 tables, 2 algorithms)

Introduction
Related Works
Learning from Crowds
Instance-dependent Label-noise Learning
Methodology
Preliminaries
Training the Global Noise-transition Network
Learning Individual Noise-transition Networks
Knowledge Transfer between Neighboring Individual Networks
Classifier Training with Loss Correction
Theoretical Justification
Knowledge Transfer from Global to Individuals
Knowledge Transfer between Neighboring Individuals
Computational complexity analysis
Experiments
...and 15 more sections

Key Result

Theorem 1

Assume the Frobenius norms of the weight matrices ${W}_1,\ldots,{W}_d$ are at most $M_1,\ldots, M_d$, respectively, and the instances $\boldsymbol{x}$ are bounded in norm by $B$, i.e., $\|\boldsymbol{x}\|\leq B$ for all $\boldsymbol{x}\in\mathcal{X}$. Let the activation functions be 1-Lipschitz, positive-homogeneo

Figures (4)

Figure 1: The illustration of transferring the global noise-transition network to individual noise-transition network.
Figure 2: The illustration of the knowledge transfer between neighboring individuals based on a GCN-based mapping function, which merges the node features of neighboring annotators and maps them into the last layer’s parameters of individual noise-transition networks.
Figure 3: The ablation study conducted on the CIFAR10 dataset with AIDN-40% label noise. (a) the test accuracy (%) vs. the average number of annotations for instances; (b) the test accuracy (%) vs. the number of annotator groups.
Figure 4: Visualization of the last layer’s parameters belonging to different individuals and the constructed similarity graph. The first two rows show experiments conducted under different values of the hyperparameter $k$. The last two rows show experiments conducted under different numbers of annotator groups. All experiments are conducted on CIFAR10 with AIDN-40$\%$ label noise.

Theorems & Definitions (15)

Theorem 1
Theorem 2
Theorem 3
proof
Theorem 4
proof
Corollary 1
proof
Definition 1: bartlett2003rademacher
Theorem 5: bartlett2003rademacher
...and 5 more
