Noisy Node Classification by Bi-level Optimization based Multi-teacher Distillation

Yujing Liu, Zongqian Wu, Zhengyu Lu, Ci Nie, Guoqiu Wen, Ping Hu, Xiaofeng Zhu

TL;DR

This work tackles noisy node classification on graphs by introducing BO-NNC, a framework that combines multi-teacher distillation with bi-level optimization to adaptively fuse diverse teacher predictions without requiring extensive clean labels. It builds diverse teacher models via unsupervised graph encoders, learns a weight matrix to form soft labels through Hadamard fusion, and trains a student model in a lower-level distillation task while updating the teacher weights at an upper level. A label-improvement module further cleans and augments supervision through noisy-label filtering and pseudo-label selection, enhancing both teacher and student learning. Extensive experiments on five real datasets across multiple noise settings demonstrate state-of-the-art performance, with ablations showing the necessity and synergy of all components. The approach offers a practical path toward robust graph learning under pervasive label noise, leveraging complementary teacher signals and data-driven weight adaptation.
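The fusion step mentioned above can be illustrated with a small sketch. It is not the authors' implementation. It assumes one plausible reading of the Hadamard fusion: each node has one weight per teacher, the weights are normalized with a softmax, and each weight is multiplied element-wise into that teacher's class probabilities before summing over teachers. The names `fuse_teachers` and `weight_logits` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_teachers(weight_logits, teacher_probs):
    """Fuse k teacher predictions into one soft label per node.

    weight_logits: n x k list of unnormalized per-node teacher weights
                   (assumed shape; the source only names a "teacher weight matrix").
    teacher_probs: list of k matrices, each n x c, of class probabilities.
    Returns an n x c matrix whose rows are convex combinations of teacher rows.
    """
    k = len(teacher_probs)
    num_classes = len(teacher_probs[0][0])
    soft_labels = []
    for i, logits in enumerate(weight_logits):
        w = softmax(logits)  # per-node weights over the k teachers
        row = [
            sum(w[t] * teacher_probs[t][i][j] for t in range(k))
            for j in range(num_classes)
        ]
        soft_labels.append(row)
    return soft_labels


# Two teachers, one node, two classes, equal weights.
teachers = [[[0.8, 0.2]], [[0.4, 0.6]]]
print(fuse_teachers([[0.0, 0.0]], teachers))
```

Because the weights are normalized per node, every fused row remains a valid probability distribution, which is what a distillation loss on soft labels expects.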

Abstract

Existing graph neural networks (GNNs) usually assume that graph data comes with clean labels for representation learning, but this assumption does not hold in real applications. In this paper, we propose a new multi-teacher distillation method based on bi-level optimization, named BO-NNC, for noisy node classification on graph data. Specifically, we first employ multiple self-supervised learning methods to train diverse teacher models, and then aggregate their predictions through a teacher weight matrix. Furthermore, we design a new bi-level optimization strategy that dynamically adjusts the teacher weight matrix based on the training progress of the student model. Finally, we design a label improvement module to improve label quality. Extensive experiments on real datasets show that our method achieves the best results compared to state-of-the-art methods.
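The alternating structure of a bi-level scheme like the one outlined in the abstract can be shown with a toy sketch. Everything specific here is an assumption made for illustration: the student is a plain table of logits rather than a GNN, the teacher weights are a single global vector `alpha` rather than a matrix, the upper-level objective is cross-entropy on a small hypothetical trusted set, and the hypergradient is a one-step finite-difference estimate. The source does not specify any of these choices in this section.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(alpha, teacher_probs):
    """Soft labels as a softmax(alpha)-weighted mix of teacher predictions."""
    w = softmax(alpha)
    n, c = len(teacher_probs[0]), len(teacher_probs[0][0])
    return [[sum(w[t] * teacher_probs[t][i][j] for t in range(len(w)))
             for j in range(c)] for i in range(n)]


def lower_step(student, alpha, teacher_probs, lr=0.5):
    """Lower level: one gradient step of the student toward the soft labels.

    For cross-entropy between a target distribution and softmax(logits),
    the gradient with respect to the logits is (softmax(logits) - target).
    """
    soft = fuse(alpha, teacher_probs)
    updated = []
    for logits, target in zip(student, soft):
        p = softmax(logits)
        updated.append([z - lr * (pj - tj) for z, pj, tj in zip(logits, p, target)])
    return updated


def upper_loss(student, trusted):
    """Upper level (assumed): student cross-entropy on trusted {node: label}."""
    total = sum(-math.log(softmax(student[i])[y]) for i, y in trusted.items())
    return total / len(trusted)


def hypergradient(student, alpha, teacher_probs, trusted, eps=1e-4):
    """Central-difference estimate of d(upper loss)/d(alpha) through one lower step."""
    grad = []
    for t in range(len(alpha)):
        up, down = alpha[:], alpha[:]
        up[t] += eps
        down[t] -= eps
        loss_up = upper_loss(lower_step(student, up, teacher_probs), trusted)
        loss_down = upper_loss(lower_step(student, down, teacher_probs), trusted)
        grad.append((loss_up - loss_down) / (2 * eps))
    return grad


def train(teacher_probs, trusted, steps=50, lr_alpha=1.0):
    """Alternate upper-level weight updates with lower-level student updates."""
    k = len(teacher_probs)
    n, c = len(teacher_probs[0]), len(teacher_probs[0][0])
    alpha = [0.0] * k
    student = [[0.0] * c for _ in range(n)]
    for _ in range(steps):
        g = hypergradient(student, alpha, teacher_probs, trusted)
        alpha = [a - lr_alpha * gi for a, gi in zip(alpha, g)]
        student = lower_step(student, alpha, teacher_probs)
    return alpha, student


# Teacher 0 agrees with the trusted labels; teacher 1 mostly does not.
teacher_probs = [
    [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]],
    [[0.3, 0.7], [0.6, 0.4], [0.4, 0.6]],
]
alpha, student = train(teacher_probs, trusted={0: 0, 1: 1})
print(softmax(alpha))
```

In this toy setting the weight on the teacher that agrees with the trusted nodes grows over the iterations, and the student follows the resulting soft labels on the untrusted node as well. A practical implementation would more likely use automatic differentiation than finite differences.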

Paper Structure (23 sections, 21 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: The framework of the proposed BO-NNC, which consists of three modules: Multi-teacher Construction, Multi-teacher Distillation, and Label Improvement. Multi-teacher Construction employs multiple self-supervised learning methods to obtain diverse teacher models. Multi-teacher Distillation transfers knowledge from the teacher models to the student model through bi-level optimization. At the lower level, the student model learns from the soft label matrix, which is the Hadamard product between the teacher weight matrix and the $k$ prediction probability matrices produced by the teacher models. At the upper level, the teacher weight matrix is updated based on the training progress of the student model. Label Improvement uses both the student model and the teacher models to first detect noisy labels and then select pseudo-labels.
  • Figure 2: Parameter sensitivity analysis of $r$ and $\rho$ in our method at a 60% noise rate.
  • Figure 3: Parameter sensitivity analysis of $\beta_1$ and $\beta_2$ in our method at a 60% noise rate.
  • Figure 4: Parameter sensitivity analysis of $\alpha$ in our method at a 60% noise rate.
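
The Label Improvement step described in the Figure 1 caption (detect noisy labels, then select pseudo-labels) could look roughly like the sketch below. The concrete rules are assumptions chosen for illustration, since the captions do not state them: a given label is kept only if a majority of the models predict it, and an unlabeled node receives a pseudo-label only if all models agree and the student's confidence reaches a hypothetical `threshold`. The function name `improve_labels` is likewise hypothetical.

```python
def argmax(row):
    """Index of the largest entry in a list."""
    return max(range(len(row)), key=row.__getitem__)


def improve_labels(noisy_labels, student_probs, teacher_probs, threshold=0.9):
    """Filter suspicious labels and pick confident pseudo-labels.

    noisy_labels:  dict mapping labeled node index -> given (possibly wrong) label.
    student_probs: n x c matrix of student class probabilities.
    teacher_probs: list of k matrices, each n x c.
    Returns (kept, pseudo): labels that survive filtering, and new pseudo-labels.
    """
    kept, pseudo = {}, {}
    for i, student_row in enumerate(student_probs):
        # One vote per model: the student first, then each teacher.
        votes = [argmax(student_row)] + [argmax(p[i]) for p in teacher_probs]
        if i in noisy_labels:
            # Assumed rule: keep the label only if most models predict it.
            agree = sum(1 for v in votes if v == noisy_labels[i])
            if 2 * agree > len(votes):
                kept[i] = noisy_labels[i]
        elif len(set(votes)) == 1 and max(student_row) >= threshold:
            # Assumed rule: unanimous prediction plus a confident student.
            pseudo[i] = votes[0]
    return kept, pseudo


student_probs = [[0.95, 0.05], [0.2, 0.8], [0.97, 0.03], [0.6, 0.4]]
teacher_probs = [
    [[0.9, 0.1], [0.3, 0.7], [0.8, 0.2], [0.4, 0.6]],
    [[0.7, 0.3], [0.1, 0.9], [0.9, 0.1], [0.7, 0.3]],
]
kept, pseudo = improve_labels({0: 0, 1: 0}, student_probs, teacher_probs)
print(kept, pseudo)  # {0: 0} {2: 0}
```

Node 1 is dropped because every model contradicts its given label, and node 3 gets no pseudo-label because the models disagree.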