VCC-INFUSE: Towards Accurate and Efficient Selection of Unlabeled Examples in Semi-supervised Learning

Shijie Fang, Qianhan Feng, Tong Lin

TL;DR

The paper tackles two core SSL challenges: miscalibrated pseudo-label confidences and the inefficiency of training on all unlabeled data. It introduces Variational Confidence Calibration (VCC), which combines ensemble, temporal, and view consistency scores through a variational autoencoder (VAE) to produce calibrated pseudo-label confidences. It also introduces Influence-Function-based Unlabeled Sample Elimination (INFUSE), which uses influence functions to prune the unlabeled data into a compact core set for faster training. Empirically, VCC improves several SSL baselines across multiple datasets, while INFUSE reduces training time without sacrificing accuracy. Combined, VCC-INFUSE achieves a favorable accuracy-time trade-off: on CIFAR-100 with 2,500 labels, it lowers FlexMatch's error rate by 1.08% and cuts training time by about 48%. VCC works as a flexible plugin for existing SSL methods, INFUSE offers a principled core-set construction for unlabeled data, and the approach may extend to SSL tasks beyond classification.
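The three consistency signals that VCC passes to its VAE can be illustrated with a minimal sketch. The definitions below are assumptions for illustration, not the paper's formulas. Ensemble consistency is taken as the averaged probability of the consensus class across K ensemble members. Temporal and view consistency are each taken as one minus the total-variation distance between the current (weak-view) prediction and, respectively, a stored historical prediction or a strongly augmented view. All function names are hypothetical, and the VAE itself is omitted.

```python
from typing import List, Sequence


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total-variation distance between two probability vectors; lies in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def ensemble_consistency(member_probs: List[Sequence[float]]) -> float:
    """Assumed definition: mean probability the K members give to the class
    that their averaged distribution predicts (i.e. the max of the mean)."""
    k = len(member_probs)
    mean = [sum(col) / k for col in zip(*member_probs)]
    return max(mean)


def temporal_consistency(current: Sequence[float], history: Sequence[float]) -> float:
    """Assumed definition: agreement between the current prediction and a stored
    one (for example, a moving average of predictions from earlier epochs)."""
    return 1.0 - total_variation(current, history)


def view_consistency(weak: Sequence[float], strong: Sequence[float]) -> float:
    """Assumed definition: agreement between predictions on a weakly and a
    strongly augmented view of the same unlabeled example."""
    return 1.0 - total_variation(weak, strong)


def consistency_features(member_probs, history, weak, strong) -> List[float]:
    """Hypothetical feature vector [ensemble, temporal, view] for one example."""
    return [
        ensemble_consistency(member_probs),
        temporal_consistency(weak, history),
        view_consistency(weak, strong),
    ]


feats = consistency_features(
    member_probs=[[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    history=[0.6, 0.3, 0.1],
    weak=[0.8, 0.1, 0.1],
    strong=[0.5, 0.4, 0.1],
)
print([round(f, 2) for f in feats])  # prints [0.65, 0.8, 0.7]
```

Total variation is used here only because it is bounded in [0, 1], which keeps all three hypothetical scores on a common scale before they reach the calibration model.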

Abstract

Despite the progress of Semi-supervised Learning (SSL), existing methods fail to utilize unlabeled data effectively and efficiently. Many pseudo-label-based methods select unlabeled examples based on inaccurate confidence scores from the classifier. Most prior work also uses all available unlabeled data without pruning, making it difficult to handle large amounts of unlabeled data. To address these issues, we propose two methods: Variational Confidence Calibration (VCC) and Influence-Function-based Unlabeled Sample Elimination (INFUSE). VCC is a universal plugin for SSL confidence calibration, using a variational autoencoder to select more accurate pseudo labels based on three types of consistency scores. INFUSE is a data pruning method that constructs a core dataset of unlabeled examples under SSL. Our methods are effective across multiple datasets and settings, reducing classification error rates and saving training time. Together, VCC-INFUSE reduces the error rate of FlexMatch on the CIFAR-100 dataset by 1.08% while saving nearly half of the training time.
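The influence-function idea behind INFUSE can be sketched on a toy model. This is a minimal illustration of the standard influence approximation, not the paper's algorithm. It assumes a two-parameter logistic regression on 1-D inputs, pseudo-labeled unlabeled examples, a small held-out labeled set, and a damped Hessian. Each unlabeled example z is scored by g_val^T H^{-1} g_z, the first-order decrease in held-out loss from up-weighting z, and the highest-scoring fraction is kept as the core set. The names `influence_scores` and `prune_core_set`, the damping value, and the keep ratio are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def grad_loss(theta: Vec2, x: float, y: float) -> Vec2:
    """Gradient of binary cross-entropy for the 1-D logistic model w*x + b."""
    err = sigmoid(theta[0] * x + theta[1]) - y
    return (err * x, err)


def damped_hessian(theta: Vec2, xs: Sequence[float], damping: float) -> Tuple[float, float, float]:
    """Mean Hessian of the training loss plus damping * I, as (h_ww, h_wb, h_bb)."""
    h_ww = h_wb = h_bb = 0.0
    for x in xs:
        s = sigmoid(theta[0] * x + theta[1])
        w = s * (1.0 - s)
        h_ww += w * x * x
        h_wb += w * x
        h_bb += w
    n = len(xs)
    return (h_ww / n + damping, h_wb / n, h_bb / n + damping)


def solve_2x2(h: Tuple[float, float, float], v: Vec2) -> Vec2:
    """Solve H u = v for the symmetric matrix [[a, b], [b, d]]."""
    a, b, d = h
    det = a * d - b * b
    return ((d * v[0] - b * v[1]) / det, (a * v[1] - b * v[0]) / det)


def influence_scores(theta: Vec2, unlabeled, val, damping: float = 1e-3) -> List[float]:
    """Hypothetical helper: score = g_val^T H^{-1} g_z for each pseudo-labeled z.
    A positive score means up-weighting z is predicted to lower the held-out loss."""
    h = damped_hessian(theta, [x for x, _ in unlabeled], damping)
    val_grads = [grad_loss(theta, x, y) for x, y in val]
    g_val = (sum(g[0] for g in val_grads) / len(val),
             sum(g[1] for g in val_grads) / len(val))
    u = solve_2x2(h, g_val)  # H^{-1} g_val, computed once and reused for every z
    scores = []
    for x, y_hat in unlabeled:
        g = grad_loss(theta, x, y_hat)
        scores.append(u[0] * g[0] + u[1] * g[1])
    return scores


def prune_core_set(theta: Vec2, unlabeled, val, keep_ratio: float) -> List[int]:
    """Hypothetical helper: indices of the top keep_ratio fraction by influence score."""
    scores = influence_scores(theta, unlabeled, val)
    k = max(1, round(keep_ratio * len(unlabeled)))
    ranked = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy setup: decision boundary at x = 0; the last pseudo-label is wrong.
theta = (2.0, 0.0)
unlabeled = [(1.5, 1), (-1.5, 0), (0.5, 1), (1.0, 0)]
val = [(1.0, 1), (-1.0, 0)]
print(prune_core_set(theta, unlabeled, val, keep_ratio=0.75))  # prints [0, 1, 2]
```

In this toy setup, the example at x = 1.0 pseudo-labeled 0 sits on the wrong side of the boundary, receives the most negative score, and is the one dropped. Computing H^{-1} g_val once and taking a dot product per example is what keeps such scoring affordable; at realistic model sizes that inverse would have to be approximated.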

Paper Structure (20 sections, 28 equations, 1 figure, 10 tables)

Figures (1)

  • Figure 1: Illustration of how VCC trains the VAE and uses the reconstructed confidence for pseudo-label selection.
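The VAE step in Figure 1 can be summarized with the standard VAE objective. This is a hedged sketch, not the paper's equations. It assumes the encoder input is the vector of three consistency scores, written here as x = (s_ens, s_tem, s_view), a diagonal Gaussian posterior trained with the reparameterization trick, and a fixed selection threshold τ applied to the reconstructed confidence ĉ(u) of an unlabeled example u. The paper's exact inputs, decoder output, and selection rule may differ, and these symbols are hypothetical.

```latex
\mathcal{L}_{\mathrm{VAE}}(\phi,\theta;x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[-\log p_\theta(x \mid z)\big]
  + \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,\mathcal{N}(0, I)\big),
\qquad z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon,\ \ \epsilon \sim \mathcal{N}(0, I)

\mathrm{KL}\big(\mathcal{N}(\mu, \operatorname{diag}\sigma^2)\,\|\,\mathcal{N}(0, I)\big)
  = \tfrac{1}{2} \sum_{j} \big(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\big)

\text{keep } \big(u, \hat{y}(u)\big) \iff \hat{c}(u) \ge \tau,
\qquad \hat{y}(u) = \arg\max_{c}\, p(c \mid u)
```

The closed-form KL term holds for any diagonal Gaussian posterior against a standard normal prior; only the choice of x and the way ĉ(u) is read from the decoder are assumptions here.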