Accurate and Reliable Predictions with Mutual-Transport Ensemble

Han Liu; Peng Cui; Bingning Wang; Jun Zhu; Xiaolin Hu

TL;DR

The paper tackles the problem of achieving both high predictive accuracy and reliable uncertainty estimates with deep neural networks in safety-critical applications. It introduces the Mutual-Transport Ensemble (MTE), which couples a primary model with a co-trained auxiliary model through adaptive KL-divergence regularization to calibrate the output distribution. The method keeps inference efficient, since only the primary model is used at test time, and extends to multiple auxiliary models. Empirically, MTE improves accuracy and reduces calibration error on benchmarks such as CIFAR-100, outperforming Deep Ensembles and prior calibration methods.
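The coupling between the primary and auxiliary models can be illustrated with a minimal sketch of a per-sample objective for the primary model: cross-entropy plus a weighted KL divergence toward each auxiliary model's prediction, averaged over auxiliaries. This is not the authors' implementation. The KL direction, the averaging over auxiliaries, and the constant weight `alpha` are assumptions; the paper's adaptive weighting is not reproduced, and all function names are hypothetical. Pure Python is used so the arithmetic stays visible; a real training loop would compute the same quantities on batched tensors with automatic differentiation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mte_style_loss(
    primary_logits: Sequence[float],
    auxiliary_logits: Sequence[Sequence[float]],
    label: int,
    alpha: float = 0.5,
) -> float:
    """Hypothetical per-sample loss for the primary model.

    Cross-entropy on the primary prediction plus alpha times the mean
    KL(auxiliary || primary) over all auxiliary models. The KL direction,
    the averaging, and the constant alpha are assumptions; the paper's
    adaptive weighting is not reproduced here.
    """
    p = softmax(primary_logits)
    ce = -math.log(p[label])
    if not auxiliary_logits:
        return ce  # no auxiliaries: plain cross-entropy
    kl = sum(kl_divergence(softmax(a), p) for a in auxiliary_logits) / len(auxiliary_logits)
    return ce + alpha * kl


# Usage: one primary model, two hypothetical auxiliary models, true class 0.
primary = [2.0, 0.0, -1.0]
auxiliaries = [[1.5, 0.5, -1.0], [1.0, 1.0, 0.0]]
print(mte_style_loss(primary, [], label=0))
print(mte_style_loss(primary, auxiliaries, label=0))

# Inference uses the primary model alone.
print(softmax(primary))
```

When the auxiliaries agree exactly with the primary model, the KL term vanishes and the loss reduces to plain cross-entropy; disagreement adds a non-negative penalty scaled by `alpha`. At test time only `softmax(primary_logits)` is needed, consistent with the single-model inference described in the TL;DR.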

Abstract

Deep Neural Networks (DNNs) have achieved remarkable success in a variety of tasks, especially in terms of prediction accuracy. However, in complex real-world scenarios, and particularly in safety-critical applications, high accuracy alone is not enough: reliable uncertainty estimates are also crucial. Modern DNNs, often trained with cross-entropy loss, tend to be overconfident, especially on ambiguous samples. Many techniques have been developed to improve uncertainty calibration, but they often compromise prediction accuracy. To tackle this challenge, we propose the "mutual-transport ensemble" (MTE). This approach introduces a co-trained auxiliary model and adaptively regularizes the cross-entropy loss using the Kullback-Leibler (KL) divergence between the prediction distributions of the primary and auxiliary models. We conducted extensive studies on various benchmarks to validate the effectiveness of our method. The results show that MTE simultaneously improves both accuracy and uncertainty calibration. For example, on the CIFAR-100 dataset, MTE with ResNet34/50 achieved significant improvements over the previous state-of-the-art method, with absolute accuracy increases of 2.4%/3.7%, relative reductions in expected calibration error (ECE) of 42.3%/29.4%, and relative reductions in classwise-ECE of 11.6%/15.3%.
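The two calibration metrics quoted in the abstract can be sketched as follows, using equal-width confidence bins. The 15-bin default and the binning convention are assumptions, not settings stated in this summary, and the function names are hypothetical. ECE bins the top-label confidence and averages the gap between accuracy and confidence, weighted by bin size; classwise-ECE applies the same binned gap to every class's predicted probability and averages over classes. The helper `relative_reduction` shows how a relative reduction differs from an absolute change: the drop is divided by the baseline value.

```python
from typing import List, Sequence, Tuple


def _binned_gap(confidences: Sequence[float], hits: Sequence[float], n_bins: int) -> float:
    """Sum over equal-width bins of (bin share of samples) * |mean hit - mean confidence|."""
    n = len(confidences)
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for c, h in zip(confidences, hits):
        idx = min(int(c * n_bins), n_bins - 1)  # bins [i/M, (i+1)/M); last bin includes 1.0
        bins[idx].append((c, h))
    total = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            avg_hit = sum(h for _, h in b) / len(b)
            total += len(b) / n * abs(avg_hit - avg_conf)
    return total


def ece(probs: Sequence[Sequence[float]], labels: Sequence[int], n_bins: int = 15) -> float:
    """Expected calibration error on the top-label (max) probability."""
    conf = [max(p) for p in probs]
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]  # argmax per sample
    hits = [float(pred == y) for pred, y in zip(preds, labels)]
    return _binned_gap(conf, hits, n_bins)


def classwise_ece(probs: Sequence[Sequence[float]], labels: Sequence[int], n_bins: int = 15) -> float:
    """Binned calibration gap of each class's probability, averaged over classes."""
    num_classes = len(probs[0])
    per_class = []
    for k in range(num_classes):
        conf = [p[k] for p in probs]
        hits = [float(y == k) for y in labels]  # 1.0 where the true label is class k
        per_class.append(_binned_gap(conf, hits, n_bins))
    return sum(per_class) / num_classes


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional drop relative to the baseline, e.g. 0.5 means the metric halved."""
    return (baseline - new) / baseline


# Usage with hypothetical softmax outputs for three samples.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
labels = [0, 1, 0]
print(ece(probs, labels), classwise_ece(probs, labels))
```

Under this reading, a relative ECE reduction of 42.3% means the new ECE is about 57.7% of the baseline's, which is different from a 42.3-point absolute drop.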

Paper Structure

This paper contains 26 sections, 20 equations, 5 figures, and 8 tables.

Introduction
Background and Related Work
The MTE Method
Mutual-Transport Ensemble
Extending to Multiple Auxiliary Models
The Connection between MTE and DE
Experiments
Experimental Setup
Classification Accuracy and Calibration
Analysis of MTE
Misclassification & OOD Detection
Calibration Performance on Noise-Perturbed Images
Experiments on Domain Adaptation Capability
Comparison with Deep Mutual Learning
Limitation and Future Work
...and 11 more sections

Figures (5)

Figure 1: (a) Scatter plot of ECE and accuracy for different calibration methods based on the ResNet34 backbone on the CIFAR-100 test set. Methods located closer to the top left corner perform better. Our MTE method is positioned at the top left corner, indicating the best performance. (b) Schematic diagram of the training process for MTE. (c) Schematic diagram of the inference process for MTE.
Figure 2: Reliability diagrams (top) and confidence histograms (bottom) of (a) Cross Entropy, (b) DE-3, and (c) MTE-1 on CIFAR-100. In the reliability diagrams, blue bars depict the accuracy of model-predicted samples within various confidence intervals, while red bars signify the disparities between confidence and accuracy within the current probability interval. Ideally, a perfectly calibrated model would exhibit all blue bars aligned on the diagonal, implying the absence of red bars. Confidence histograms illustrate confidence distribution, with the green dashed line indicating average confidence and the red dashed line representing prediction accuracy.
Figure 3: Box plot of ECEs for different methods on CIFAR-100-C under all corruption types at 5 levels of shift intensity. Each box summarizes the results over 16 types of shifts.
Figure S1: Variation of accuracy, ECE, and classwise-ECE of the MTE method trained on CIFAR-100 with different values of the hyper-parameter α.
Figure S2: Performance comparison of MTE and Deep Mutual Learning (DML) on accuracy, ECE, and CW-ECE.
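The quantities plotted in Figure 2 can be computed with a minimal sketch like the one below, assuming equal-width bins over the top-label confidence. The bin count used in the figure is not stated here, so `n_bins=10` is an assumption, and the function names are hypothetical. Each returned row gives a bin's accuracy (the blue bar) and the signed gap between its average confidence and accuracy (the red bar; positive means overconfident). `histogram_summary` returns the two dashed-line values of the confidence histogram.

```python
from typing import List, Sequence, Tuple


def reliability_bins(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> List[Tuple[float, float, int, float, float]]:
    """Rows of (bin_low, bin_high, count, accuracy, gap) for non-empty bins.

    accuracy is the blue-bar height; gap = mean confidence - accuracy is the
    red-bar size (positive = overconfident). Empty bins are skipped.
    """
    counts = [0] * n_bins
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # last bin also holds confidence 1.0
        counts[idx] += 1
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if ok else 0.0
    rows = []
    for i in range(n_bins):
        if counts[i]:
            acc = hit_sum[i] / counts[i]
            rows.append((i / n_bins, (i + 1) / n_bins, counts[i], acc, conf_sum[i] / counts[i] - acc))
    return rows


def histogram_summary(confidences: Sequence[float], correct: Sequence[bool]) -> Tuple[float, float]:
    """(average confidence, accuracy): the two dashed lines of a confidence histogram."""
    n = len(confidences)
    return sum(confidences) / n, sum(1.0 for ok in correct if ok) / n


# Usage with hypothetical top-label confidences and correctness flags.
conf = [0.95, 0.95, 0.55, 0.55]
correct = [True, False, True, True]
for row in reliability_bins(conf, correct):
    print(row)
print(histogram_summary(conf, correct))
```

Weighting each row's absolute gap by its share of the samples and summing gives the ECE, so a model whose blue bars sit on the diagonal scores zero.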
