Fast and reliable uncertainty quantification with neural network ensembles for industrial image classification
Arthur Thuy, Dries F. Benoit
TL;DR
This work tackles reliable uncertainty quantification in industrial image classification under distribution shifts by comparing a single NN, a deep ensemble, and three efficient ensembles (Snapshot, Batch, MIMO) on SIP-17, introducing the Diversity Quality metric $DQ_1$ to jointly assess in-distribution and out-of-distribution performance. The Batch ensemble emerges as a cost-effective and competitive alternative, matching deep-ensemble accuracy and uncertainty while significantly reducing training, testing, and memory requirements; Snapshot and MIMO show more limited gains due to over- or under-diversity and lower ID performance. The study demonstrates that sharing parameters in Batch ensembles preserves strong uncertainty behavior on OOD data and enables effective classification-with-rejection strategies, making it practically valuable for industrial deployment. Overall, the findings support adopting Batch ensembles for reliable, scalable uncertainty quantification in manufacturing contexts and point to future work on larger datasets and sim-to-real validation.
Abstract
Image classification with neural networks (NNs) is widely used in industrial processes, where the model is likely to encounter unknown objects during deployment, i.e., out-of-distribution (OOD) data. Worryingly, NNs tend to make confident yet incorrect predictions when confronted with OOD data. To be more reliable, models should quantify the uncertainty in their own predictions, communicating when the output should or should not be trusted. Deep ensembles, composed of multiple independent NNs, have been shown to perform strongly but are computationally expensive. Recent research has proposed more efficient NN ensembles, namely the snapshot, batch, and multi-input multi-output (MIMO) ensembles. This study investigates the predictive and uncertainty performance of efficient NN ensembles in the context of image classification for industrial processes. It is the first to provide a comprehensive comparison, and it proposes a novel Diversity Quality metric to quantify the ensembles' performance on the in-distribution and OOD sets in a single metric. The results highlight the batch ensemble as a cost-effective and competitive alternative to the deep ensemble. It matches the deep ensemble in both uncertainty and accuracy while exhibiting considerable savings in training time, test time, and memory storage.
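To make the idea of ensemble-based uncertainty concrete, the following is a minimal sketch of classification with rejection. It is not the authors' implementation. It assumes each ensemble member outputs a softmax probability vector, uses the entropy of the averaged prediction as the uncertainty score (one common choice), and rejects inputs whose entropy exceeds a threshold. The function names, the toy probabilities, and the threshold of 0.7 nats are all hypothetical.

```python
import math

def ensemble_mean(member_probs):
    """Average class probabilities over ensemble members for one input."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a probability vector; 0 * log(0) is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def classify_with_rejection(member_probs, max_entropy):
    """Return (label, entropy); label is None when entropy exceeds max_entropy."""
    mean = ensemble_mean(member_probs)
    entropy = predictive_entropy(mean)
    if entropy > max_entropy:
        return None, entropy  # too uncertain: defer to a human or a fallback
    label = max(range(len(mean)), key=mean.__getitem__)
    return label, entropy

# Toy inputs (hypothetical): three members, three classes.
# Members agree, as expected for a familiar in-distribution image.
agree = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.95, 0.03, 0.02]]
# Members disagree, as may happen for an unknown (OOD) object.
disagree = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

MAX_ENTROPY = 0.7  # hypothetical threshold, would be tuned on validation data

print(classify_with_rejection(agree, MAX_ENTROPY))
print(classify_with_rejection(disagree, MAX_ENTROPY))
```

In this toy case the agreeing members yield a low-entropy averaged prediction that is accepted, while the disagreeing members average out to a near-uniform distribution that is rejected. Each member in the second case is individually confident, so the rejection comes entirely from the disagreement between members.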
