Iterative Assessment and Improvement of DNN Operational Accuracy
Antonio Guerriero, Roberto Pietrantuono, Stefano Russo
TL;DR
The paper tackles the discrepancy between pre-release and operational DNN accuracy caused by distribution and label shifts. It introduces DAIC (DNN Assessment and Improvement Cycle), a cycle that combines low-cost online pseudo-oracles with high-cost offline sampling within an MLOps-like life cycle to estimate and improve operational accuracy. Using pseudo-oracles such as SelfChecker and DNN-OS alongside DeepEST sampling, the approach detects drift and triggers targeted labeling and remodeling. Preliminary results indicate that operational accuracy can be faithfully tracked and improved, even under label shift. This offers a practical, cost-conscious path to iterative DNN improvement in real-world deployment environments, with potential extension beyond image classification to domains such as autonomous driving.
Abstract
Deep Neural Networks (DNNs) are now widely adopted in many application domains thanks to their human-like, or even superhuman, performance in specific tasks. However, due to unpredictable/unconsidered operating conditions, unexpected failures occur in the field, making the performance of a DNN in operation very different from the one estimated prior to release. In the life cycle of DNN systems, the assessment of accuracy is typically addressed in two ways: offline, via sampling of operational inputs, or online, via pseudo-oracles. The former is considered more expensive because the sampled inputs must be labeled manually. The latter is automatic but less accurate. We believe that emerging iterative, industrial-strength life cycle models for Machine Learning systems, like MLOps, offer the possibility to leverage inputs observed in operation not only to provide faithful estimates of a DNN's accuracy, but also to improve it through remodeling/retraining actions. We propose DAIC (DNN Assessment and Improvement Cycle), an approach that combines "low-cost" online pseudo-oracles and "high-cost" offline sampling techniques to estimate and improve the operational accuracy of a DNN across the iterations of its life cycle. Preliminary results show the benefits of combining the two approaches and integrating them in the DNN life cycle.
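To make the interplay between the two assessment modes concrete, the following is a minimal sketch of one assessment step, not the authors' implementation. It assumes a confidence-threshold pseudo-oracle as a stand-in for tools such as SelfChecker, and simple random sampling as a stand-in for DeepEST. All function names, the `tolerance` and `budget` parameters, and the drift rule are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def online_estimate(confidences: Sequence[float], threshold: float = 0.5) -> float:
    """Low-cost estimate: fraction of predictions a (hypothetical)
    confidence-threshold pseudo-oracle judges to be correct."""
    if not confidences:
        raise ValueError("no operational inputs observed")
    return sum(c >= threshold for c in confidences) / len(confidences)


def offline_estimate(
    predictions: Sequence[int],
    true_label_of: Callable[[int], int],
    budget: int,
    rng: random.Random,
) -> Tuple[float, List[int]]:
    """High-cost estimate: label a simple random sample of at most
    `budget` inputs and return the sample accuracy and sampled indices.
    `true_label_of` stands in for the manual labeling effort."""
    size = min(budget, len(predictions))
    sampled = rng.sample(range(len(predictions)), size)
    correct = sum(predictions[i] == true_label_of(i) for i in sampled)
    return correct / size, sampled


def assessment_step(
    predictions: Sequence[int],
    confidences: Sequence[float],
    true_label_of: Callable[[int], int],
    pre_release_accuracy: float,
    tolerance: float = 0.05,
    budget: int = 50,
    seed: int = 0,
) -> Dict[str, object]:
    """Run the cheap online check first; pay for labeling only when the
    online estimate falls more than `tolerance` below the pre-release one."""
    online = online_estimate(confidences)
    offline: Optional[float] = None
    labeled: List[int] = []
    if pre_release_accuracy - online > tolerance:  # assumed drift rule
        offline, labeled = offline_estimate(
            predictions, true_label_of, budget, random.Random(seed)
        )
    # The labeled inputs could then feed a remodeling/retraining action.
    return {"online": online, "offline": offline, "labeled": labeled}
```

In this sketch, manual labeling is requested only when the online estimate suggests a drop in accuracy, which reflects the cost argument made above: the pseudo-oracle runs on every input, while the labeling budget is spent only in iterations where drift is suspected.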
