Approach to Finding a Robust Deep Learning Model

Alexey Boldyrev, Fedor Ratnikov, Andrey Shevelev

TL;DR

This work addresses the reliability of deep learning predictions by proposing a robustness-detection framework and a meta-algorithm for automated model selection. The authors apply this approach to small CNNs that perform energy and position reconstruction in calorimeter simulations, systematically varying training sample size, weight initialization, and inductive bias. Their results show that a carefully chosen model selection process can identify robust models faster and with fewer trained instances than an exhaustive search, and that inductive bias reduces the amount of training data required for robust performance. The study has practical implications for AutoML and fault-tolerant ML applications, illustrating how robustness, rather than peak accuracy alone, can guide model choice in complex environments subject to distribution shift.

Abstract

The rapid development of machine learning (ML) and artificial intelligence (AI) applications requires the training of large numbers of models. This growing demand highlights the importance of training models without human supervision, while ensuring that their predictions are reliable. In response to this need, we propose a novel approach for determining model robustness. This approach, supplemented with a proposed model selection algorithm designed as a meta-algorithm, is versatile and applicable to any machine learning model, provided that it is appropriate for the task at hand. This study demonstrates the application of our approach to evaluate the robustness of deep learning models. To this end, we study small models composed of a few convolutional and fully connected layers and trained with common optimizers, chosen for their ease of interpretation and computational efficiency. Within this framework, we address the influence of training sample size, model weight initialization, and inductive bias on the robustness of deep learning models.

Paper Structure

This paper contains 21 sections, 1 equation, 18 figures, 3 tables, 1 algorithm.

Figures (18)

  • Figure 1: Properties of input data for Dataset A and Dataset B. Left: histogram of the energy spectrum. Center: histogram of the angular spectrum along one of the axes of the calorimeter plane. Right: example of a calorimetric cluster from a particle with an energy of 65 GeV.
  • Figure 2: Box plots of the losses for models consisting of 2 convolutional and 2 fully connected layers with 19 315 trainable parameters, for each activation function considered. The models solve the energy reconstruction problem on Dataset A. Each box plot uses 10 instances of the corresponding model. Losses for models with Sigmoid and Tanh activation functions fall above the range shown, as indicated by the upward arrows.
  • Figure 3: Line graphs of the fraction of models with losses less than the current loss for six model selection criteria. Each model is trained to solve the energy reconstruction problem on a sample of 32 000 examples randomly drawn from Dataset A.
  • Figure 4: Box plots of the losses for the energy reconstruction problem for Model 1 and Model 2 as a function of the training sample size for Dataset A. Model 2 uses the sum of the energies in the cells transferred after the first convolution layer. Each box plot uses 50 instances of the corresponding model.
  • Figure 5: Box plots of the losses for the energy reconstruction problem for Model 1 and Model 2 as a function of the training sample size for Dataset B. Model 2 uses the sum of the energies in the cells transferred after the first convolution layer. Each box plot uses 50 instances of the corresponding model.
  • ...and 13 more figures
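
The quantity named in the Figure 3 caption, the fraction of models with losses less than the current loss, can be read as an empirical cumulative distribution over trained model instances. The following is a minimal sketch of that computation only, not the authors' code: the function names `fraction_below` and `ecdf_curve` and the loss values are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_left


def fraction_below(losses, threshold):
    """Fraction of model instances whose loss is strictly less than threshold."""
    if not losses:
        raise ValueError("losses must contain at least one value")
    ordered = sorted(losses)
    # bisect_left returns the number of elements strictly smaller than threshold.
    return bisect_left(ordered, threshold) / len(ordered)


def ecdf_curve(losses):
    """Return (loss, fraction of instances with a strictly smaller loss) pairs,
    sorted by loss, i.e. the points of one line in a Figure-3-style plot."""
    if not losses:
        raise ValueError("losses must contain at least one value")
    ordered = sorted(losses)
    n = len(ordered)
    return [(x, bisect_left(ordered, x) / n) for x in ordered]


# Hypothetical final losses of five instances of one model (illustrative only,
# not taken from the paper).
example_losses = [0.12, 0.10, 0.35, 0.11, 0.10]
curve = ecdf_curve(example_losses)
```

Computing one such curve per selection criterion, each over its own set of instance losses, would give one line per criterion. A curve that rises steeply at low loss corresponds to a criterion under which most instances reach a small loss.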