
Towards certification: A complete statistical validation pipeline for supervised learning in industry

Lucas Lacasa, Abel Pardo, Pablo Arbelo, Miguel Sánchez, Pablo Yeste, Noelia Bascones, Alejandro Martínez-Cava, Gonzalo Rubio, Ignacio Gómez, Eusebio Valero, Javier de Vicente

TL;DR

This paper outlines a complete validation pipeline that integrates deep learning, optimization and statistical methods, and illustrates its application in a realistic supervised problem arising in aerostructural design: predicting the likelihood of different stress-related failure modes during different flight maneuvers.

Abstract

Machine Learning and Deep Learning methods are gradually being integrated into industrial operations, albeit at different speeds in different industries. The aerospace and aeronautical industries have recently developed a roadmap for design assurance concepts and for the integration of neural-network-related technologies in the aeronautical sector. This paper aims to contribute to this paradigm of AI-based certification in the context of supervised learning by outlining a complete validation pipeline that integrates deep learning, optimization and statistical methods. This pipeline is composed of a directed graphical model of ten steps. Each step is addressed by merging key concepts from the contributing disciplines (from machine learning and optimization to statistics), adapting them to an industrial scenario, and developing computationally efficient algorithmic solutions. We illustrate the application of this pipeline in a realistic supervised problem arising in aerostructural design: predicting the likelihood of different stress-related failure modes during different flight maneuvers, based on a (large) set of features characterising the aircraft's internal loads and geometric parameters.

Paper Structure

This paper contains 15 sections, 13 equations and 17 figures.

Figures (17)

  • Figure 1: Sketch of the industrially-tailored validation pipeline for a Machine Learning model proposed in this paper. Each box is an important part of the whole process and is detailed in a specific subsection of the text. Observe that the pipeline is a directed graph with several cycles, which illustrate the iterative refinements of the whole process.
  • Figure 2: Illustration of frames and stringers in an aircraft's fuselage section, along with a cylindrical coordinate system. Image edited by the authors based on imagen_superstringer, under Creative Commons Attribution 3.0 Unported.
  • Figure 3: Illustration of the proximity method. Within a given data voxel (here a 2-dimensional box) populated by 100 training data points, we compute the shortest distance between a given test datum (red dots) and the voxel's training set (blue dots). This distance is then compared to the distribution of shortest distances between pairs of training data within the voxel. The test datum is deemed 'valid' if its shortest distance to the voxel's training set lies within the 95% confidence interval, i.e. between the 2.5 and 97.5 percentiles (first row). Otherwise the test datum is deemed 'invalid': if its distance is above the 97.5 percentile (second row), it is classified as lying in an isolated region of the voxel (and will therefore lead to poor interpolation), whereas if its distance is below the 2.5 percentile (third row), it is classified as p-hacking, i.e. it is too close to the training set.
  • Figure 4: Number of datapoints of the training (blue) and test (orange) sets inside each voxel. There are about 800 voxels. For better visualization, voxels have been artificially binned.
  • Figure 5: Sketch of the surrogate model's errors (e.g. the mean square error, Eq. (eq:Error_MSE)) on the training and test sets as a function of the surrogate model's 'complexity', i.e. loosely speaking the number of tunable hyperparameters, showcasing the bias-variance trade-off. When a model is not complex enough, it cannot learn the patterns in the data, so both training and test errors are high (high-bias regime). When a model is unnecessarily complex (severely overparametrized), it overfits: its training error is very low but its test error is large, because the model has learnt not only the patterns in the data but also the random irregularities of the training data, which do not systematically appear in the test data. This is the high-variance regime. A good model strikes a balance between the two.
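The proximity test described in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation. It assumes Euclidean distance and reads "the distribution of shortest distances between pairs of training data" as each training point's distance to its nearest other training point. The helper names `nearest_neighbour_distances` and `classify_test_point` are hypothetical.

```python
import numpy as np


def nearest_neighbour_distances(train: np.ndarray) -> np.ndarray:
    """Distance from each training point to its closest *other* training point."""
    diff = train[:, None, :] - train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point's zero distance to itself
    return dist.min(axis=1)


def classify_test_point(test_point: np.ndarray, train: np.ndarray,
                        lower_pct: float = 2.5, upper_pct: float = 97.5) -> str:
    """Label one test datum as 'valid', 'isolated' or 'p-hacking' within a voxel.

    Hypothetical helper: the 95% band is taken from the empirical
    nearest-neighbour distances of the voxel's training set.
    """
    reference = nearest_neighbour_distances(train)
    low, high = np.percentile(reference, [lower_pct, upper_pct])
    # Shortest distance from the test datum to the voxel's training set.
    d = np.sqrt(((train - test_point) ** 2).sum(axis=1)).min()
    if d > high:
        return "isolated"   # far from all training data: poor interpolation expected
    if d < low:
        return "p-hacking"  # suspiciously close to the training set
    return "valid"


if __name__ == "__main__":
    # 1-D toy voxel: nearest-neighbour distances are [1, 1, 2, 3, 4],
    # so the 2.5-97.5 percentile band is [1.0, 3.9].
    train = np.array([[0.0], [1.0], [3.0], [6.0], [10.0]])
    print(classify_test_point(np.array([8.0]), train))   # valid
    print(classify_test_point(np.array([0.5]), train))   # p-hacking
    print(classify_test_point(np.array([20.0]), train))  # isolated
```

The full pairwise distance matrix costs O(n²) memory per voxel. That is manageable for voxels holding on the order of 100 points, as in Figure 3, but larger voxels would call for a spatial index such as `scipy.spatial.cKDTree`.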
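Producing the per-voxel counts shown in Figure 4 requires assigning every training and test point to a voxel. The captions do not say how the voxels are built. Purely for illustration, the sketch below assumes a regular grid of axis-aligned boxes with `bins_per_dim` cells per feature. The function names `voxel_index` and `counts_per_voxel` are hypothetical.

```python
import numpy as np


def voxel_index(points, lower, upper, bins_per_dim):
    """Map each point to an integer voxel id on a regular grid (hypothetical scheme)."""
    points = np.asarray(points, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    width = (upper - lower) / bins_per_dim
    cell = np.floor((points - lower) / width).astype(int)
    # Points lying exactly on the upper boundary (or slightly outside) go into edge cells.
    cell = np.clip(cell, 0, bins_per_dim - 1)
    # Flatten the per-dimension cell coordinates into a single voxel id.
    dims = (bins_per_dim,) * points.shape[1]
    return np.ravel_multi_index(tuple(cell.T), dims)


def counts_per_voxel(train, test, lower, upper, bins_per_dim):
    """Return (training counts, test counts), one entry per voxel of the grid."""
    train_ids = voxel_index(train, lower, upper, bins_per_dim)
    test_ids = voxel_index(test, lower, upper, bins_per_dim)
    n_voxels = bins_per_dim ** np.asarray(train).shape[1]
    train_counts = np.bincount(train_ids, minlength=n_voxels)
    test_counts = np.bincount(test_ids, minlength=n_voxels)
    return train_counts, test_counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.uniform(0, 1, size=(1000, 2))
    test = rng.uniform(0, 1, size=(250, 2))
    train_counts, test_counts = counts_per_voxel(train, test, [0, 0], [1, 1], bins_per_dim=5)
    print(train_counts.sum(), test_counts.sum())  # 1000 250
```

With many features, a full grid has `bins_per_dim ** d` cells, far more than the roughly 800 voxels reported. A practical version would therefore grid only a few selected features, or count only occupied voxels (e.g. with `np.unique(..., return_counts=True)`).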
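The bias-variance picture in Figure 5 can be reproduced on a toy problem. This is an illustration under stated assumptions, not the paper's surrogate model. It uses polynomial degree as the 'complexity' knob and a hypothetical noisy sine as the ground truth. Errors are measured with the standard mean square error, MSE = (1/n) * sum_i (y_i - yhat_i)^2, on both the training and test sets. The names `mse`, `target` and `complexity_sweep` are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean square error: average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def target(x):
    """Hypothetical ground-truth function for the toy problem."""
    return np.sin(2 * np.pi * x)


def complexity_sweep(degrees, n_train=20, n_test=200, noise=0.2, seed=0):
    """Fit polynomials of increasing degree; return (train MSE, test MSE) per degree."""
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0, 1, n_train)
    y_train = target(x_train) + rng.normal(0, noise, n_train)
    x_test = rng.uniform(0, 1, n_test)
    y_test = target(x_test) + rng.normal(0, noise, n_test)
    results = []
    for d in degrees:
        coeffs = np.polyfit(x_train, y_train, d)  # least-squares polynomial fit
        results.append((mse(y_train, np.polyval(coeffs, x_train)),
                        mse(y_test, np.polyval(coeffs, x_test))))
    return results


if __name__ == "__main__":
    degrees = [1, 3, 9]
    for d, (train_err, test_err) in zip(degrees, complexity_sweep(degrees)):
        print(f"degree {d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Because each polynomial family contains the lower-degree ones, the least-squares training MSE should not increase as the degree grows. Test MSE typically falls and then rises again once the fit starts chasing the noise, although the exact turning point depends on the random seed and sample size.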