Obey validity limits of data-driven models

Artur M Schweidtmann, Jana M Weber, Christian Wende, Linus Netze, Alexander Mitsos

TL;DR

This work tackles extrapolation risk in optimization with data-driven models by combining topological data analysis with validity-domain learning. A three-step workflow first uses persistent homology to detect holes or separated clusters in the training data, then models the validity region with a convex hull if none are found or with a one-class SVM otherwise, and finally enforces this region as a constraint in deterministic global optimization, aided by a reduced-space formulation and kernel-based envelopes. Illustrative 2D cases and a sulfur-recovery-unit application show that one-class SVMs capture nonconvex validity regions better than the convex hull and that the reduced-space formulation solves faster than common full-space formulations by a factor of over 3,000, while maintaining or improving accuracy. The methods are released open-source in the MeLOn toolbox to support reliable use of data-driven models in engineering design and operation.

Abstract

Data-driven models are becoming increasingly popular in engineering, on their own or in combination with mechanistic models. Commonly, the trained models are subsequently used in model-based optimization of design and/or operation of processes. Thus, it is critical to ensure that data-driven models are not evaluated outside their validity domain during process optimization. We propose a method to learn this validity domain and encode it as constraints in process optimization. We first perform a topological data analysis using persistent homology to identify potential holes or separated clusters in the training data. If clusters or holes are identified, we train a one-class classifier, specifically a one-class support vector machine, on the training data domain and encode it as constraints in the subsequent process optimization. Otherwise, we construct the convex hull of the data and encode it as constraints. We finally perform deterministic global process optimization with the data-driven models subject to their respective validity constraints. To ensure computational tractability, we develop a reduced-space formulation for trained one-class support vector machines and show that our formulation outperforms common full-space formulations by a factor of over 3,000, making it a viable tool for engineering applications. The method is ready-to-use and available open-source as part of our MeLOn toolbox (https://git.rwth-aachen.de/avt.svt/public/MeLOn).
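The convex-hull branch of the method can be illustrated with a minimal sketch. It assumes 2D data and uses only the Python standard library; the function names (`convex_hull`, `hull_halfspaces`, `in_validity_domain`) and the ring-shaped toy data are hypothetical and are not taken from the MeLOn toolbox. The hull is converted into linear inequalities of the form a·x <= b, which is one way such a region could be passed to an optimizer as constraints.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
HalfSpace = Tuple[Point, float]  # (a, b) meaning a[0]*x + a[1]*y <= b


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) < 3:
        raise ValueError("need at least three distinct points")
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_halfspaces(hull: List[Point]) -> List[HalfSpace]:
    """One inequality a.x <= b per hull edge, with a the outward normal."""
    constraints = []
    for i, (px, py) in enumerate(hull):
        qx, qy = hull[(i + 1) % len(hull)]
        a = (qy - py, px - qx)
        constraints.append((a, a[0] * px + a[1] * py))
    return constraints


def in_validity_domain(x: Point, constraints: List[HalfSpace],
                       tol: float = 1e-9) -> bool:
    """True if x satisfies every hull inequality (within a tolerance)."""
    return all(a[0] * x[0] + a[1] * x[1] <= b + tol for a, b in constraints)


# Hypothetical training data: 12 points on a ring with an empty center.
ring = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
        for k in range(12)]
constraints = hull_halfspaces(convex_hull(ring))

print(in_validity_domain((2.0, 0.0), constraints))  # False: outside the data
print(in_validity_domain((0.0, 0.0), constraints))  # True: inside the hull
```

The second check shows why the hull alone can be insufficient: the center of the ring contains no training data, yet it satisfies all hull constraints. This is the situation in which the abstract switches to a one-class support vector machine.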

Paper Structure

This paper contains 11 sections, 6 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Overview of the proposed three-step methodology to obey validity limits of data-driven models in optimization
  • Figure 2: Illustration of a Vietoris-Rips filtration utilized for persistent homology. The upper part shows the data set and circles around the data points with increasing diameter $\epsilon$. The bottom image illustrates the simplicial complexes formed during the filtration. The figure is based on [kimura2017quantification]
  • Figure 3: Persistent homology plot of the illustrative point cloud. The x-axis shows the birth and the y-axis the death of the homology groups
  • Figure 4: Comparison of the persistent homology plots of the eight illustrative data sets
  • Figure 5: Comparison of the convex hull and the boundaries learned by the one-class SVMs
  • ...and 2 more figures
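The Vietoris-Rips filtration described for Figure 2 can be sketched for dimension zero, i.e. connected components, using only the Python standard library. This is an illustrative simplification and not the authors' implementation: it tracks when clusters merge as the diameter $\epsilon$ grows, but it does not detect holes, which require dimension-one homology and in practice a dedicated persistent-homology library. The function name `h0_death_scales` and the toy point cloud are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def h0_death_scales(points: Sequence[Tuple[float, ...]]) -> List[float]:
    """Scales at which connected components merge in a Vietoris-Rips
    filtration, where two points are linked once their distance <= epsilon.

    Every point is born at epsilon = 0. The n - 1 returned values are the
    death scales in ascending order; one component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one of them dies
            parent[ri] = rj
            deaths.append(dist)
    return deaths


# Hypothetical point cloud with two well-separated clusters.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
deaths = h0_death_scales(cloud)
print(deaths)
```

In this toy example the last death scale is far larger than all the others, which is the kind of long-lived feature that indicates separated clusters in a persistence plot.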