An Extended Validity Domain for Constraint Learning

Yilin Zhu; Samuel Burer

TL;DR

This work addresses the extrapolation risk in constraint learning by introducing an extended validity domain, CH^+, which constructs a convex hull in an augmented space that includes the true objective value $f(x)$. The authors show both theoretically and empirically that CH^+ often yields smaller function-value and feasibility errors than existing validity domains such as Box, CH, or isolation forests, at a reasonable computational cost. Across extensive synthetic experiments, higher-dimensional tests, two stylized optimization models, and a real-world avocado pricing case study, CH^+ is more robust to data-driven misspecification and keeps solutions closer to the training data. The study presents CH^+ as a practical, model-agnostic validity domain that improves the reliability of prescriptive ML solutions, and it suggests extending these ideas to broader model classes and feasibility considerations.

Abstract

We consider embedding a predictive machine-learning model within a prescriptive optimization problem. In this setting, called constraint learning, we study the concept of a validity domain, i.e., a constraint added to the feasible set, which keeps the optimization close to the training data, thus helping to ensure that the computed optimal solution exhibits less prediction error. In particular, we propose a new validity domain which uses a standard convex-hull idea but in an extended space. We investigate its properties and compare it empirically with existing validity domains on a set of test problems for which the ground truth is known. Results show that our extended convex hull routinely outperforms existing validity domains, especially in terms of the function value error, that is, it exhibits closer agreement between the true function value and the predicted function value at the computed optimal solution. We also consider our approach within two stylized optimization models, which show that our method reduces feasibility error, as well as a real-world pricing case study.
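
The difference between a standard convex-hull validity domain and one built in an extended space can be illustrated with a small one-dimensional sketch. It rests on an assumption drawn only from the summary above, not on the paper's formal definition: the extended hull is taken over pairs $(x_i, y_i)$ of training inputs and observed outputs, and a candidate $x$ is admitted when the point $(x, \hat{h}(x))$ lies in that hull. The data, the linear predictor `predict`, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(hull: Sequence[Point], q: Point, tol: float = 1e-9) -> bool:
    """True if q is inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hulls are not handled in this sketch
    return all(cross(hull[i], hull[(i + 1) % n], q) >= -tol for i in range(n))


def in_ch(xs: Sequence[float], x: float) -> bool:
    """Standard hull in 1-D input space: the interval [min x_i, max x_i]."""
    return min(xs) <= x <= max(xs)


def in_ch_plus(xs: Sequence[float], ys: Sequence[float],
               predict: Callable[[float], float], x: float) -> bool:
    """Assumed extended test: (x, predict(x)) must lie in conv{(x_i, y_i)}."""
    hull = convex_hull(list(zip(xs, ys)))
    return in_hull(hull, (x, predict(x)))


# Hypothetical training data (roughly quadratic) and a crude linear predictor.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.2, 3.8, 9.1, 16.0]


def predict(x: float) -> float:
    return 4.0 * x - 2.0


for x in (0.0, 2.0, 5.0):
    print(x, in_ch(xs, x), in_ch_plus(xs, ys, predict, x))
```

In this toy setup the point $x = 0$ lies inside the standard hull, yet the extended test rejects it because the predicted value $-2$ falls outside the range of outputs observed near that input. The extended domain thus excludes points where the predictor disagrees with the data, not only points far from the training inputs.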

Paper Structure (36 sections, 3 theorems, 24 equations, 8 figures, 12 tables)

Introduction
Background on Constraint Learning
Fundamentals
Errors
Validity domains
Simple bounds
The convex hull
The enlarged convex hull
Support vector machines
Isolation forests
An Extended Validity Domain
Intuition
Our new validity domain and its variants
Illustration
Numerical Results
...and 21 more sections

Key Result

Proposition 1

$v^* = \min\{ \phi : (x,y,\phi) \in \operatorname{conv}(F^+) \}$.

Figures (8)

Figure 1: Illustration of $\mathrm{CH}$ on the left and $\mathrm{CH}^+$ on the right. $\mathrm{CH}^+$ exhibits better function-value and optimal-solution errors.
Figure 2: Empirical distributions of the ratio of the function value error of $\mathrm{CH}^+$ to the function value error of $\mathrm{CH}$, grouped by sampling rule.
Figure 3: Avocado data (light circular dots), predicted demand function (curve), and four optimal solutions. The default (i.e., no validity domain) is shown as a circle, Box as a square, $\mathrm{CH}$ as a triangle, and $\mathrm{CH}^+$ as a diamond.
Figure S1: The distribution of the ratio of the function value error obtained by Uniform to that obtained by Normal, grouped by validity domain.
Figure S2: The distribution of the ratio of the function value errors obtained at different noise levels, grouped by validity domain. Error 1 is the ratio between noise level $= 0.1$ and noise level $= 0$; Error 2 is the ratio between noise level $= 0.2$ and noise level $= 0$.
...and 3 more figures

Theorems & Definitions (3)

Proposition 1
Proposition 2
Proposition 3
