Deterministic Global Optimization over trained Kolmogorov Arnold Networks

Tanuj Karia, Giacomo Lastrucci, Artur M. Schweidtmann

TL;DR

The authors develop a deterministic global optimization framework for trained Kolmogorov-Arnold Networks (KANs) by formulating them as MINLPs, using an MIQCP-based formulation of the B-spline activations. They introduce several enhancements to improve tractability (feasibility-based bounds tightening, convex hull reformulations, local support cuts and redundant cuts, sparsity exploitation, and SiLU bounding via McCormick envelopes) and implement the approach in Pyomo with SCIP. In computational experiments on the Rosenbrock and peaks functions, they show that trained KANs can be solved to global optimality with modest runtimes for small input dimensions ($n_0 \le 5$), while larger KANs require careful architectural choices to remain tractable. Convex hull reformulations benefit moderately difficult cases, and local support cuts help larger networks at the expense of simpler instances. Overall, KANs emerge as a promising surrogate for deterministic global optimization in moderate-dimensional settings, offering higher accuracy than small MLP surrogates and favorable solve times when the architecture is chosen thoughtfully.
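
To make the B-spline activation mentioned above concrete, here is a minimal NumPy sketch of a single KAN edge activation. It assumes the parameterization commonly used for KANs, a weighted SiLU base term plus a weighted B-spline evaluated with the Cox-de Boor recursion; the names `w_base`, `w_spline`, `knots`, and `coeffs`, and the uniform grid on $[-1,1]$, are illustrative assumptions and are not taken from the authors' Pyomo implementation.

```python
import numpy as np

def silu(x):
    """SiLU (swish) function: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def bspline_basis(x, knots, k):
    """Values of all degree-k B-spline basis functions at scalar x (Cox-de Boor)."""
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    B = np.array([1.0 if t[i] <= x < t[i + 1] else 0.0 for i in range(len(t) - 1)])
    for d in range(1, k + 1):
        nxt = np.zeros(len(t) - d - 1)
        for i in range(len(nxt)):
            # Guard against zero-width spans (repeated knots).
            left = 0.0 if t[i + d] == t[i] else (x - t[i]) / (t[i + d] - t[i]) * B[i]
            right = (0.0 if t[i + d + 1] == t[i + 1]
                     else (t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * B[i + 1])
            nxt[i] = left + right
        B = nxt
    return B

def kan_activation(x, knots, coeffs, k=3, w_base=1.0, w_spline=1.0):
    """Assumed edge activation: w_base * SiLU(x) + w_spline * sum_i c_i B_i(x)."""
    return w_base * silu(x) + w_spline * float(np.dot(coeffs, bspline_basis(x, knots, k)))

# Hypothetical example: G = 5 grid intervals on [-1, 1], cubic splines (k = 3).
G, k = 5, 3
h = 2.0 / G
knots = -1.0 + h * np.arange(-k, G + k + 1)   # grid extended by k knots on each side
coeffs = np.ones(G + k)                       # all-ones coefficients: spline equals 1 on [-1, 1)
basis = bspline_basis(0.3, knots, k)
value = kan_activation(0.3, knots, coeffs, k=k)
```

The degree-0 indicator functions are what introduce discrete decisions (which knot interval contains $x$) when such an activation is embedded in an optimization model, while the recursion multiplies variables together, which is consistent with a mixed-integer quadratically constrained formulation.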

Abstract

To address the challenge of tractability for optimizing mathematical models in science and engineering, surrogate models are often employed. Recently, a new class of machine learning models named Kolmogorov Arnold Networks (KANs) has been proposed. It was reported that KANs can approximate a given input/output relationship with a high level of accuracy, requiring significantly fewer parameters than multilayer perceptrons. Hence, we aim to assess the suitability of trained KANs for deterministic global optimization by proposing a Mixed-Integer Nonlinear Programming (MINLP) formulation of them. We conduct extensive computational experiments for different KAN architectures. Additionally, we propose an alternative convex hull reformulation, local support constraints, and redundant constraints aimed at improving the effectiveness of the MINLP formulation of the KAN. KANs demonstrate high accuracy while requiring relatively modest computational effort to optimize, particularly for cases with fewer than five inputs or outputs. For cases with more inputs or outputs, carefully considering the KAN architecture during training may improve its effectiveness when optimizing over the trained KAN. Overall, we observe that KANs offer a promising alternative as surrogate models for deterministic global optimization.

Paper Structure

This paper contains 23 sections, 33 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: Representation of a Kolmogorov-Arnold Network as a directed acyclic graph. Dashed gray line on a neuron represents the edge with the activation connecting two neurons. The network is fully connected.
  • Figure 2: Comparison of different bounding strategies for the SiLU function in the range $[-5,5]$. The McCormick under- and over-estimators without the introduction of an auxiliary variable are derived using MC++ (Chachuat, 2015).
  • Figure 3: Performance profiles comparing the different formulation configurations of KANs.
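
As an illustration of the kind of SiLU bounding compared in Figure 2, the following sketch computes McCormick bounds for $\mathrm{SiLU}(x) = x\,\sigma(x)$ by treating it as the bilinear product $w = x \cdot y$ with an auxiliary variable $y = \sigma(x)$. This is only the textbook bilinear McCormick construction evaluated along the curve $y = \sigma(x)$; it is an assumption about one of the strategies in the figure, not the authors' implementation, and it is not the MC++ composition-based relaxation. The function name `mccormick_silu` is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    return x * sigmoid(x)

def mccormick_silu(x, x_lo, x_hi):
    """McCormick bounds on w = x * y with y = sigmoid(x), for x in [x_lo, x_hi].

    Returns (under, over) such that under <= SiLU(x) <= over.
    """
    y = sigmoid(x)
    # sigmoid is increasing, so its bounds are attained at the interval ends.
    y_lo, y_hi = sigmoid(x_lo), sigmoid(x_hi)
    # Underestimators from (x - x_lo)(y - y_lo) >= 0 and (x_hi - x)(y_hi - y) >= 0.
    under = max(x_lo * y + x * y_lo - x_lo * y_lo,
                x_hi * y + x * y_hi - x_hi * y_hi)
    # Overestimators from (x_hi - x)(y - y_lo) >= 0 and (x - x_lo)(y_hi - y) >= 0.
    over = min(x_hi * y + x * y_lo - x_hi * y_lo,
               x_lo * y + x * y_hi - x_lo * y_hi)
    return under, over

# Check the bounds on a grid over the interval used in Figure 2.
grid = [-5.0 + 0.1 * i for i in range(101)]
bounds = [mccormick_silu(x, -5.0, 5.0) for x in grid]
```

In an optimization model, $x$, $y$, and $w$ would be decision variables and the four inequalities would appear as linear constraints; the quality of the relaxation then depends on how tight the bounds on $x$ are, which is where bounds tightening becomes relevant.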