Deterministic Global Optimization over trained Kolmogorov Arnold Networks
Tanuj Karia, Giacomo Lastrucci, Artur M. Schweidtmann
TL;DR
The authors develop a deterministic global optimization framework for trained Kolmogorov-Arnold Networks (KANs) by formulating them as mixed-integer nonlinear programs (MINLPs) using a MIQCP-based formulation of the B-spline activations. They introduce several enhancements to improve tractability (feasibility-based bounds tightening, convex hull reformulations, local support and redundant cuts, sparsity exploitation, and SiLU bounding via McCormick envelopes) and implement the approach in Pyomo with SCIP. Computational experiments on the Rosenbrock and peaks functions show that KANs can be solved to global optimality with modest runtimes for small input dimensions ($n_0 \le 5$), while larger KANs require careful architectural choices to remain tractable. Convex hull reformulations offer benefits for moderately difficult cases, and local support cuts help larger networks at the expense of performance on simpler instances. Overall, KANs emerge as a promising surrogate for deterministic global optimization in moderate-dimensional settings, offering higher accuracy than small MLP surrogates and favorable solve times when the architecture is chosen thoughtfully.
Abstract
Surrogate models are often employed to make the optimization of mathematical models in science and engineering tractable. Recently, a new class of machine learning models named Kolmogorov Arnold Networks (KANs) has been proposed. It was reported that KANs can approximate a given input/output relationship with a high level of accuracy while requiring significantly fewer parameters than multilayer perceptrons. Hence, we aim to assess the suitability of trained KANs for deterministic global optimization by proposing a Mixed-Integer Nonlinear Programming (MINLP) formulation of them. We conduct extensive computational experiments for different KAN architectures. Additionally, we propose an alternative convex hull reformulation, local support constraints, and redundant constraints aimed at improving the effectiveness of the MINLP formulation of the KAN. KANs demonstrate high accuracy while requiring relatively modest computational effort to optimize, particularly for cases with fewer than five inputs or outputs. For cases with more inputs or outputs, carefully considering the KAN architecture during training may improve its effectiveness when optimizing over the trained KAN. Overall, we observe that KANs offer a promising alternative as surrogate models for deterministic global optimization.
