MLPs and KANs for data-driven learning in physical problems: A performance comparison

Raghav Pant, Sikan Li, Xingjian Li, Hassan Iqbal, Krishna Kumar

TL;DR

A comparative study of KANs and MLPs for learning physical systems governed by PDEs finds that KANs do not consistently outperform MLPs when configured as deep neural networks, but they are more expressive in shallow network settings, where they achieve significantly higher accuracy than MLPs on the test cases.

Abstract

There is increasing interest in solving partial differential equations (PDEs) by casting them as machine learning problems. Recently, there has been a surge of interest in Kolmogorov-Arnold Networks (KANs) as an alternative to traditional neural networks represented by Multi-Layer Perceptrons (MLPs). While KANs show promise, their performance advantages in physics-based problems remain largely unexplored. Several critical questions persist: Can KANs capture complex physical dynamics, and under what conditions might they outperform traditional architectures? In this work, we present a comparative study of KANs and MLPs for learning physical systems governed by PDEs. We assess their performance when applied in deep operator networks (DeepONet) and graph network-based simulators (GNS), and test them on physical problems that vary significantly in scale and complexity. Drawing inspiration from the Kolmogorov Representation Theorem, we examine the behavior of KANs and MLPs across shallow and deep network architectures. Our results reveal that although KANs do not consistently outperform MLPs when configured as deep neural networks, they demonstrate superior expressiveness in shallow network settings, achieving significantly higher accuracy than MLPs on our test cases. This suggests that KANs are a promising choice, offering a balance of efficiency and accuracy in applications involving physical systems.

Paper Structure

This paper contains 19 sections, 3 theorems, 22 equations, 20 figures, 8 tables.

Key Result

Theorem 1

(Cybenko, 1989; Hornik, 1991) Let $C(K, \mathbb{R}^m)$ denote the set of continuous functions from a compact subset $K \subseteq \mathbb{R}^n$ to $\mathbb{R}^m$. Suppose $\sigma \in C(\mathbb{R}, \mathbb{R})$ is a non-polynomial activation function applied elementwise. Then, for ev... where the function $g(x)$ is represented by a single-hidden-layer neural network:
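
For illustration, a single-hidden-layer network of the kind the theorem refers to can be sketched as follows. This is a minimal sketch, assuming the common form $g(x) = C\,\sigma(Ax + b)$; the names `A`, `b`, `C`, the choice of `tanh` as the non-polynomial activation, and the layer sizes are assumptions made for the example and are not taken from the paper.

```python
import math
import random


def single_hidden_layer(x, A, b, C, sigma=math.tanh):
    """Evaluate g(x) = C * sigma(A x + b) for one input vector x.

    Assumed (hypothetical) shapes: A is k x n, b has length k, C is m x k.
    """
    # Hidden layer: affine map followed by an elementwise activation.
    hidden = [
        sigma(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(A, b)
    ]
    # Output layer: plain linear combination of the hidden units.
    return [sum(c_ik * h_k for c_ik, h_k in zip(row, hidden)) for row in C]


def random_network(n, k, m, seed=0):
    """Draw illustrative random weights for an n -> k -> m network."""
    rng = random.Random(seed)
    A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(k)]
    b = [rng.uniform(-1, 1) for _ in range(k)]
    C = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(m)]
    return A, b, C


# Map an input in R^3 to an output in R^2 through 16 hidden units.
A, b, C = random_network(n=3, k=16, m=2)
y = single_hidden_layer([0.1, -0.4, 0.7], A, b, C)
print(len(y))
```

The hidden width (here 16) is the quantity a universal approximation statement of this kind allows to grow in order to reach a target accuracy; the depth stays fixed at one hidden layer, which is the "shallow" setting the comparison in this paper focuses on.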

Figures (20)

  • Figure 1: Schematic representation of a GNS model.
  • Figure 2: Training loss curves for the Burgers' equation problem. Left: results for the shallow network models. Right: results for the deep network models. The loss curves are smoothed using a moving average for better readability.
  • Figure 3: Predictions of the shallow network models for the Burgers example. From left to right: true solution, MLP prediction, KAN prediction, absolute error of the MLP prediction, absolute error of the KAN prediction.
  • Figure 4: Predictions of the deep network models for the Burgers example. From left to right: true solution, MLP prediction, KAN prediction, absolute error of the MLP prediction, absolute error of the KAN prediction.
  • Figure 5: Training loss curves for the 1D Darcy problem. The loss curves are smoothed using a moving average for better readability.
  • ...and 15 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Theorem 3