Deep Sturm--Liouville: From Sample-Based to 1D Regularization with Learnable Orthogonal Basis Functions

David Vigouroux, Joseba Dalmau, Louis Béthune, Victor Boutin

TL;DR

Deep Sturm--Liouville (DSL) introduces a 1D regularization framework that propagates regularity along field lines spanning the input domain. A neural-network–driven vector field defines field lines $\gamma^x(t)$, along which a learnable Sturm--Liouville problem yields an orthogonal basis $u_i^x(t)$; these bases are combined linearly to form the predictor, with both the vector field and the SL coefficients learned jointly via implicit differentiation. The method connects to the Rank-1 Parabolic Eigenvalue Problem and provides theoretical guarantees of an orthogonal basis across the domain, while delivering competitive performance and improved sample efficiency on tabular and image datasets. Empirically, DSL achieves results comparable to standard neural networks with about 10 eigenfunctions and shows enhanced data efficiency in low-data scenarios, illustrating the practical value of moving from 0D to 1D regularization in deep learning.

Abstract

Although Artificial Neural Networks (ANNs) have achieved remarkable success across various tasks, they still suffer from limited generalization. We hypothesize that this limitation arises from the traditional sample-based (0-dimensional) regularization used in ANNs. To overcome this, we introduce Deep Sturm--Liouville (DSL), a novel function approximator that enables continuous 1D regularization along field lines in the input space by integrating the Sturm--Liouville Theorem (SLT) into the deep learning framework. DSL defines field lines traversing the input space, along which a Sturm--Liouville problem is solved to generate orthogonal basis functions, enforcing implicit regularization thanks to the desirable properties of SLT. These basis functions are linearly combined to construct the DSL approximator. Both the vector field and basis functions are parameterized by neural networks and learned jointly. We demonstrate that the DSL formulation naturally arises when solving a Rank-1 Parabolic Eigenvalue Problem. DSL is trained efficiently using stochastic gradient descent via implicit differentiation. DSL achieves competitive performance and demonstrates improved sample efficiency on diverse multivariate datasets, including high-dimensional image datasets such as MNIST and CIFAR-10.
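To make the central ingredient concrete, the following is a minimal numerical sketch of a single Sturm--Liouville problem $-(p u')' + q u = \lambda w u$ on an interval, discretized with finite differences. It is not the authors' implementation: the function name `solve_sturm_liouville`, the Dirichlet boundary conditions $u(a)=u(b)=0$, and the constant coefficients in the demo are all assumptions chosen for illustration, whereas in DSL the coefficients are produced by neural networks and evaluated along a field line.

```python
import numpy as np

def solve_sturm_liouville(p, q, w, a, b, n=200, k=10):
    """Finite-difference sketch of -(p u')' + q u = lambda * w * u on [a, b].

    Assumes Dirichlet boundary conditions u(a) = u(b) = 0 (an illustrative
    choice). p, q, w are vectorized callables; p and w must be positive.
    Returns interior grid, the k smallest eigenvalues, eigenfunctions, step h.
    """
    t = np.linspace(a, b, n + 2)       # grid including both endpoints
    h = t[1] - t[0]
    ti = t[1:-1]                       # n interior points
    pm = p(0.5 * (t[:-1] + t[1:]))     # p at the n + 1 cell midpoints

    # Symmetric tridiagonal discretization of u -> -(p u')' + q u.
    main = (pm[:-1] + pm[1:]) / h**2 + q(ti)
    off = -pm[1:-1] / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

    # Turn A u = lambda W u into a standard symmetric problem via W^(-1/2).
    s = 1.0 / np.sqrt(w(ti))
    B = s[:, None] * A * s[None, :]
    lam, V = np.linalg.eigh(B)

    # Map back and scale so that sum_i u_j(t_i) u_k(t_i) w(t_i) h = delta_jk.
    U = (s[:, None] * V[:, :k]) / np.sqrt(h)
    return ti, lam[:k], U, h

# Demo with p = 1, q = 0, w = 1 on [0, pi]; the exact eigenvalues are k^2.
ones = lambda t: np.ones_like(t)
zeros = lambda t: np.zeros_like(t)
ti, lam, U, h = solve_sturm_liouville(ones, zeros, ones, 0.0, np.pi)

# Weighted Gram matrix of the computed basis; it should be close to identity.
gram = h * U.T @ (ones(ti)[:, None] * U)
```

The weighted Gram matrix being close to the identity is the discrete counterpart of the orthogonality that the Sturm--Liouville Theorem guarantees; a predictor in the spirit of DSL would then be a linear combination of the columns of `U`.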

Paper Structure

This paper contains 29 sections, 4 theorems, 35 equations, 7 figures, 1 table, 1 algorithm.

Key Result

Theorem 3.1

For any given functions $p, w: [a,b]\rightarrow\mathbb{R}^+_0$ and $q:[a,b]\rightarrow\mathbb{R}$ of classes $\mathcal{C}^1$, $\mathcal{C}^0$ and $\mathcal{C}^0$ respectively, and real numbers $\alpha_1, \alpha_2, \bet

Figures (7)

  • Figure 1: (a) In Neural ODEs, the vector field acts as a continuous-depth neural network. Regularization applied along a field line represents a 0D regularization (i.e. sample specific). (b) For DSL, the vector field's field lines span the entire input space. Regularizing along these lines applies to all points they pass through, making it 1D regularization.
  • Figure 2: Deep Sturm--Liouville. (a) For a given point $x$, the field line $\gamma^x(t)$ is defined by equation (\ref{eq2}); it satisfies $\gamma^x(0)=x$ and reaches the boundary of $\Omega$ at the two times $t^x_-$ and $t^x_+$. (b) On the field line $\gamma^x(t)$, the Sturm--Liouville problem (\ref{eq_st_curve}) is solved with the parameter functions $p(\gamma^x(t))$, $q(\gamma^x(t))$ and $w(\gamma^x(t))$ to obtain an orthogonal function basis $u^x_i$ whose linear combination forms the DSL function approximator along the field line. The prediction at $x$ is obtained by evaluating this function at $t=0$.
  • Figure 3: Eigenfunctions on the Dry Bean dataset. The first three eigenfunctions are shown for two samples. The horizontal axis represents the time $t$ along the field line $\gamma^x(t)$.
  • Figure 4: Impact of implicit and explicit regularization on the Dry Bean dataset. Validation accuracy as a function of (a) the eigenfunction basis size, for implicit regularization, and (b) the spectral regularization coefficient $\alpha$ of equation (\ref{regu_coeff}), for explicit regularization.
  • Figure 5: Impact of the number of training samples on accuracy. Test accuracy on the Bank dataset as a function of the number of training samples.
  • ...and 2 more figures
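
The field-line construction described in the caption of Figure 2 can be sketched as follows. This is only an illustration under stated assumptions: the field line is taken to solve $\dot\gamma^x(t) = a(\gamma^x(t))$ with $\gamma^x(0)=x$ for some vector field $a$ (the paper's exact equation is not reproduced on this page), integration uses plain explicit Euler steps, and the names `trace_field_line`, `field`, and `inside` are hypothetical. In DSL the vector field is a neural network; here it is a constant field on the square $(-1,1)^2$.

```python
import numpy as np

def trace_field_line(x, field, inside, dt=1e-3, max_steps=100_000):
    """Follow the field line through x forward and backward in time.

    Assumes d(gamma)/dt = field(gamma), gamma(0) = x, integrated with
    explicit Euler steps. `inside(point)` says whether a point lies in the
    domain. Marching stops at the last point still inside the domain, or
    after max_steps if the boundary is never reached.
    Returns (t_minus, t_plus, path), with path ordered by increasing time.
    """
    x0 = np.asarray(x, dtype=float)

    def march(sign):
        g, t, pts = x0, 0.0, []
        for _ in range(max_steps):
            g_next = g + sign * dt * field(g)
            if not inside(g_next):
                break                  # next step would leave the domain
            g = g_next
            t += sign * dt
            pts.append(g)
        return t, pts

    t_minus, backward = march(-1.0)
    t_plus, forward = march(+1.0)
    path = np.array(backward[::-1] + [x0] + forward)
    return t_minus, t_plus, path

# Demo: constant field pointing along the first axis, domain (-1, 1)^2.
field = lambda g: np.array([1.0, 0.0])
inside = lambda g: bool(np.all(np.abs(g) < 1.0))
t_minus, t_plus, path = trace_field_line([0.2, 0.0], field, inside)
```

Starting from $x = (0.2, 0)$, the line exits the square near $t \approx 0.8$ in forward time and near $t \approx -1.2$ in backward time, up to the step size `dt`. A Sturm--Liouville problem would then be posed on the interval $[t^x_-, t^x_+]$ with coefficients evaluated along `path`.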

Theorems & Definitions (13)

  • Theorem 3.1: Sturm--Liouville Theorem
  • Theorem 3.2
  • Definition 3.3
  • Remark 4.1
  • Remark 4.2
  • Theorem 5.1
  • Definition 5.2
  • Theorem 5.3
  • Remark 2.1
  • Remark 2.2
  • ...and 3 more