
On the Identification and Optimization of Nonsmooth Superposition Operators in Semilinear Elliptic PDEs

Constantin Christof, Julia Kowalczyk

TL;DR

The paper addresses identifying a nonsmooth Nemytskii operator in a semilinear elliptic PDE by optimizing over the derivative $u=g_u'$ with a composite regularization that promotes sparsity and tractability. It develops a rigorous infinite-dimensional analysis, proving BV/regulated regularity of local minimizers, Hadamard differentiability of the control-to-state map, and Bouligand and primal-dual first-order optimality conditions, together with a gradient projection algorithm that converges in function space. The results reveal that locally optimal activation functions $g_{\bar u}$ are sigmoidal with a kink at zero, resembling ReLU-like behavior, which is particularly relevant for learning-informed PDEs. Numerical experiments confirm the theoretical findings, demonstrate finite-time termination of the algorithm, and illustrate mesh-independent convergence and activation shapes consistent with the theory, highlighting practical implications for data-driven PDE identification and training with nonsmooth activations.

Abstract

We study an infinite-dimensional optimization problem that aims to identify the Nemytskii operator in the nonlinear part of a prototypical semilinear elliptic partial differential equation (PDE) which minimizes the distance between the PDE solution and a given desired state. In contrast to previous works, we consider this identification problem in a low-regularity regime in which the function inducing the Nemytskii operator is a priori only known to be an element of $H^1_{loc}(\mathbb{R})$. This makes the studied problem class a suitable point of departure for the rigorous analysis of training problems for learning-informed PDEs in which an unknown superposition operator is approximated by means of a neural network with nonsmooth activation functions (ReLU, leaky-ReLU, etc.). We establish that, despite the low regularity of the controls, it is possible to derive a classical stationarity system for local minimizers and to solve the considered problem by means of a gradient projection method. The convergence of the resulting algorithm is proven in the function space setting. It is also shown that the established first-order necessary optimality conditions imply that locally optimal superposition operators share various characteristic properties with commonly used activation functions: they are always sigmoidal, continuously differentiable away from the origin, and typically possess a distinct kink at zero. The paper concludes with numerical experiments which confirm the theoretical findings.
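
As a rough illustration of the gradient projection idea mentioned in the abstract, the following minimal sketch applies it to a finite-dimensional toy problem rather than to the PDE-constrained problem of the paper. The matrix `A`, the data `b`, the weights `nu1` and `nu2`, the sign constraint `u >= 0`, and the fixed step size `1/L` are all assumptions made for illustration; none of them is taken from the paper's algorithm.

```python
import numpy as np


def projected_gradient(A, b, nu1, nu2, steps=500, tol=1e-10):
    """Toy projected gradient method (illustrative assumption, not the paper's algorithm).

    Minimizes 0.5*||A u - b||^2 + nu1*sum(u) + 0.5*nu2*||u||^2 subject to u >= 0.
    On the feasible set, sum(u) equals the l1-norm of u, so the objective is smooth there.
    Assumes A is a nonzero 2-D array or nu2 > 0, so that the step size is well defined.
    """
    u = np.zeros(A.shape[1])
    # Lipschitz constant of the gradient: squared spectral norm of A plus nu2.
    L = np.linalg.norm(A, 2) ** 2 + nu2
    for _ in range(steps):
        grad = A.T @ (A @ u - b) + nu1 + nu2 * u
        # Gradient step with step size 1/L, then projection onto {u >= 0}.
        u_new = np.maximum(u - grad / L, 0.0)
        if np.linalg.norm(u_new - u) <= tol:
            return u_new
        u = u_new
    return u
```

For `A` equal to the identity and `nu2 = 0`, the minimizer of this toy problem is `max(b - nu1, 0)` componentwise, so entries of `b` below `nu1` are set exactly to zero. This is the finite-dimensional analogue of the sparsity effect that an $L^1$-term has under a sign constraint.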

Paper Structure (19 sections, 42 theorems, 163 equations, 4 figures, 3 tables)

Key Result

Lemma 3.1

For all $u \in L^q(\mathbb{R})$, $1 \leq q \leq \infty$, it holds
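
For orientation, a Hölder estimate of the kind the lemma's title refers to can be sketched as follows, assuming $g_u(t) = \int_0^t u(s)\,\mathrm{d}s$ (an assumption here, consistent with $u = g_u'$ in the TL;DR). This is a standard consequence of Hölder's inequality and not a quotation of the lemma's own statement.

```latex
% Sketch under the assumption g_u(t) = \int_0^t u(s)\,ds, with the convention 1/\infty := 0.
\[
  |g_u(t_1) - g_u(t_2)|
  = \left| \int_{t_2}^{t_1} u(s)\,\mathrm{d}s \right|
  \leq \|u\|_{L^q(\mathbb{R})}\, |t_1 - t_2|^{1 - 1/q}
  \qquad \text{for all } t_1, t_2 \in \mathbb{R}.
\]
```

For $q = 2$ the exponent is $1/2$, and for $q = \infty$ the estimate reduces to Lipschitz continuity with constant $\|u\|_{L^\infty(\mathbb{R})}$.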

Figures (4)

  • Figure 1: Value of the objective function (left) and the stationarity measure $\Theta_{\epsilon_1}(\mathpzc{u}_i)$ (right) as functions of the iteration counter $i$ of Algorithm \ref{alg:gradproj} in the case of Example \ref{ExampleA} for different mesh widths $h_{\mathpzc{u}}$ and $h_y$. The legend refers to both figures. It can be seen that the reduction of the objective value and stationarity measure stagnates at a threshold that depends on the discretization level. This reflects that, by discretizing the involved PDEs, one introduces an error and, thus, causes Algorithm \ref{alg:gradproj} to run with inexact gradient and function value evaluations; see Section \ref{subsec:ImplementationDetails}.
  • Figure 2: Approximations of the optimal control $\bar{\mathpzc{u}}$ obtained from Algorithm \ref{alg:gradproj} at the end of the calculations depicted in Figure \ref{fig:Ex1-1} for different mesh widths $h_{\mathpzc{u}}$ and $h_y$ (left) and resulting optimal state $\bar{y}$ for $h_{\mathpzc{u}} = h_y = 1/128$ (right).
  • Figure 3: Approximations of the optimal control $\bar{\mathpzc{u}}$ (left) and the superposition function $g_{E_r(\bar{\mathpzc{u}})}$ (right) obtained from Algorithm \ref{alg:gradproj} in the situation of Example \ref{ExampleB} for different values of $\nu_1$ and $h_y = h_{\mathpzc{u}} = 1/512$. The legend refers to both figures. It can be seen that the $L^1$-regularization promotes sparsity properties; cf. the stationarity system (\ref{eq:pdsysPr}). For the highest value $\nu_1 = 1/128$, the optimal control is zero.
  • Figure 4: Depiction of the (essential) support of the optimal control $\bar{\mathpzc{u}}$ (left) and contributions of the tracking term and the $L^1(-r,r)$- and $L^2(-r,r)$-regularization terms to the final objective function value (right) in the situation of Figure \ref{fig:Ex2-1}. It can be seen that the essential support of the optimal control $\bar{\mathpzc{u}}$ shrinks as $\nu_1$ increases. For the highest value $\nu_1 = 1/128$, it vanishes.

Theorems & Definitions (97)

  • Lemma 3.1: Hölder Continuity of $g_u$
  • proof
  • Definition 3.2: Important Constants
  • Lemma 3.3: Properties of $A_u$
  • proof
  • Theorem 3.4: Properties of the Control-to-State Mapping
  • proof
  • Lemma 3.5: $\|\cdot\|_\infty$-Bound
  • proof
  • Remark 3.6
  • ...and 87 more