On the Identification and Optimization of Nonsmooth Superposition Operators in Semilinear Elliptic PDEs

Constantin Christof; Julia Kowalczyk

TL;DR

The paper addresses identifying a nonsmooth Nemytskii operator in a semilinear elliptic PDE by optimizing over the derivative $u=g_u'$ with a composite regularization that promotes sparsity and tractability. It develops a rigorous infinite-dimensional analysis, proving BV/regulated regularity of local minimizers, Hadamard differentiability of the control-to-state map, and Bouligand and primal-dual first-order optimality conditions, together with a gradient projection algorithm that converges in function space. The results reveal that locally optimal activation functions $g_{\bar u}$ are sigmoidal with a kink at zero, resembling ReLU-like behavior, which is particularly relevant for learning-informed PDEs. Numerical experiments confirm the theoretical findings, demonstrate finite-time termination of the algorithm, and illustrate mesh-independent convergence and activation shapes consistent with the theory, highlighting practical implications for data-driven PDE identification and training with nonsmooth activations.

Abstract

We study an infinite-dimensional optimization problem that aims to identify the Nemytskii operator in the nonlinear part of a prototypical semilinear elliptic partial differential equation (PDE) which minimizes the distance between the PDE solution and a given desired state. In contrast to previous works, we consider this identification problem in a low-regularity regime in which the function inducing the Nemytskii operator is a priori only known to be an element of $H^1_{loc}(\mathbb{R})$. This makes the studied problem class a suitable point of departure for the rigorous analysis of training problems for learning-informed PDEs in which an unknown superposition operator is approximated by means of a neural network with nonsmooth activation functions (ReLU, leaky-ReLU, etc.). We establish that, despite the low regularity of the controls, it is possible to derive a classical stationarity system for local minimizers and to solve the considered problem by means of a gradient projection method. The convergence of the resulting algorithm is proven in the function space setting. It is also shown that the established first-order necessary optimality conditions imply that locally optimal superposition operators share various characteristic properties with commonly used activation functions: they are always sigmoidal, continuously differentiable away from the origin, and typically possess a distinct kink at zero. The paper concludes with numerical experiments which confirm the theoretical findings.

Paper Structure (19 sections, 42 theorems, 163 equations, 4 figures, 3 tables)

This paper contains 19 sections, 42 theorems, 163 equations, 4 figures, 3 tables.

Introduction
Motivation, Background, and Relation to Prior Work
Summary of Main Results
Structure of the Remainder of the Paper
Problem Setting and Notation
Basic Notation
The Problem Under Consideration
Remarks on the Problem Setting and the Choice of the Regularization Term
Basic Properties of the Control-to-State Mapping
First Consequences for the Optimization Problem
Regularity of Optimal Controls
Differentiability Properties of the Control-to-State Map
First-Order Optimality Conditions of Bouligand and Primal-Dual Type
A Gradient Projection Algorithm and its Convergence Analysis
Numerical Experiments
...and 4 more sections

Key Result

Lemma 3.1

For all $u \in L^q(\mathbb{R})$, $1 \leq q \leq \infty$, it holds
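For orientation, here is a minimal sketch of the standard Hölder-type estimate that follows from the relation $u = g_u'$ used in the TL;DR. It assumes that $g_u$ is absolutely continuous, so that differences of $g_u$ are integrals of $u$, and it uses the convention $1/\infty := 0$. The precise statement and constants of Lemma 3.1 may differ from this sketch.

```latex
% Assumption: g_u is absolutely continuous with g_u' = u a.e.
% q' denotes the conjugate exponent of q, i.e. 1/q + 1/q' = 1.
\begin{align*}
|g_u(t_1) - g_u(t_2)|
  &= \left| \int_{t_2}^{t_1} u(s)\,\mathrm{d}s \right| \\
  &\leq \|u\|_{L^q(\mathbb{R})}\,
        \left| \int_{t_2}^{t_1} 1^{q'}\,\mathrm{d}s \right|^{1/q'}
        && \text{(H\"older's inequality)} \\
  &= \|u\|_{L^q(\mathbb{R})}\,|t_1 - t_2|^{1 - 1/q}
  \qquad \forall\, t_1, t_2 \in \mathbb{R}.
\end{align*}
```

Under these assumptions, $q = \infty$ gives Lipschitz continuity of $g_u$, while $q = 1$ only gives the uniform bound $|g_u(t_1) - g_u(t_2)| \leq \|u\|_{L^1(\mathbb{R})}$.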

Figures (4)

Figure 1: Value of the objective function (left) and the stationarity measure $\Theta_{\epsilon_1}(\mathpzc{u}_i)$ (right) as functions of the iteration counter $i$ of Algorithm \ref{alg:gradproj} in the case of Example \ref{ExampleA} for different widths $h_{\mathpzc{u}}$ and $h_y$. The legend refers to both figures. It can be seen that the reduction of the objective value and stationarity measure stagnates at a threshold that depends on the discretization level. This reflects that, by discretizing the involved PDEs, one introduces an error and, thus, causes Algorithm \ref{alg:gradproj} to run with inexact gradient and function value evaluations; see Subsection \ref{subsec:ImplementationDetails}.
Figure 2: Approximations of the optimal control $\bar{\mathpzc{u}}$ obtained from Algorithm \ref{alg:gradproj} at the end of the calculations depicted in Figure \ref{fig:Ex1-1} for different mesh widths $h_{\mathpzc{u}}$ and $h_y$ (left) and resulting optimal state $\bar{y}$ for $h_{\mathpzc{u}} = h_y = 1/128$ (right).
Figure 3: Approximations of the optimal control $\bar{\mathpzc{u}}$ (left) and the superposition function $g_{E_r(\bar{\mathpzc{u}})}$ (right) obtained from Algorithm \ref{alg:gradproj} in the situation of Example \ref{ExampleB} for different values of $\nu_1$ and $h_y = h_{\mathpzc{u}} = 1/512$. The legend refers to both figures. It can be seen that the $L^1$-regularization promotes sparsity properties; cf. the stationarity system \eqref{eq:pdsysPr}. For the highest value $\nu_1 = 1/128$, the optimal control is zero.
Figure 4: Depiction of the (essential) support of the optimal control $\bar{\mathpzc{u}}$ (left) and contributions of the tracking term and the $L^1(-r,r)$- and $L^2(-r,r)$-regularization terms to the final objective function value (right) in the situation of Figure \ref{fig:Ex2-1}. It can be seen that the essential support of the optimal control $\bar{\mathpzc{u}}$ shrinks as $\nu_1$ increases. For the highest value $\nu_1 = 1/128$, it vanishes.

Theorems & Definitions (97)

Lemma 3.1: Hölder Continuity of $g_u$
proof
Definition 3.2: Important Constants
Lemma 3.3: Properties of $A_u$
proof
Theorem 3.4: Properties of the Control-to-State Mapping
proof
Lemma 3.5: $\|\cdot\|_\infty$-Bound
proof
Remark 3.6
...and 87 more
