On the Identification and Optimization of Nonsmooth Superposition Operators in Semilinear Elliptic PDEs
Constantin Christof, Julia Kowalczyk
TL;DR
The paper addresses the identification of a nonsmooth Nemytskii operator in a semilinear elliptic PDE by optimizing over the derivative $u=g_u'$ of the inducing function with a composite regularization that promotes sparsity and tractability. It develops a rigorous infinite-dimensional analysis, proving BV/regulated regularity of local minimizers, Hadamard differentiability of the control-to-state map, and Bouligand and primal-dual first-order optimality conditions, together with a gradient projection algorithm that converges in function space. The results reveal that locally optimal activation functions $g_{\bar u}$ are sigmoidal and typically have a kink at zero, resembling ReLU-type activations, which is particularly relevant for learning-informed PDEs. Numerical experiments confirm the theoretical findings, demonstrate finite-time termination of the algorithm, and illustrate mesh-independent convergence and activation shapes consistent with the theory, highlighting practical implications for data-driven PDE identification and training with nonsmooth activations.
Abstract
We study an infinite-dimensional optimization problem that aims to identify the Nemytskii operator in the nonlinear part of a prototypical semilinear elliptic partial differential equation (PDE) which minimizes the distance between the PDE solution and a given desired state. In contrast to previous works, we consider this identification problem in a low-regularity regime in which the function inducing the Nemytskii operator is a priori only known to be an element of $H^1_{\mathrm{loc}}(\mathbb{R})$. This makes the studied problem class a suitable point of departure for the rigorous analysis of training problems for learning-informed PDEs in which an unknown superposition operator is approximated by means of a neural network with nonsmooth activation functions (ReLU, leaky-ReLU, etc.). We establish that, despite the low regularity of the controls, it is possible to derive a classical stationarity system for local minimizers and to solve the considered problem by means of a gradient projection method. The convergence of the resulting algorithm is proven in the function space setting. It is also shown that the established first-order necessary optimality conditions imply that locally optimal superposition operators share various characteristic properties with commonly used activation functions: they are always sigmoidal, continuously differentiable away from the origin, and typically possess a distinct kink at zero. The paper concludes with numerical experiments which confirm the theoretical findings.
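
For readers unfamiliar with gradient projection methods, the following is a minimal finite-dimensional sketch of the general idea: take a gradient step, then project back onto the admissible set. It is only a toy illustration and not the function-space algorithm analyzed in the paper. The box constraint, the quadratic objective, the fixed step size, and all names (`project_box`, `gradient_projection`, `target`) are assumptions made for this example.

```python
def project_box(x, lower, upper):
    """Project x componentwise onto the box [lower, upper]."""
    return [min(max(xi, lower), upper) for xi in x]


def gradient_projection(grad, x0, step, lower=0.0, upper=1.0,
                        tol=1e-10, max_iter=10_000):
    """Generic projected gradient iteration with a fixed step size.

    Stops once two consecutive iterates differ by at most `tol`
    in the maximum norm, or after `max_iter` iterations.
    """
    x = project_box(x0, lower, upper)
    for k in range(max_iter):
        g = grad(x)
        # Gradient step followed by projection onto the admissible set.
        x_new = project_box([xi - step * gi for xi, gi in zip(x, g)],
                            lower, upper)
        if max(abs(a - b) for a, b in zip(x_new, x)) <= tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter


# Toy objective (assumption): J(x) = 0.5 * ||x - target||^2 on the box [0, 1]^3.
target = [-0.5, 0.3, 1.7]


def grad(x):
    """Gradient of the toy objective J."""
    return [xi - ti for xi, ti in zip(x, target)]


x_opt, iters = gradient_projection(grad, [0.5, 0.5, 0.5], step=0.5)
print(x_opt, iters)
```

In this toy problem the constrained minimizer is the projection of `target` onto the box, so the first and third components end up on the bounds while the second converges to the unconstrained value.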
