Quadratic neural networks for solving inverse problems

Leon Frischauf; Otmar Scherzer; Cong Shi

TL;DR

This work addresses inverse problems for an operator $F:\mathbf{X}\to\mathbf{Y}$ using neural-network ansatz functions with generalized, higher-order decision functions, focusing on shallow networks with quadratic and radial forms. The authors establish universal approximation results for SBQNNs and CUNNs and derive $\mathcal{L}^1$-convergence rates for radial quadratic neural networks (RQNNs) using wavelet-frame constructions and approximation-to-identity (AtI) theory, yielding explicit $L^2$-error bounds that scale as $(N+1)^{-1/2}$ with the number of terms. They show that the convergence analysis of the Gauss–Newton iteration is tractable for RQNNs under suitable nondegeneracy conditions, and they argue that higher-order, shallower architectures admit clearer convergence analyses than deep affine networks. The results suggest that quadratic or radial higher-order networks can achieve comparable approximation quality with fewer components and allow a more transparent analysis of ill-posed inverse problems, which is relevant for practical reconstruction tasks and for training dynamics.

Abstract

In this paper we investigate the solution of inverse problems with neural network ansatz functions with generalized decision functions. The relevant observation for this work is that such functions can approximate typical test cases, such as the Shepp-Logan phantom, better than standard neural networks. Moreover, we show that the convergence analysis of numerical methods for solving inverse problems with shallow generalized neural network functions leads to more intuitive convergence conditions than for deep affine linear neural networks.
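The Shepp-Logan phantom is built from ellipses, which gives an intuition for the observation above. As a minimal sketch, the code below evaluates a single neuron with a quadratic decision function, $x \mapsto \sigma(x^T A x + w^T x + \theta)$, and chooses its parameters so that it approximates the indicator function of one ellipse. The parametrization, the logistic activation, the `steepness` factor, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def sigmoid(t):
    # Logistic function written via tanh to avoid overflow for large |t|.
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def quadratic_neuron(x, A, w, theta):
    """Evaluate sigma(x^T A x + w^T x + theta) for points x of shape (m, n)."""
    quad = np.einsum("mi,ij,mj->m", x, A, x)  # x_k^T A x_k for every row k
    return sigmoid(quad + x @ w + theta)


def ellipse_neuron_params(center, semi_axes, steepness):
    """Parameters (hypothetical helper) for an axis-aligned ellipse.

    The decision function equals
        steepness * (1 - sum_i (x_i - c_i)^2 / a_i^2),
    which is positive inside the ellipse and negative outside.
    """
    c = np.asarray(center, dtype=float)
    a = np.asarray(semi_axes, dtype=float)
    D = np.diag(1.0 / a**2)
    A = -steepness * D
    w = 2.0 * steepness * (D @ c)
    theta = steepness * (1.0 - c @ D @ c)
    return A, w, theta


A, w, theta = ellipse_neuron_params(center=(0.2, -0.1),
                                    semi_axes=(0.5, 0.3),
                                    steepness=50.0)
pts = np.array([[0.2, -0.1],   # center of the ellipse
                [1.0, 1.0]])   # far outside the ellipse
vals = quadratic_neuron(pts, A, w, theta)
```

Here `vals` is close to 1 at the center and close to 0 at the outside point. A neuron with an affine decision function can only separate along a hyperplane, so describing the same ellipse would require combining several such neurons.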

Paper Structure

This paper contains 8 sections, 11 theorems, and 55 equations.

Introduction
Examples of networks with generalized decision functions
Motivation
Motivation 1: The Shepp-Logan phantom
Motivation 2: The Gauss-Newton iteration
Convergence rates for universal approximation of RQNNs
Conclusion
Approximation to the identity (AtI)

Key Result

Theorem 1

Let $\sigma:\mathbb{R} \to \mathbb{R}$ be a continuous discriminatory function. Then, for every function $g \in C([0,1]^n)$ and every $\epsilon>0$, there exists a function satisfying
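For reference, the standard form of Cybenko's result (Cyb89) is sketched below. The symbols $G$, $N$, $\alpha_j$, $\mathbf{w}_j$, and $\theta_j$ are assumed here and may differ from the paper's notation.

```latex
G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(\mathbf{w}_j^{T} x + \theta_j\right),
\qquad
\lvert G(x) - g(x) \rvert < \epsilon \quad \text{for all } x \in [0,1]^n,
```

with $N \in \mathbb{N}$, $\alpha_j, \theta_j \in \mathbb{R}$, and $\mathbf{w}_j \in \mathbb{R}^n$. In other words, finite sums of affine neurons are dense in $C([0,1]^n)$ with respect to the supremum norm.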

Theorems & Definitions (32)

Definition 1: Affine linear neural network functions
Definition 2: Shallow generalized neural network function
Definition 3: Neural networks with generalized decision functions
Remark 1
Definition 4: Discriminatory function
Example 1
Theorem 1: Cyb89
Theorem 2: Generalized universal approximation theorem
Proof
Corollary 1: Universal approximation properties of SBQNNs and CUNNs
...and 22 more
