The Unreasonable Effectiveness of Solving Inverse Problems with Neural Networks

Philipp Holl, Nils Thuerey

TL;DR

This work shows that neural networks trained to learn solutions to inverse problems can find better solutions than classical optimizers even on their training set, and suggests an alternative use for neural networks: rather than generalizing to new data for fast inference, they can also be used to find better solutions on known data.

Abstract

Finding model parameters from data is an essential task in science and engineering, from weather and climate forecasts to plasma control. Previous works have employed neural networks to greatly accelerate finding solutions to inverse problems. Of particular interest are end-to-end models which utilize differentiable simulations to backpropagate feedback from the simulated process to the network weights and enable roll-out of multiple time steps. So far, it has been assumed that, while model inference is faster than classical optimization, this comes at the cost of a decrease in solution accuracy. We show that this is generally not true. In fact, neural networks trained to learn solutions to inverse problems can find better solutions than classical optimizers even on their training set. To demonstrate this, we perform both a theoretical analysis and an extensive empirical evaluation on challenging problems involving local minima, chaos, and zero-gradient regions. Our findings suggest an alternative use for neural networks: rather than generalizing to new data for fast inference, they can also be used to find better solutions on known data.
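
As a rough illustration of the end-to-end setup described above, the following toy sketch trains a model through a differentiable forward process, with the loss measured on the output of that process rather than on the predicted solution. Everything in it is an assumption made for illustration: the forward process `forward` (a plain `F(x) = 2x`), the two-parameter linear stand-in for a network in `predict`, the targets, and the step size. It is not the paper's architecture or experimental setup, and a linear process has none of the local minima, chaos, or zero-gradient regions the paper studies.

```python
# Toy sketch: train a tiny model through a known forward process F,
# with the loss measured on F's output. F, the linear "network", and
# all constants below are illustrative assumptions, not the paper's setup.

def forward(x):
    """Hypothetical differentiable forward process F(x) = 2x."""
    return 2.0 * x

def forward_grad(x):
    """dF/dx for the toy process above (constant because F is linear)."""
    return 2.0

def predict(w, b, y_target):
    """Stand-in for a network: maps a target y* to a candidate solution x."""
    return w * y_target + b

def mean_loss(w, b, targets):
    """Mean of 0.5 * (F(x) - y*)^2, a loss in the output space of F."""
    total = sum(0.5 * (forward(predict(w, b, y)) - y) ** 2 for y in targets)
    return total / len(targets)

def train(targets, steps=200, lr=0.1):
    """Plain gradient descent on the shared parameters w and b."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for y in targets:
            x = predict(w, b, y)
            # Chain rule: dL/dx = (F(x) - y*) * dF/dx, then on to w and b.
            dl_dx = (forward(x) - y) * forward_grad(x)
            grad_w += dl_dx * y
            grad_b += dl_dx
        w -= lr * grad_w / len(targets)
        b -= lr * grad_b / len(targets)
    return w, b

targets = [-1.0, -0.5, 0.0, 0.5, 1.0]
w, b = train(targets)
```

In this toy case the shared parameters approach the exact inverse `x = 0.5 * y`, and optimizing each `x` directly would work just as well. The sketch only shows the mechanics: the gradient from the forward process flows into weights shared by all examples instead of into each solution separately.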

Paper Structure (49 sections, 31 equations, 25 figures, 5 tables)

Figures (25)

  • Figure 1: Parameterized optimization. The network predicts solutions $x_i$ based on targets $y_i^*$ and identifiers $\gamma_i$. The loss $\mathcal{L}$ is defined in the output space of $F(x)$.
  • Figure 2: Initial gradient alignment with JPO. Measured average over 1000 seeds (blue) and theory curve (pink). (a) Noise scaling when examples have the same ground truth $x^*$. (b) Noise-free network training with $x^*_i = 10 \gamma_i$. (c,d) Sum-of-losses and voting methods on the problem of fitting $\sin(2x)$ in the presence of noise. Noise-only theory curve in light pink.
  • Figure 3: Wave packet localization. (a) Example waveform $u(t)$, (b) Loss and gradient landscape for $t_0$, (c) training / optimization curves, (d) Fraction of inverse problems for which JPO and gradient descent yield better solutions than BFGS.
  • Figure 4: Billiards experiment. (a) Example loss landscape, (b) Fraction of inverse problems for which JPO and gradient descent yield better solutions than BFGS.
  • Figure 5: Kuramoto–Sivashinsky experiment. (a) Example trajectory, (b) loss landscape for $\beta$, (c) example optimization curves, (d) fraction of inverse problems for which JPO and gradient descent yield better solutions than BFGS.
  • ...and 20 more figures
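
Read as an optimization problem, the setup in the Figure 1 caption can be written roughly as follows. The symbols $\theta$ (network weights) and $\mathrm{NN}$ (the network) are notation assumed here for illustration, as is the plain sum over examples; the captions above only name $x_i$, $y_i^*$, $\gamma_i$, $\mathcal{L}$ and $F$.

$$
\text{per-example optimization:}\quad x_i = \arg\min_{x}\ \mathcal{L}\big(F(x),\, y_i^*\big) \quad \text{for each } i \text{ separately}
$$

$$
\text{network training:}\quad \theta = \arg\min_{\theta'}\ \sum_i \mathcal{L}\big(F(\mathrm{NN}(y_i^*, \gamma_i;\ \theta')),\, y_i^*\big), \qquad x_i = \mathrm{NN}(y_i^*, \gamma_i;\ \theta)
$$

In the second form all examples share the same weights, so the gradient for one example also changes the predictions for the others.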

Theorems & Definitions (4)

  • proof
  • proof
  • proof
  • proof