The Extrapolation Power of Implicit Models

Juliette Decugis; Alicia Y. Tsai; Max Emerling; Ashwin Ganesh; Laurent El Ghaoui

TL;DR

The paper studies the extrapolation capabilities of implicit models that define hidden states via equilibrium equations, enabling stable forward-backward information flow through closed-loop feedback. It compares implicit models against non-implicit architectures across mathematical, time-series, and geospatial tasks to assess robustness to unobserved data. The authors show that the equilibrium-based representations adaptively grow in depth through iterative refinement and leverage closed-loop feedback to achieve superior out-of-distribution extrapolation, with predictions given by $\hat{y} = Cx + Du$ where $x = \phi(Ax + Bu)$. These findings suggest implicit models can learn more general, task-agnostic representations with practical advantages for real-world extrapolation and limited data scenarios.
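The equilibrium equation above can be illustrated with a minimal sketch. It assumes $\phi$ is a ReLU and that $A$ is rescaled so $\|A\|_\infty < 1$, a sufficient condition for the fixed-point iteration to converge. The function name `implicit_forward`, the dimensions, and the random weights are all hypothetical, and this is not the authors' training or inference code.

```python
import numpy as np

def implicit_forward(A, B, C, D, u, tol=1e-8, max_iter=500):
    """Solve x = relu(A x + B u) by fixed-point iteration, then return C x + D u.

    Assumes ||A||_inf < 1 so the iteration is a contraction (illustrative choice).
    """
    x = np.zeros(A.shape[0])
    for k in range(1, max_iter + 1):
        x_next = np.maximum(A @ x + B @ u, 0.0)  # phi = ReLU (assumed)
        if np.max(np.abs(x_next - x)) < tol:
            return C @ x_next + D @ u, x_next, k
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical sizes: hidden state n, input p, output q.
rng = np.random.default_rng(0)
n, p, q = 8, 3, 2
A = rng.standard_normal((n, n))
A *= 0.5 / np.abs(A).sum(axis=1).max()  # rescale so ||A||_inf = 0.5
B = rng.standard_normal((n, p))
C = rng.standard_normal((q, n))
D = rng.standard_normal((q, p))
u = rng.standard_normal(p)

y_hat, x_star, iters = implicit_forward(A, B, C, D, u)
print("iterations:", iters)
print("equilibrium residual:", np.max(np.abs(x_star - np.maximum(A @ x_star + B @ u, 0.0))))
```

The number of iterations needed to reach the tolerance depends on the input $u$ and on $A$, which is one way to read the "adaptive depth" described above: each iteration acts like one more layer applied with shared weights.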

Abstract

In this paper, we investigate the extrapolation capabilities of implicit deep learning models in handling unobserved data, where traditional deep neural networks may falter. Implicit models, distinguished by their adaptability in layer depth and incorporation of feedback within their computational graph, are put to the test across various extrapolation scenarios: out-of-distribution, geographical, and temporal shifts. Our experiments consistently demonstrate a significant performance advantage for implicit models. Unlike their non-implicit counterparts, which often rely on meticulous architectural design for each task, implicit models demonstrate the ability to learn complex model structures without the need for task-specific design, highlighting their robustness in handling unseen data.

Paper Structure (30 sections, 3 equations, 11 figures, 4 tables)

Introduction
Related Work.
Mathematical tasks.
Out-of-distribution generalization.
Function extrapolation.
Background
Problem Setup
ImplicitRNN.
Extrapolate on Mathematical Tasks
Identity function.
Arithmetic operations.
Rolling functions.
Extrapolate on Noisy Real-world Data
Oscillating time series forecasting.
Earthquake location prediction.
...and 15 more sections

Figures (11)

Figure 1: Time series of a 21-day rolling average of AMC stock volatility plotted on a log scale, highlighting a drastic volatility increase at the beginning of our validation cutoff.
Figure 2: Left: Geometric visualization of one set of training features ($x_i$, $y_i$, $z_i$, $p_i$, $\theta_i$) and its corresponding labels ($X$, $Y$, $Z$, $T$). The triangles correspond to stations and the star corresponds to a source. Right: The map shows the training set region colored in blue, roughly corresponding to the Pacific Ring of Fire. The two red areas are the testing set regions for $k = 3$.
Figure 3: Test MSE for the identity function task. MSE for the MLP and Transformer models increases as the distribution shift hyper-parameter $\kappa$ increases.
Figure 4: Test Log(MSE) for the arithmetic operations. The implicit model strongly outperforms all other models on OOD data.
Figure 5: For rolling tasks, implicit models maintain close to constant loss ($\downarrow$) and accuracy ($\uparrow$) across shifts.
...and 6 more figures
