Implicit Models: Expressive Power Scales with Test-Time Compute

Jialin Liu, Lisang Ding, Stanley Osher, Wotao Yin

TL;DR

The paper provides a nonparametric, function-space characterization showing that the expressive power of implicit equilibrium models grows with test-time compute, culminating in the ability to represent locally Lipschitz maps through fixed-point iterations of a simple regular operator. It proves sufficiency and necessity results, establishing that regular implicit operators can match and even exceed the expressive scope of explicit models without increasing learned parameters. Four diverse case studies—image reconstruction, Navier–Stokes, linear programming via GNNs, and LLM reasoning—empirically validate the theory, demonstrating increasing per-iteration complexity and improved solutions as iterations grow. The findings illuminate why compact implicit models can rival much larger explicit networks and offer guidance for leveraging domain priors and adaptive contraction to balance stability and expressivity.

Abstract

Implicit models, an emerging model class, compute outputs by iterating a single parameter block to a fixed point. This architecture realizes an infinite-depth, weight-tied network that trains with constant memory, significantly reducing memory needs for the same level of performance compared to explicit models. While it is empirically known that these compact models can often match or even exceed the accuracy of larger explicit networks by allocating more test-time compute, the underlying mechanism remains poorly understood. We study this gap through a nonparametric analysis of expressive power. We provide a strict mathematical characterization, showing that a simple and regular implicit operator can, through iteration, progressively express more complex mappings. We prove that for a broad class of implicit models, this process lets the model's expressive power scale with test-time compute, ultimately matching a much richer function class. The theory is validated across four domains: image reconstruction, scientific computing, operations research, and LLM reasoning, demonstrating that as test-time iterations increase, the complexity of the learned mapping rises, while the solution quality simultaneously improves and stabilizes.
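As a toy illustration of the mechanism the abstract describes (not the paper's construction), consider a hypothetical scalar operator `G(z, x) = c*z + x` with `0 < c < 1`. The names `G`, `iterate`, and `lipschitz_in_x`, and the choice `c = 0.9`, are assumptions made for this sketch. The operator is 1-Lipschitz in `x` and contractive in `z`, yet the map from `x` to the k-th iterate has slope `(1 - c**k) / (1 - c)`, which grows with the number of iterations and approaches `1 / (1 - c)` at the fixed point.

```python
def G(z, x, c=0.9):
    """Hypothetical implicit operator: c-contractive in z, 1-Lipschitz in x."""
    return c * z + x


def iterate(x, num_iters, c=0.9, z0=0.0):
    """Apply G to itself num_iters times, starting from z0 (weight-tied iteration)."""
    z = z0
    for _ in range(num_iters):
        z = G(z, x, c)
    return z


def lipschitz_in_x(num_iters, c=0.9, x=1.0, dx=1e-3):
    """Finite-difference slope of the k-step map x -> z_k.

    The map is linear in x for this toy operator, so the slope equals
    its Lipschitz constant up to floating-point rounding.
    """
    return abs(iterate(x + dx, num_iters, c) - iterate(x, num_iters, c)) / dx


if __name__ == "__main__":
    # More test-time iterations give a steeper (more complex) input-output map.
    for k in (1, 5, 20, 100):
        print(k, lipschitz_in_x(k))
```

With `c = 0.9`, the slope starts at 1 after a single step and tends toward 10 as the iteration count grows, while each individual update stays equally simple. The paper's regular implicit operators are far more general; this sketch only shows how iteration can raise the complexity of the overall mapping.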

Paper Structure

This paper contains 23 sections, 19 theorems, 223 equations, 9 figures, 5 tables.

Key Result

Theorem 2.4

Under Assumption asmp:f, for any ${\mathcal{F}}$ there exists a regular implicit operator ${\mathcal{G}}:{\mathbb{R}}^n\times{\mathbb{X}}\to{\mathbb{R}}^n$ whose fixed-point map reproduces ${\mathcal{F}}$: $\mathrm{Fix}({\mathcal{G}}(\cdot,{\bm{x}}))\;=\;{\mathcal{F}}({\bm{x}})$ for all ${\bm{x}}\in

Figures (9)

  • Figure 1: (Conceptual diagram) A simple implicit update expresses a complex map via iteration.
  • Figure 2: Validation on image deblurring. Iterating a simple operator ${\mathcal{G}}_{\Theta}$ produces a complex fixed-point mapping: Lipschitz (a) grows, while accuracy (b) improves and stabilizes.
  • Figure 3: Visual results for deblurring. The top PSNR values (28.49, 30.03, or 31.53 dB) correspond to the single visualized image; the second line shows the average ($\pm$ std) over all test samples.
  • Figure 4: Validation on the steady Navier--Stokes task. Iterating a simple operator ${\mathcal{G}}_{\Theta}$ yields a complex fixed-point mapping: Lipschitz constant (a) increases, while error (b) decreases.
  • Figure 5: Visual results for the NS equations. The top values (0.260 and 0.112) are the relative errors between the prediction and the ground truth on the single visualized sample; the second line shows the average relative error ($\pm$ std) over all test samples. Both models have 2.376 M parameters.
  • ...and 4 more figures

Theorems & Definitions (38)

  • Definition 2.1: Lipschitz continuity
  • Definition 2.3: Regular implicit operator
  • Theorem 2.4: Sufficiency
  • Theorem 2.5: Necessity
  • Definition 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Corollary 3.5
  • Theorem 3.6 (Temam, 1995)
  • Corollary 3.7
  • ...and 28 more