Why Cannot Neural Networks Master Extrapolation? Insights from Physical Laws

Ramzi Dakhmouche, Hossein Gorji

TL;DR

This paper analyzes why neural networks struggle with extrapolation beyond training data, arguing that a key factor is limited structural variability in the functions these models can represent. It introduces a framework based on polynomial differential equations to measure variability and uses differential annihilators to characterize neural networks' restricted extrapolation capabilities. A theoretical result shows that networks with common activations have a minimal ODE of fixed degree and tend to constant equilibria outside the training domain, limiting their global behavior. Empirically, a minimal architectural change that increases structural variability yields better extrapolation on both synthetic functions and real-world ETTH time series, pointing toward hybrid symbolic-neural approaches as a promising path to mastering extrapolation.
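
The two ingredients named above can be illustrated with a small numerical sketch. It is not the paper's construction or its theorem. It only checks two standard facts: the tanh activation satisfies the polynomial ODE y' = 1 - y^2, and a one-hidden-layer tanh network flattens to a constant far from the origin. The weights `W`, `B`, `A`, `C` and the function names are hypothetical, chosen by hand for illustration.

```python
import math

def tanh_ode_residual(x, h=1e-4):
    """Residual of the polynomial ODE y' = 1 - y^2 at x for y = tanh.

    y' is approximated with a central finite difference, so the residual
    is small but not exactly zero.
    """
    dy = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)
    return dy - (1.0 - math.tanh(x) ** 2)

# Hypothetical one-hidden-layer tanh network (weights are NOT from the paper).
W = [1.5, -0.7, 2.0]   # input weights
B = [0.1, -0.3, 0.5]   # hidden biases
A = [0.8, -1.2, 0.5]   # output weights
C = 0.2                # output bias

def mlp(x):
    """f(x) = C + sum_j A[j] * tanh(W[j] * x + B[j])."""
    return C + sum(a * math.tanh(w * x + b) for a, w, b in zip(A, W, B))

def right_limit():
    """Constant that f approaches as x -> +inf: each unit saturates to sign(w)."""
    return C + sum(a * math.copysign(1.0, w) for a, w in zip(A, W))

# Largest ODE residual on a grid over [-3, 3].
max_residual = max(abs(tanh_ode_residual(k / 10)) for k in range(-30, 31))
print("max |y' - (1 - y^2)| on [-3, 3]:", max_residual)

# Far outside a typical training range the network is essentially constant.
print("f(50) =", mlp(50.0), " limit =", right_limit())
```

Whatever such a network fits inside its training interval, it settles to a constant once every hidden unit saturates, which is the flat behavior the summary describes.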

Abstract

Motivated by the remarkable success of Foundation Models (FMs) in language modeling, there has been growing interest in developing FMs for time series prediction, given the transformative power such models hold for science and engineering. This culminated in significant success of FMs in short-range forecasting settings. However, extrapolation, or long-range forecasting, remains elusive for FMs, which struggle to outperform even simple baselines. This contrasts with physical laws, which have strong extrapolation properties, and raises the question of the fundamental difference between the structure of neural networks and physical laws. In this work, we identify and formalize a fundamental property characterizing the ability of statistical learning models to predict more accurately outside of their training domain, hence explaining performance deterioration for deep learning models in extrapolation settings. In addition to a theoretical analysis, we present empirical results showcasing the implications of this property on current deep learning architectures. Our results not only clarify the root causes of the extrapolation gap but also suggest directions for designing next-generation forecasting models capable of mastering extrapolation.

Paper Structure

This paper contains 17 sections, 37 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Information levels illustration. The green curve oscillates twice as much as the orange one, while they both have shrinking modes requiring more bits to be encoded.
  • Figure 2: Structural variation illustration. The cosine function is just a shifted version of sine.
  • Figure 3: Predicted trajectory - Standard MLP
  • Figure 4: Predicted trajectory - Proposed MLP
  • Figure 5: Structural variability illustration

Theorems & Definitions (1)

  • proof