An Analysis of Linear Time Series Forecasting Models

William Toner; Luke Darlow

TL;DR

The paper investigates linear time-series forecasting models and demonstrates that popular variants (e.g., DLinear, FITS, NLinear, RLinear) are functionally equivalent to unconstrained linear regression when viewed through augmented feature spaces or invertible normalisations. Leveraging the convexity of the $L_2$ loss, the authors show that these models share a single optimum and admit closed-form OLS solutions, which often outperform their SGD-trained counterparts. Empirical results across eight datasets reveal that the learned weight matrices and forecasts converge toward the OLS solution, with FITS exhibiting distinctive bias characteristics due to its Fourier-based parameterisation. The findings suggest that, in many forecasting scenarios, simple linear models are competitive with, or superior to, more complex architectures, underscoring the value of closed-form solutions for fast, reliable forecasting.
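The sketch below illustrates the closed-form solution on toy data, and shows trained weights converging towards it. It uses NumPy to fit an affine forecaster by ordinary least squares on sliding windows of a synthetic AR(1) series. This is hypothetical data, not one of the paper's datasets. The sketch then runs full-batch gradient descent on the same MSE and tracks the cosine similarity between the two weight matrices, in the spirit of Figure 2. The window sizes, step size and helper names are illustrative assumptions. Full-batch gradient descent stands in for SGD to keep the example deterministic.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into overlapping (look-back, horizon) input/target pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def add_bias_column(X):
    """Append a column of ones so the bias is fitted as an ordinary weight."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def cosine_similarity(W1, W2):
    """Cosine similarity between two weight matrices, flattened to vectors."""
    v1, v2 = W1.ravel(), W2.ravel()
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Toy AR(1) series with coefficient 0.5 (hypothetical data)
rng = np.random.default_rng(0)
series = np.zeros(3000)
for t in range(1, series.size):
    series[t] = 0.5 * series[t - 1] + rng.standard_normal()

X, Y = make_windows(series, lookback=16, horizon=4)
Xa = add_bias_column(X)
n = Xa.shape[0]

# Closed-form OLS: minimises (1/n) * ||Xa @ W - Y||_F^2 in a single solve
W_ols, *_ = np.linalg.lstsq(Xa, Y, rcond=None)  # shape (lookback + 1, horizon)

# Full-batch gradient descent on the same loss, starting from zero weights
W = np.zeros_like(W_ols)
lr = 0.1  # illustrative step size, small enough to be stable for this toy data
sims = []
for step in range(500):
    grad = 2.0 / n * Xa.T @ (Xa @ W - Y)  # gradient of the mean squared error
    W -= lr * grad
    sims.append(cosine_similarity(W, W_ols))

print(f"cosine similarity to OLS: step 1 = {sims[0]:.3f}, step {len(sims)} = {sims[-1]:.6f}")
```

The loss is convex, so the gradient-descent iterates approach the same minimiser that `lstsq` returns directly. The closed form skips the training loop entirely.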

Abstract

Despite their simplicity, linear models perform well at time series forecasting, even when pitted against deeper and more expensive models. A number of variations to the linear model have been proposed, often including some form of feature normalisation that improves model generalisation. In this paper we analyse the sets of functions expressible using these linear model architectures. In so doing, we show that several popular variants of linear models for time series forecasting are equivalent and functionally indistinguishable from standard, unconstrained linear regression. We characterise the model class of each linear variant. We demonstrate that each model can be reinterpreted as unconstrained linear regression over a suitably augmented feature set, and therefore admits a closed-form solution when trained with a mean-squared loss. We provide experimental evidence that the models under inspection learn nearly identical solutions, and finally demonstrate that the simpler closed-form solutions are superior forecasters in 72% of test settings.
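The abstract states that normalised variants reduce to ordinary regression. The simplest invertible normalisation illustrates this. The sketch below assumes an NLinear-style scheme that subtracts the last look-back value from every input and adds it back to every output. That is our reading of NLinear, not code from the paper, and the function names are hypothetical. Because the same scalar shift is removed from both inputs and targets, the MSE in the original space equals the MSE in the shifted space. Ordinary least squares on the shifted windows therefore gives the exact minimiser.

```python
import numpy as np

def nlinear_predict(X, A, b):
    """NLinear-style forward pass: subtract the last value, apply an affine map, add it back."""
    last = X[:, -1:]                    # (n, 1) last observed value of each window
    return (X - last) @ A.T + b + last  # the shift broadcasts across the horizon

def fit_nlinear_closed_form(X, Y):
    """Closed-form MSE minimiser: OLS of (Y - last) on (X - last) with a bias column."""
    last = X[:, -1:]
    Xn, Yn = X - last, Y - last
    # The last column of Xn is identically zero, so its weight is unidentifiable;
    # lstsq returns the minimum-norm solution, which sets that weight to zero.
    Xn_aug = np.hstack([Xn, np.ones((Xn.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Xn_aug, Yn, rcond=None)
    return W[:-1].T, W[-1]  # A with shape (T, L), b with shape (T,)

# Toy data: random walk plus a slow sinusoid (hypothetical, not from the paper)
rng = np.random.default_rng(1)
series = np.cumsum(rng.standard_normal(1500)) + np.sin(np.arange(1500) / 5)
L, T = 48, 12
n = len(series) - L - T + 1
X = np.stack([series[i:i + L] for i in range(n)])
Y = np.stack([series[i + L:i + L + T] for i in range(n)])

A, b = fit_nlinear_closed_form(X, Y)
mse = np.mean((nlinear_predict(X, A, b) - Y) ** 2)
print("closed-form NLinear train MSE:", mse)
```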

Paper Structure (38 sections, 9 theorems, 48 equations, 5 figures, 2 tables)

Introduction
Linear models for forecasting
Outline and Contributions
Related Work
Analysis of Linear Time Series Forecasting Models
Notation
DLinear
FITS
Invertible Data Normalisations
Instance Norm
Reversible Instance Normalisation
RLinear
NowNorm
NLinear
Discussion
...and 23 more sections

Key Result

Lemma 3.2

Let $M(\text{DLinear})$ denote the DLinear model class, i.e. the set of functions $f:\mathbb{R}^L\rightarrow\mathbb{R}^T$ that can be represented as a DLinear model. Then $M(\text{DLinear})$ is precisely the space of affine linear functions. That is, every function of the form $f(x) = Ax + \vec{b}$, with $A\in\mathbb{R}^{T\times L}$ and $\vec{b}\in\mathbb{R}^T$, may be represented as a DLinear model, and every DLinear model is of this form.
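DLinear's trend is a fixed linear smoothing $Kx$ of the input, and its seasonal part is $(I-K)x$. Its output is therefore $W_s(I-K)x + W_t Kx + \vec{b}_s + \vec{b}_t$, which is affine in $x$. Conversely, setting $W_s = W_t = A$ recovers any $Ax + \vec{b}$. The NumPy sketch below checks both directions numerically. It assumes the trend is a centred moving average with edge-replicating padding and an odd kernel size (25 here). The helper names and sizes are hypothetical.

```python
import numpy as np

def moving_average_matrix(L, kernel):
    """Matrix K such that K @ x is a centred moving average of x.

    Edge values are replicated as padding (assumed to mirror DLinear's trend
    extraction); kernel must be odd.
    """
    pad = (kernel - 1) // 2
    K = np.zeros((L, L))
    for i in range(L):
        for j in range(i - pad, i + pad + 1):
            K[i, min(max(j, 0), L - 1)] += 1.0 / kernel  # clamped index = edge padding
    return K

def dlinear(x, K, Ws, bs, Wt, bt):
    """DLinear-style forward pass: separate affine maps on seasonal and trend parts."""
    trend = K @ x
    seasonal = x - trend
    return Ws @ seasonal + bs + Wt @ trend + bt

L, T, kernel = 48, 12, 25
rng = np.random.default_rng(0)
K = moving_average_matrix(L, kernel)
x = rng.standard_normal(L)

# Any affine map x -> A x + c is realised with Ws = Wt = A, since A(I-K)x + AKx = Ax
A = rng.standard_normal((T, L))
c = rng.standard_normal(T)
out = dlinear(x, K, A, c, A, np.zeros(T))
print(np.allclose(out, A @ x + c))  # True

# Conversely, every DLinear collapses to a single affine map (A_eff, b_eff)
Ws, Wt = rng.standard_normal((T, L)), rng.standard_normal((T, L))
bs, bt = rng.standard_normal(T), rng.standard_normal(T)
A_eff = Ws @ (np.eye(L) - K) + Wt @ K
b_eff = bs + bt
print(np.allclose(dlinear(x, K, Ws, bs, Wt, bt), A_eff @ x + b_eff))  # True
```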

Figures (5)

Figure 1: The cropped weight matrices after 50 epochs of training for all four models with instance normalisation, juxtaposed with the corresponding closed-form solution (extreme left). The matrices are visibly similar; the slight differences between them affect forecasts only marginally (see Figure 3).
Figure 2: A demonstration of how the models' weight matrices tend to the OLS solution during training, visualised as the cosine similarity between each model's weight matrix and the one given by the closed-form solution.
Figure 3: Forecast comparison on ETTh1 with $T=336$, comparing the 5 models that use instance normalisation.
Figure 4: Comparison of the learned bias parameters for several linear models that implement feature normalisation techniques. FITS clearly results in a different bias term.
Figure 5: The biases learned by the FITS, Linear and DLinear models after 50 epochs of training on ETTh1, together with the bias of the closed-form OLS linear regression. In line with the theory of Section 3, the DLinear, OLS and Linear models learn the same bias. The bias for FITS, by contrast, is substantially different. This is explained by the choice of normalisation used in the Fourier transform in FITS.
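The NumPy sketch below shows how a Fourier normalisation convention can change what a bias term looks like in the time domain. It assumes, as a hypothetical simplification of FITS, that a bias vector is added to the retained low-frequency rFFT coefficients and mapped back with an inverse rFFT. The sizes `N` and `cutoff` are illustrative, not the paper's settings. Under NumPy's default `norm="backward"` the inverse transform divides by $N$, whereas `norm="ortho"` divides by $\sqrt{N}$. The same frequency-domain bias therefore yields time-domain offsets that differ by a factor of $\sqrt{N}$. In both cases the result is a sum of low-frequency sinusoids rather than an arbitrary vector.

```python
import numpy as np

N = 384               # hypothetical output length (look-back + horizon)
n_freq = N // 2 + 1   # number of rFFT coefficients for a real signal of length N
cutoff = 20           # hypothetical number of retained low frequencies

# Hypothetical bias placed on the retained low-frequency coefficients only
rng = np.random.default_rng(0)
bias_freq = np.zeros(n_freq, dtype=complex)
bias_freq[:cutoff] = rng.standard_normal(cutoff) + 1j * rng.standard_normal(cutoff)

# Time-domain effect of the same frequency-domain bias under two conventions
b_backward = np.fft.irfft(bias_freq, n=N, norm="backward")  # scaled by 1/N
b_ortho = np.fft.irfft(bias_freq, n=N, norm="ortho")        # scaled by 1/sqrt(N)

print(np.allclose(b_backward, b_ortho / np.sqrt(N)))  # True
print("max |bias|, backward vs ortho:", np.abs(b_backward).max(), np.abs(b_ortho).max())
```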

Theorems & Definitions (27)

Definition 3.1: Forecast Model and Model Class
Lemma 3.2: DLinear Model Class
proof
Theorem 3.3: FITS Model Class
Definition 3.4: Instance normalisation
Lemma 3.5: Linear+IN
proof
Definition 3.6: Reversible Instance Normalisation
Lemma 3.7: RLinear
proof
...and 17 more
