Be Wary of Your Time Series Preprocessing

Sofiane Ennadir; Tianze Wang; Oleg Smirnov; Sahar Asadi; Lele Cao

TL;DR

This work presents the first formal analysis of how different normalization strategies, specifically instance-based and global scaling, impact the expressivity of Transformer-based architectures for time series representation learning, and proposes a novel expressivity framework tailored to time series.

Abstract

Normalization and scaling are fundamental preprocessing steps in time series modeling, yet their role in Transformer-based models remains underexplored from a theoretical perspective. In this work, we present the first formal analysis of how different normalization strategies, specifically instance-based and global scaling, impact the expressivity of Transformer-based architectures for time series representation learning. We propose a novel expressivity framework tailored to time series, which quantifies a model's ability to distinguish between similar and dissimilar inputs in the representation space. Using this framework, we derive theoretical bounds for two widely used normalization methods: Standard and Min-Max scaling. Our analysis reveals that the choice of normalization strategy can significantly influence the model's representational capacity, depending on the task and data characteristics. We complement our theory with empirical validation on classification and forecasting benchmarks using multiple Transformer-based models. Our results show that no single normalization method consistently outperforms others, and in some cases, omitting normalization entirely leads to superior performance. These findings highlight the critical role of preprocessing in time series learning and motivate the need for more principled normalization strategies tailored to specific tasks and datasets.
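To make the distinction between the two strategies concrete, the sketch below applies Standard and Min-Max scaling in an instance-based and a global fashion. It is only an illustration under stated assumptions, not the paper's formulation: "instance-based" is taken to mean statistics computed per series along the time axis, "global" to mean statistics computed per channel over the whole dataset, and the array layout `(num_series, time_steps, channels)`, the function names, and the `EPS` constant are all hypothetical choices made here.

```python
import numpy as np

# Small constant guarding against division by zero on constant channels
# (an assumed value, not taken from the paper).
EPS = 1e-8

# Assumed layout: (num_series, time_steps, channels).
INSTANCE_AXES = (1,)   # statistics per series, computed over time only
GLOBAL_AXES = (0, 1)   # statistics per channel, computed over the whole dataset


def standard_scale(x, axes):
    """Subtract the mean and divide by the standard deviation taken over `axes`."""
    mean = x.mean(axis=axes, keepdims=True)
    std = x.std(axis=axes, keepdims=True)
    return (x - mean) / (std + EPS)


def minmax_scale(x, axes):
    """Map values to roughly [0, 1] using the min and max taken over `axes`."""
    lo = x.min(axis=axes, keepdims=True)
    hi = x.max(axis=axes, keepdims=True)
    return (x - lo) / (hi - lo + EPS)


# Synthetic data: four series whose amplitudes and offsets differ widely.
rng = np.random.default_rng(0)
amplitude = np.array([1.0, 5.0, 10.0, 50.0])[:, None, None]
offset = np.array([0.0, 10.0, -20.0, 100.0])[:, None, None]
data = rng.normal(size=(4, 50, 3)) * amplitude + offset

instance_standard = standard_scale(data, INSTANCE_AXES)
global_standard = standard_scale(data, GLOBAL_AXES)
instance_minmax = minmax_scale(data, INSTANCE_AXES)
global_minmax = minmax_scale(data, GLOBAL_AXES)

# Instance-based scaling removes each series' own offset and amplitude;
# global scaling keeps the differences between series.
print("per-series means, instance:", instance_standard.mean(axis=(1, 2)))
print("per-series means, global:  ", global_standard.mean(axis=(1, 2)))
```

Under these assumptions, instance-based scaling discards the level and amplitude of each series while global scaling retains them, which is one way the choice of strategy can change what a downstream model is able to distinguish.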

Paper Structure (16 sections, 4 theorems, 23 equations, 2 figures, 4 tables)

Introduction
Related Work
Preliminaries
Expressivity of a Time Series Transformer
Expressivity of Transformer-Based Models
Problem Setup
On the Effect of Normalization
Experimental Validation
Experimental Setup
Experimental Results — Classification
Experimental Results — Forecasting
Conclusion
Proof of Theorem \ref{theo:standard_normalization}
Proof of Theorem \ref{theo:minmax_normalization}
Additional Results
...and 1 more section

Key Result

Theorem 1

Let $f \colon \mathcal{X} \subseteq \mathbb{R}^{n \times d} \rightarrow \mathcal{Y} \subseteq \mathbb{R}^d$ be a Transformer-based model (TBM) under the setting described in our problem formulation. Then:

Figures (2)

Figure 1: Classification accuracy of the considered preprocessing methods across models and datasets. "None" denotes the case where the time series is used without normalization.
Figure 2: Forecasting MAE of the considered preprocessing methods across models and datasets. "None" denotes the case where the time series is used without normalization.

Theorems & Definitions (7)

Definition 1
Theorem 1
Theorem 2
Theorem 3
proof
Theorem 4
proof
