Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning

Yuxuan Bian; Xuan Ju; Jiangtong Li; Zhijian Xu; Dawei Cheng; Qiang Xu

TL;DR

A framework that adapts Large Language Models to time-series representation learning. Its distinctive component is a patch-wise decoding layer, which replaces the sequence-level decoding that previous methods rely on.

Abstract

In this study, we present aLLM4TS, a framework that adapts Large Language Models (LLMs) for time-series representation learning. Central to our approach is reconceiving time-series forecasting as a self-supervised, multi-patch prediction task, which captures temporal dynamics in patch representations more effectively than traditional contrastive learning or mask-and-reconstruction methods. Our strategy uses two-stage training: (i) a causal continual pre-training phase on various time-series datasets, anchored on next-patch prediction, which aligns LLM capabilities with the characteristics of time-series data; and (ii) fine-tuning for multi-patch prediction in the targeted time-series context. A distinctive element of our framework is the patch-wise decoding layer, which departs from previous methods that rely on sequence-level decoding. This design decodes each patch directly into its temporal sequence, thereby strengthening the model's ability to learn temporal patch-based representations. aLLM4TS demonstrates superior performance in several downstream tasks, showing that it derives temporal representations with enhanced transferability and advancing the adaptation of LLMs for time-series analysis.
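As a rough illustration of the next-patch objective described above, the sketch below splits a univariate series into patches and pairs each causal context with the patch that follows it. The helper names (`make_patches`, `next_patch_pairs`), the non-overlapping stride, and the toy series are assumptions made for illustration; this is not the authors' implementation.

```python
from typing import List, Tuple

Patch = List[float]


def make_patches(series: List[float], patch_len: int, stride: int) -> List[Patch]:
    """Slice a univariate series into fixed-length patches (any trailing remainder is dropped)."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[s:s + patch_len] for s in range(0, last_start + 1, stride)]


def next_patch_pairs(patches: List[Patch]) -> List[Tuple[List[Patch], Patch]]:
    """Build causal training pairs: context is patches[0..i], target is patches[i + 1]."""
    return [(patches[: i + 1], patches[i + 1]) for i in range(len(patches) - 1)]


# Toy example (assumed values): 12 time steps, non-overlapping patches of length 4.
series = [float(t) for t in range(12)]
patches = make_patches(series, patch_len=4, stride=4)
pairs = next_patch_pairs(patches)

for context, target in pairs:
    print(len(context), "context patch(es) ->", target)
```

In a real model the context patches would be embedded and passed through the LLM under a causal attention mask, so that the representation at position i only sees patches up to i; the pairs above only make the supervision signal explicit.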

Paper Structure (41 sections, 4 equations, 8 figures, 22 tables)

This paper contains 41 sections, 4 equations, 8 figures, 22 tables.

INTRODUCTION
RELATED WORK
Time Series Representation Learning
Time Series Analysis based on LLMs
PRELIMINARIES AND MOTIVATION
Preliminaries
Motivation
METHOD
Causal Next-patch Continual Pre-Training
Multi-patch Prediction Fine-tuning
EXPERIMENTS
Experimental Settings
Long-Term Time Series Forecasting
Short-Term Time Series Forecasting
Few-shot Time Series Forecasting
...and 26 more sections

Figures (8)

Figure 1: Pipeline Comparison. Given a time series embedding/patch sequence ${\bm{x}} \in \mathbb{R}^{L \times D}$ with $D \gg P$, where $P$ is the patch size, and a forecasting horizon $H$: Non-Patch Based Models ❶ and Patch Based Models ❷ map it to the target sequence using a huge sequence-level linear layer $\mathbf{W}_{s} \in \mathbb{R}^{(L \cdot D) \times H}$; our Patch-based Parallel Decoding in aLLM4TS ❸ decodes each patch to the time domain using a small shared patch-level linear layer $\mathbf{W}_{p} \in \mathbb{R}^{D \times P}$, without modeling temporal relationships among patches. Specifically, our patch-based decoding layer has only $\frac{P}{L \cdot H}$ of the parameters of the sequence-based decoding layer (e.g., about $0.035\%$ when $P=16, L=64, H=720$). Learning Paradigms. Instead of contrastive learning, masking reconstruction, or limited fine-tuning of the LLMs ①, we adopt a forecasting-based two-stage pre-training task ② to better transfer the sequence modeling capabilities within LLMs to time series.
Figure 2: The model framework of aLLM4TS. In stage 1, Causal Next-patch Pre-training (a), time series from different datasets are first converted into univariate patch sequences. Then, we conduct next-patch prediction training with causal attention, aligning LLM capabilities with the characteristics of time-series data. In stage 2, Multi-patch Prediction Fine-tuning (b), we fine-tune a few layers for multi-patch prediction in the target time-series context. First, non-parametric methods are employed to obtain the initial anchor representation of the horizon. Next, we concatenate the look-back window patches and anchors and feed them, with a position-aware attention mask, into the time-series-aligned LLM obtained from stage 1, optimizing the anchors with the history patches. Finally, all optimized horizon anchors are independently decoded into the target temporal domain through a shared patch-wise linear decoder.
Figure 3: Interpretability study on the Traffic dataset. Due to the page limit, the full visualization and analysis are in the appendix. The Y-axis and X-axis represent prediction horizon patch indexes and look-back window patch indexes, respectively.
Figure 4: Long-term forecasting cases from ETTh1 by different models under the input-96-predict-96 setting. Blue lines are the ground truths and orange lines are the model predictions.
Figure 5: Short-term forecasting cases from the M4 hourly dataset by different models under the input-96-predict-48 setting. Blue lines are the ground truths and orange lines are the model predictions.
...and 3 more figures
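
The parameter comparison in the Figure 1 caption can be checked with a few lines of arithmetic. The sketch below counts the weights of a sequence-level decoder of shape $(L \cdot D) \times H$ and of a shared patch-level decoder of shape $D \times P$, then applies a shared decoder to each patch embedding independently. The hidden size `D = 768` is an assumed value (it cancels in the ratio), biases are ignored, and `decode_patches` is a hypothetical helper written for illustration rather than the authors' code.

```python
from typing import List


def sequence_decoder_params(L: int, D: int, H: int) -> int:
    """Weights in a sequence-level linear layer mapping L*D features to H outputs."""
    return L * D * H


def patch_decoder_params(D: int, P: int) -> int:
    """Weights in a shared patch-level linear layer mapping D features to P outputs."""
    return D * P


def decode_patches(embeddings: List[List[float]], W_p: List[List[float]]) -> List[float]:
    """Apply the same D x P weight matrix to every patch embedding and concatenate the results."""
    out: List[float] = []
    P = len(W_p[0])
    for e in embeddings:
        out.extend(sum(e[d] * W_p[d][p] for d in range(len(e))) for p in range(P))
    return out


# Values from the caption, plus an assumed hidden size D.
L, D, H, P = 64, 768, 720, 16
seq_params = sequence_decoder_params(L, D, H)
patch_params = patch_decoder_params(D, P)
ratio = patch_params / seq_params  # equals P / (L * H), independent of D

print(seq_params, patch_params, ratio)

# Tiny decoding example with D = 2 and P = 2 (toy numbers).
toy_W_p = [[1.0, 2.0], [3.0, 4.0]]
toy_embeddings = [[1.0, 0.0], [0.0, 1.0]]
decoded = decode_patches(toy_embeddings, toy_W_p)
```

With these values the ratio is 16 / (64 * 720) = 1/2880, roughly 3.5e-4. Because the patch-level weights are shared, the decoder's size does not grow with the look-back length or the forecasting horizon; a longer horizon simply means more patches are passed through the same layer.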
