Tokenizing Stock Prices for Enhanced Multi-Step Forecast and Prediction

Zhuohang Zhu, Haodong Chen, Qiang Qu, Xiaoming Chen, Vera Chung

TL;DR

To address multi-step stock price forecasting and prediction, the authors introduce PCIE, a Patched Channel Integration Encoder that tokenizes multi-channel stock data through univariate patching and adaptive temporal learning, enabling a channel-mixing self-attention encoder to capture inter-series dependencies. They also apply a simple data preprocessing step that supplies both price levels $p$ and price changes $\Delta p$ as input channels, improving input representations. Direct multi-step forecasting is used to reduce cumulative errors compared with iterative approaches. Empirical results on the US_71 and US_14L datasets across multiple horizons demonstrate state-of-the-art performance for both forecast and prediction, with ablations confirming the critical role of tokenization and data augmentation.
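
The preprocessing and direct multi-step setup described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `make_channels` and `direct_windows`, the lookback and horizon values, the toy prices, and the choice to set the first change to zero are all assumptions made for illustration.

```python
import numpy as np

def make_channels(prices):
    """Stack price levels p and first differences (delta p) as two channels."""
    prices = np.asarray(prices, dtype=float)
    # Assumption: the first change is defined as 0 so both channels have length T.
    changes = np.diff(prices, prepend=prices[0])
    return np.stack([prices, changes], axis=0)  # shape (2, T)

def direct_windows(series, lookback, horizon):
    """Pair each lookback window with the next `horizon` values.

    In a direct multi-step setup the model emits all `horizon` targets at once,
    rather than feeding its own one-step outputs back in as inputs.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

# Toy prices (hypothetical values, not taken from US_71 or US_14L).
prices = np.array([10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3, 11.6])
channels = make_channels(prices)                      # (2, 8)
X, Y = direct_windows(prices, lookback=4, horizon=2)  # (3, 4) and (3, 2)
```

Because every target in a row of `Y` is an observed value, errors from earlier predicted steps cannot propagate into later ones, which is the motivation the summary gives for preferring the direct approach over iterative rollout.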

Abstract

Effective stock price forecasting (estimating future prices) and prediction (estimating future price changes) are pivotal for investors, regulatory agencies, and policymakers. These tasks enable informed decision-making, risk management, strategic planning, and superior portfolio returns. Despite their importance, forecasting and prediction are challenging due to the dynamic nature of stock price data, whose distribution and statistical properties vary significantly over time. Additionally, while both forecasting and prediction targets are derived from the same dataset, their statistical characteristics differ significantly. Forecasting targets typically follow a log-normal distribution, characterized by significant shifts in mean and variance over time, whereas prediction targets adhere to a normal distribution. Furthermore, although multi-step forecasting and prediction offer a broader perspective and richer information than single-step approaches, they are much more challenging due to factors such as cumulative errors and long-term temporal variance. As a result, many previous works have tackled only single-step stock price forecasting or prediction. To address these issues, we introduce a novel model, termed Patched Channel Integration Encoder (PCIE), to tackle both stock price forecasting and prediction. In this model, we utilize multiple stock channels that cover both historical prices and price changes, and design a novel tokenization method to effectively embed these channels in a cross-channel and temporally efficient manner. Specifically, the tokenization process involves univariate patching and temporal learning with a channel-mixing encoder to reduce cumulative errors. Comprehensive experiments validate that PCIE outperforms current state-of-the-art models in forecast and prediction tasks.
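
To make the idea of univariate patching concrete, here is a hedged NumPy sketch that splits each channel independently into overlapping patches and then flattens them into one token sequence. The function name `patch_channels`, the patch length, the stride, and the toy data are assumptions, and the learned components of PCIE (the temporal learning step and the channel-mixing self-attention encoder) are not reproduced here.

```python
import numpy as np

def patch_channels(channels, patch_len, stride):
    """Split each channel on its own (univariate) into overlapping patches."""
    channels = np.asarray(channels, dtype=float)
    _, T = channels.shape
    n_patches = (T - patch_len) // stride + 1
    starts = np.arange(n_patches) * stride
    # idx[i, j] is the time index of element j in patch i.
    idx = starts[:, None] + np.arange(patch_len)[None, :]
    return channels[:, idx]  # shape (C, n_patches, patch_len)

# Two channels from one toy series: price levels and price changes.
prices = np.arange(1.0, 13.0)                  # 12 hypothetical time steps
changes = np.diff(prices, prepend=prices[0])   # first change set to 0 (assumption)
channels = np.stack([prices, changes])         # (2, 12)

patches = patch_channels(channels, patch_len=4, stride=2)  # (2, 5, 4)
# Flatten channels and patches into one sequence of patch tokens.
tokens = patches.reshape(-1, patches.shape[-1])            # (10, 4)
```

Placing patches from all channels in a single sequence is one plausible way for a self-attention encoder to attend across channels; whether PCIE arranges its tokens exactly like this is not stated in the abstract.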

Paper Structure

This paper contains 21 sections, 9 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Tokenization Process
  • Figure 2: PCIE Model Overview