Higher Order Transformers: Enhancing Stock Movement Prediction On Multimodal Time-Series Data

Soroush Omranpour, Guillaume Rabusseau, Reihaneh Rabbany

TL;DR

The paper tackles stock movement prediction by introducing Higher Order Transformers, which extend self-attention to higher-order, tensor-valued inputs and thereby capture interactions across both variables (stocks) and time. It combines a low-rank Kronecker decomposition with kernelized linear attention to keep computation scalable, and uses an encoder-decoder multimodal architecture to fuse price data with tweet signals. Empirical results on the Stocknet dataset show competitive performance, with notable gains from multimodal integration and the proposed attention mechanisms, while ablations confirm the value of joint stock/time attention and kernel attention. The work demonstrates practical potential for improved market prediction and lays the groundwork for future evaluation of profitability on additional datasets and in real-world scenarios.

Abstract

In this paper, we tackle the challenge of predicting stock movements in financial markets by introducing Higher Order Transformers, a novel architecture designed for processing multivariate time-series data. We extend the self-attention mechanism and the transformer architecture to a higher order, effectively capturing complex market dynamics across time and variables. To manage computational complexity, we propose a low-rank approximation of the potentially large attention tensor using tensor decomposition and employ kernel attention, reducing complexity to linear with respect to the data size. Additionally, we present an encoder-decoder model that integrates technical and fundamental analysis, utilizing multimodal signals from historical prices and related tweets. Our experiments on the Stocknet dataset demonstrate the effectiveness of our method, highlighting its potential for enhancing stock movement prediction in financial markets.
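The abstract attributes the linear cost to kernel attention. Below is a minimal NumPy sketch of that idea. It assumes the positive feature map φ(x) = elu(x) + 1 that is common in linear-attention transformers; this summary does not state the paper's actual feature map. The sketch also treats the N stocks × T days as L = N·T flattened tokens. The function names `elu_plus_one`, `quadratic_kernel_attention` and `linear_attention` are hypothetical.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1 (an assumption, not confirmed by the paper).
    # For x > 0 this is x + 1; for x <= 0 it is exp(x).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def quadratic_kernel_attention(Q, K, V):
    # Reference form: builds the full L x L kernel matrix, O(L^2) time and memory.
    S = elu_plus_one(Q) @ elu_plus_one(K).T           # (L, L)
    return (S @ V) / S.sum(axis=1, keepdims=True)

def linear_attention(Q, K, V):
    # Same result, but keys and values are summarized once: linear in L.
    phi_q, phi_k = elu_plus_one(Q), elu_plus_one(K)
    KV = phi_k.T @ V                                  # (d, d_v)
    z = phi_k.sum(axis=0)                             # (d,) normalizer
    return (phi_q @ KV) / (phi_q @ z)[:, None]

# Toy example: N stocks x T days flattened into L = N * T tokens.
rng = np.random.default_rng(0)
N, T, d, d_v = 4, 5, 8, 6
L = N * T
Q, K = rng.standard_normal((L, d)), rng.standard_normal((L, d))
V = rng.standard_normal((L, d_v))

out_quad = quadratic_kernel_attention(Q, K, V)
out_lin = linear_attention(Q, K, V)
print(out_lin.shape, np.allclose(out_quad, out_lin))
```

Regrouping the product as φ(Q)(φ(K)ᵀV) costs O(L·d·d_v) instead of O(L²·d). That regrouping is where the linear scaling in the number of tokens comes from.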

Paper Structure

This paper contains 20 sections, 1 theorem, 15 equations, 2 figures, 3 tables.

Key Result

Theorem 5.1

Given any fourth-order attention tensor $\mathcal{A} \in \mathbb{R}^{N \times N \times T \times T}$, which can be reshaped into a matrix $A \in \mathbb{R}^{NT \times NT}$, there exists a rank $R$ such that $A$ can be expressed as a sum of Kronecker products, $A = \sum_{i=1}^{R} B_i \otimes C_i$, of matrices $B_i \in \mathbb{R}^{N \times N}$ and $C_i \in \mathbb{R}^{T \times T}$, for some $R \leq \min(N^2, T^2)$. As $R$ approaches $\min(N^2, T^2)$, the approximation becomes exact.
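Theorem 5.1 can be checked numerically. The sketch below uses the standard Van Loan–Pitsianis rearrangement, which is assumed here and is not necessarily the paper's proof. Reshaping $\mathcal{A}$ so that rows index $(n, n')$ and columns index $(t, t')$ gives an $N^2 \times T^2$ matrix, and its truncated SVD yields the factors $B_i$ and $C_i$. That matrix has at most $\min(N^2, T^2)$ singular values, so the error reaches zero, up to floating-point precision, at $R = \min(N^2, T^2)$. The function names and the $(n, t)$ row ordering of $A$ are illustrative assumptions.

```python
import numpy as np

def matricize(A4):
    """Reshape A4[n, n', t, t'] into A[(n, t), (n', t')] of shape (N*T, N*T)."""
    N, _, T, _ = A4.shape
    return A4.transpose(0, 2, 1, 3).reshape(N * T, N * T)

def kronecker_sum_factors(A4, R):
    """Factors B_i (N x N) and C_i (T x T) of the best rank-R Kronecker sum,
    via a truncated SVD of the rearranged N^2 x T^2 matrix."""
    N, _, T, _ = A4.shape
    rearranged = A4.reshape(N * N, T * T)   # rows: (n, n'), columns: (t, t')
    U, s, Vt = np.linalg.svd(rearranged, full_matrices=False)
    Bs = [np.sqrt(s[i]) * U[:, i].reshape(N, N) for i in range(R)]
    Cs = [np.sqrt(s[i]) * Vt[i].reshape(T, T) for i in range(R)]
    return Bs, Cs

rng = np.random.default_rng(0)
N, T = 3, 4
A4 = rng.standard_normal((N, N, T, T))      # random stand-in attention tensor
A = matricize(A4)

errors = []
for R in range(1, min(N * N, T * T) + 1):
    Bs, Cs = kronecker_sum_factors(A4, R)
    approx = sum(np.kron(B, C) for B, C in zip(Bs, Cs))
    errors.append(np.linalg.norm(A - approx) / np.linalg.norm(A))
    print(f"R={R}: relative Frobenius error {errors[-1]:.2e}")
```

The rearrangement only permutes entries, so the Frobenius error of the Kronecker sum equals the error of the truncated SVD. By the Eckart–Young theorem, each truncation is therefore the best rank-R Kronecker-sum approximation in that norm.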

Figures (2)

  • Figure 1: The overview of High Order Attention using Kronecker decomposition.
  • Figure 2: Multimodal transformer architecture. As depicted in the figure, tweet encodings are fed to the transformer encoder, and the historical price data are given to the transformer decoder.
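Figure 1 builds attention from Kronecker factors. As a result, $\sum_i B_i \otimes C_i$ never has to be materialized. Each $B_i \in \mathbb{R}^{N \times N}$ can mix stocks and each $C_i \in \mathbb{R}^{T \times T}$ can mix time steps directly on an $N \times T \times d$ input. The NumPy sketch below checks this identity. The random row-softmax factors and the function names are hypothetical stand-ins, not the paper's parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kron_attention_naive(Bs, Cs, X):
    # Materializes the (N*T) x (N*T) attention matrix: O(N^2 T^2) memory.
    N, T, d = X.shape
    A = sum(np.kron(B, C) for B, C in zip(Bs, Cs))
    return (A @ X.reshape(N * T, d)).reshape(N, T, d)

def kron_attention_factored(Bs, Cs, X):
    # Applies each factor along its own mode; nothing larger than N x N or T x T is built.
    out = np.zeros_like(X)
    for B, C in zip(Bs, Cs):
        Y = np.einsum("ts,nsd->ntd", C, X)     # mix time steps: O(N T^2 d)
        out += np.einsum("nm,mtd->ntd", B, Y)  # mix stocks:     O(N^2 T d)
    return out

rng = np.random.default_rng(1)
N, T, d, R = 5, 7, 4, 2
Bs = [softmax(rng.standard_normal((N, N))) for _ in range(R)]  # hypothetical stock factors
Cs = [softmax(rng.standard_normal((T, T))) for _ in range(R)]  # hypothetical time factors
X = rng.standard_normal((N, T, d))

out_naive = kron_attention_naive(Bs, Cs, X)
out_fact = kron_attention_factored(Bs, Cs, X)
print(np.allclose(out_naive, out_fact))
```

The two-step einsum reduces the per-factor cost from O(N²T²d) to O(NT²d + N²Td). This reduction is the motivation for keeping the attention in Kronecker form.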

Theorems & Definitions (6)

  • Definition 1: Tensor
  • Definition 2: Tensor Mode and Fibers
  • Definition 3: Tensor slice
  • Definition 4: Tensor Matricization
  • Definition 5: Kronecker Product
  • Theorem 5.1