Temporal Representation Learning for Stock Similarities and Its Applications in Investment Management

Yoontae Hwang, Stefan Zohren, Yongjae Lee

TL;DR

SimStock introduces a temporal self-supervised learning framework that blends SSL with temporal domain generalization to learn stock representations robust to market non-stationarity. By constructing a temporal feature variant with moving averages, static metadata embeddings, and a dimension corruption augmentation, it learns embeddings via a triplet loss and attention-based module to identify similar stocks across and within exchanges. The approach yields state-of-the-art performance in finding similar stocks and translates into practical gains for pairs trading, index tracking of thematic ETFs, and portfolio optimization, outperforming traditional covariances and existing SSL baselines across multiple markets. These results highlight the potential of data-driven, temporally aware representations to enhance investment decision-making and risk management in a dynamic global financial landscape.

Abstract

In the era of rapid globalization and digitalization, accurate identification of similar stocks has become increasingly challenging due to the non-stationary nature of financial markets and the ambiguity in conventional regional and sector classifications. To address these challenges, we examine SimStock, a novel temporal self-supervised learning framework that combines techniques from self-supervised learning (SSL) and temporal domain generalization to learn robust and informative representations of financial time series data. The primary focus of our study is to understand the similarities between stocks from a broader perspective, considering the complex dynamics of the global financial landscape. We conduct extensive experiments on four real-world datasets with thousands of stocks and demonstrate the effectiveness of SimStock in finding similar stocks, outperforming existing methods. The practical utility of SimStock is showcased through its application to various investment strategies, such as pairs trading, index tracking, and portfolio optimization, where it leads to superior performance compared to conventional methods. Our findings empirically demonstrate the potential of a data-driven approach to enhance investment decision-making and risk management practices by leveraging temporal self-supervised learning in an ever-changing global financial landscape.
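
As a rough illustration of the dimension corruption augmentation and triplet loss mentioned in the TL;DR, the following minimal sketch builds a lightly corrupted positive view and a heavily corrupted negative view from toy token embeddings, then scores them with a standard margin-based triplet loss. Everything here is an assumption made for illustration rather than the authors' implementation: the function names `corrupt_dimensions` and `triplet_loss`, the corruption fractions (0.1 and 0.8), the Euclidean distance, the margin, and the mean-pooling stand-in for the paper's attention-based module.

```python
import numpy as np

def corrupt_dimensions(x, frac, rng):
    """Shuffle the order of a fraction `frac` of the embedding dimensions.

    x has shape (seq_len, dim); the same dimension shuffle is applied at
    every time step. (Hypothetical helper, not the paper's code.)
    """
    dim = x.shape[-1]
    n_swap = int(round(frac * dim))
    out = x.copy()
    if n_swap < 2:
        return out  # fewer than two dimensions selected: nothing to shuffle
    idx = rng.choice(dim, size=n_swap, replace=False)
    out[..., idx] = x[..., rng.permutation(idx)]
    return out

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss with Euclidean distance (assumed form)."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

rng = np.random.default_rng(0)
tke = rng.normal(size=(20, 16))            # toy token embeddings (seq_len, dim)
h_pos = corrupt_dimensions(tke, 0.1, rng)  # small perturbation: positive view
h_neg = corrupt_dimensions(tke, 0.8, rng)  # large perturbation: negative view

def encode(h):
    """Stand-in encoder: mean over time (the paper uses an attention module)."""
    return h.mean(axis=0)

loss = triplet_loss(encode(tke), encode(h_pos), encode(h_neg))
```

Because the positive view reorders only a few dimensions, its pooled embedding stays closer to the anchor than the negative view's in most draws, which is the ordering the triplet loss rewards.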

Paper Structure (32 sections, 16 equations, 11 figures, 13 tables)

Figures (11)

  • Figure 1: SimStock combines self-supervised learning framework with temporal domain generalization for more robust and comprehensive stock representations.
  • Figure 2: Dimension corruption method for generating positive and negative views from token embeddings ($\mathbf{TKE}^{s}$). The positive view $\mathbf{H}_{pos}^{s}$ is created by applying a small perturbation to the dimension order of the original TKE, while the negative view $\mathbf{H}_{neg}^{s}$ is generated with a larger perturbation to the dimension order. As a result, the positive view preserves more of the original temporal structure than the negative view.
  • Figure 3: Performance of models in same exchange (diagonal) and different exchanges (off-diagonal) scenarios for finding similar stocks. The performance is evaluated using TOP@$k$ Correlation metrics, where $k$ = 9, 7, 5, 3, and 1. Each data point represents the average correlation between the target stock and the top $k$ similar stocks identified by the respective model.
  • Figure 4: Performance of models in same exchange (diagonal) and different exchanges (off-diagonal) scenarios for finding similar stocks. The performance is evaluated using TOP@$k$ DTW metrics, where $k$ = 9, 7, 5, 3, and 1. Each data point represents the average DTW distance between the target stock and the top $k$ similar stocks identified by the respective model.
  • Figure 5: Cumulative return curves of the four thematic ETFs (ARKK, SKYY, BOTZ, and LIT) and their corresponding tracking portfolios constructed using the top 10 similar stocks identified by SimStock and the baseline methods (TS2VEC, Corr1, and Corr2) from the US exchange. The closer a portfolio's curve follows the respective ETF curve (dotted black line), the better the tracking performance.
  • ...and 6 more figures
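
The TOP@$k$ Correlation metric described in the Figure 3 caption (the average correlation between a target stock and the top $k$ similar stocks a model identifies) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: it assumes neighbours are ranked by cosine similarity of embeddings and that correlation means Pearson correlation of return series. The function names and the toy data are hypothetical.

```python
import numpy as np

def top_k_similar(embeddings, target, k):
    """Indices of the k stocks closest to `target` by cosine similarity."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[target]
    sims[target] = -np.inf  # never return the target itself
    return np.argsort(-sims)[:k]

def top_at_k_correlation(embeddings, returns, target, k):
    """Mean Pearson correlation between the target's returns and those of
    its top-k neighbours (rows of `returns` are stocks)."""
    neighbours = top_k_similar(embeddings, target, k)
    corr = np.corrcoef(returns)
    return float(corr[target, neighbours].mean())

# Toy data: 4 stocks with 2-dimensional embeddings and 4 periods of returns.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
r0 = np.array([0.01, -0.02, 0.03, 0.0])
rets = np.vstack([r0, 2 * r0, [0.0, 0.01, -0.01, 0.02], -r0])

score_k1 = top_at_k_correlation(emb, rets, target=0, k=1)
```

In this toy setup the nearest neighbour of stock 0 is stock 1, whose returns are an exact multiple of stock 0's, so the score for $k = 1$ is at the maximum of the metric. Averaging this score over all target stocks gives one data point per model and per $k$.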