Sparse Portfolio Selection via Topological Data Analysis based Clustering

Anubha Goel, Damir Filipović, Puneet Pasricha

TL;DR

The paper addresses sparse portfolio selection by leveraging Topological Data Analysis (TDA) to capture non-linear, time-dependent dependencies among assets. It introduces time-aware distance measures on persistence diagrams and landscapes (AWD_p, DWD_p, ALD_p, DLD_p) and uses affinity propagation clustering to form sparse index-tracking and mean-variance portfolios without predefining portfolio size. Empirical results on daily returns of S&P 500 constituents from 2009–2022, including the COVID period, show that the TDA-based approach delivers lower tracking error, higher correlation with the benchmark, and favorable risk-adjusted performance compared with correlation-based and cardinality-constrained methods, often with controlled turnover. The findings demonstrate a data-driven, topology-guided framework that improves robustness to regime shifts and enhances practical sparse portfolio construction.

Abstract

This paper uses topological data analysis (TDA) tools and introduces a data-driven, clustering-based stock selection strategy tailored for sparse portfolio construction. Our asset selection strategy exploits the topological features of stock price movements to select a subset of topologically similar assets for a sparse index-tracking portfolio, or topologically different assets for a sparse Markowitz portfolio. We introduce new distance measures on the space of persistence diagrams and landscapes that account for the time component of a time series; these serve as input to the clustering algorithm. We conduct an empirical analysis on the S&P 500 index from 2009 to 2022, including a study on the COVID-19 data, to validate the robustness of our methodology. Integrating TDA with the clustering algorithm significantly enhanced the performance of sparse portfolios across various performance measures in diverse market scenarios.
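The abstract refers to topological features extracted from a time series, and the key result below works with Takens' embeddings. As a rough illustration only, the following sketch builds a delay embedding and computes the zero-dimensional persistence pairs (connected components) of the resulting point cloud under a Rips filtration. The function names, the toy series, the unit delay, and the convention that an edge enters at a scale equal to the pairwise distance are all assumptions made for this example. It does not reproduce the paper's distance measures or its clustering step.

```python
from itertools import combinations
from math import dist


def takens_embedding(series, dim=2, delay=1):
    """Return the delay vectors (x_t, x_{t+delay}, ..., x_{t+(dim-1)*delay})."""
    span = (dim - 1) * delay
    if len(series) <= span:
        raise ValueError("series is too short for this dim and delay")
    return [
        tuple(series[t + k * delay] for k in range(dim))
        for t in range(len(series) - span)
    ]


def h0_persistence(points):
    """Birth/death pairs of connected components in a Rips filtration.

    Assumed convention: every component is born at scale 0, and an edge
    enters the complex at a scale equal to the distance between its endpoints.
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one of them dies
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, float("inf")))  # the last component never dies
    return pairs


# Made-up periodic series, purely for illustration.
toy_series = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
cloud = takens_embedding(toy_series, dim=2, delay=1)
diagram = h0_persistence(cloud)
```

For this periodic toy series the embedded points lie on a closed loop in the plane, which is the kind of structure that higher-dimensional persistence (holes) would detect. A dedicated TDA library would normally be used for that step.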

Paper Structure (17 sections, 5 theorems, 34 equations, 11 figures, 15 tables)

This paper contains 17 sections, 5 theorems, 34 equations, 11 figures, 15 tables.

Key Result

Theorem 1

Let ${x}$ and ${y}$ be two time series in ${\mathbb R}^T$. Let $X=\{ x_{I_1},\dots,x_{I_m}\}$ and $Y=\{ y_{I_1},\dots,y_{I_m}\}$ be their Takens' embeddings in ${\mathbb R}^d$. Then

Figures (11)

  • Figure 1: Examples of two-dimensional noisy point clouds.
  • Figure 2: The Rips filtration process. Each point in the point cloud is equipped with a ball of the same radius, and this radius serves as the filtration parameter. Increasing the radius produces a sequence of nested simplicial complexes, in which features such as connected components and holes emerge and vanish. The initial loop appears in \ref{rips2} and dies in \ref{rips33}. In \ref{rips33}, two additional large holes can be observed, which die in \ref{rips44}.
  • Figure 3: (a) The persistence diagram associated with Figure 2; each dot represents the birth and death of a feature. For instance, the three off-diagonal red dots represent the birth and death of the three significant holes in Figure 2. (b) The corresponding persistence landscape.
  • Figure 4: Time series and point cloud in $\mathbb{R}^2$ constructed using Takens' embedding
  • Figure 5: Jaccard similarity between the clusters containing the index in different windows, compared across six kernel matrices.
  • ...and 6 more figures
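
The Figure 5 caption refers to the Jaccard similarity between the clusters that contain the index in different windows. A minimal sketch of that comparison follows, assuming cluster assignments are stored as a mapping from ticker to cluster label. The tickers, labels, helper names, and the choice to exclude the index itself from its cluster are hypothetical and not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


def index_cluster(labels, index_name="INDEX"):
    """Return the assets that share a cluster label with the index."""
    target = labels[index_name]
    return {
        name
        for name, label in labels.items()
        if label == target and name != index_name
    }


# Hypothetical cluster labels from two estimation windows.
window_1 = {"INDEX": 0, "AAA": 0, "BBB": 0, "CCC": 1}
window_2 = {"INDEX": 2, "AAA": 2, "CCC": 2, "BBB": 1}

similarity = jaccard(index_cluster(window_1), index_cluster(window_2))
```

A value near 1 would indicate that the assets grouped with the index are stable from one window to the next, while a value near 0 would indicate that the group changes almost entirely.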

Theorems & Definitions (16)

  • Definition 1
  • Definition 2
  • Definition 3
  • Remark 1
  • Definition 4
  • Definition 5
  • Theorem 1
  • proof
  • Proposition 1
  • Proposition 2
  • ...and 6 more