Sparse Portfolio Selection via Topological Data Analysis based Clustering
Anubha Goel, Damir Filipović, Puneet Pasricha
TL;DR
The paper addresses sparse portfolio selection by leveraging Topological Data Analysis (TDA) to capture non-linear, time-dependent dependencies among assets. It introduces time-aware distance measures on persistence diagrams and landscapes (AWD_p, DWD_p, ALD_p, DLD_p) and uses affinity propagation clustering to form sparse index-tracking and mean-variance portfolios without predefining portfolio size. Empirical results on daily returns of S&P 500 constituents from 2009–2022, including the COVID period, show that the TDA-based approach delivers lower tracking error, higher correlation with the benchmark, and favorable risk-adjusted performance compared with correlation-based and cardinality-constrained methods, often with controlled turnover. The findings demonstrate a data-driven, topology-guided framework that improves robustness to regime shifts and enhances practical sparse portfolio construction.
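The two tracking metrics named above can be made concrete with a short sketch. It assumes tracking error is the sample standard deviation of the daily return difference between portfolio and benchmark, which is one common definition; this summary does not state which definition the paper uses. The function names and the return series are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def tracking_error(portfolio, benchmark):
    """Sample standard deviation of the per-period return differences."""
    diffs = [p - b for p, b in zip(portfolio, benchmark)]
    return stdev(diffs)


def correlation(x, y):
    """Pearson correlation between two equally long return series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily returns, for illustration only (not data from the paper).
benchmark_returns = [0.010, -0.020, 0.015, 0.000, 0.005]
portfolio_returns = [0.012, -0.018, 0.013, 0.001, 0.004]

print("tracking error:", tracking_error(portfolio_returns, benchmark_returns))
print("correlation:   ", correlation(portfolio_returns, benchmark_returns))
```

A sparse portfolio that tracks well should show a small first number and a second number close to 1.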
Abstract
This paper uses topological data analysis (TDA) tools to introduce a data-driven, clustering-based stock selection strategy tailored for sparse portfolio construction. Our asset selection strategy exploits the topological features of stock price movements to select a subset of topologically similar assets for a sparse index-tracking portfolio, or a subset of topologically different assets for a sparse Markowitz portfolio. We introduce new distance measures on the space of persistence diagrams and landscapes that account for the time component of a time series; these distances serve as the input to the clustering algorithm. We conduct an empirical analysis on the S&P 500 index from 2009 to 2022, including a study of the COVID-19 period, to validate the robustness of our methodology. Integrating TDA with the clustering algorithm significantly enhances the performance of sparse portfolios across various performance measures in diverse market scenarios.
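The clustering step can be illustrated with a minimal sketch that takes only a precomputed pairwise distance matrix. The code is a plain-Python version of the standard affinity propagation message-passing updates, not the authors' implementation. The toy distance matrix, the median preference, the damping value, and the function name `affinity_propagation` are all assumptions made for illustration. In the paper's setting, the distances would instead come from the proposed measures on persistence diagrams or landscapes.

```python
from statistics import median


def affinity_propagation(dist, damping=0.5, iters=200):
    """Cluster items from a symmetric distance matrix.

    Returns (exemplars, labels), where labels[i] is the exemplar index of item i.
    """
    n = len(dist)
    # Similarity is negative distance. The diagonal "preference" is set to the
    # median off-diagonal similarity (a common default, assumed here).
    S = [[-dist[i][k] for k in range(n)] for i in range(n)]
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref

    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        # Responsibility: how well k suits i compared with i's best rival.
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: support for k as exemplar gathered from other items.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            pos[k] = R[k][k]  # self-responsibility enters unclipped
            total = sum(pos)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplars found; try another preference or damping")
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Toy stand-in for an asset-by-asset distance matrix: two well-separated groups.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
exemplars, labels = affinity_propagation(dist)
print("exemplars:", exemplars)
print("labels:   ", labels)
```

The number of clusters is never passed in. It emerges from the preference value and the distances, which is the property that lets the portfolio size be determined by the data rather than fixed in advance.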
