From sectorial coarse graining to extreme coarse graining of S&P 500 correlation matrices
Manan Vyas, M. Mijaíl Martínez-Ramos, Parisa Majari, Thomas H. Seligman
TL;DR
The paper addresses the high dimensionality of Pearson correlation matrices for stock returns and proposes extreme coarse graining (ECG): two-block averaging that reduces the matrix to a real symmetric $2\times 2$ matrix while preserving the average correlation as a key parameter. This is compared to sectorial coarse graining (CG), which yields a $10\times10$ matrix. Using 322 S&P 500 stocks over 2006–2023 ($T=4430$) with epochs of $L=20$ days and $k$-means clustering, the authors test three block choices to form the ECG and analyze market-state dynamics. ECG produces a three-parameter representation $(x,y,z)$ that captures essential features of market transitions with a qualitative structure comparable to CG, though certain relationships (e.g., average correlation to $\lambda_{max}$) differ in sign. The study demonstrates that significant dimensionality reduction is possible without losing the core dynamical picture, offering a compact framework for visualizing market states and suggesting avenues for extension to other markets and to noise-robust techniques such as Power-Map or wavelets.
Abstract
Starting from the Pearson correlation matrix of stock returns, and motivated by the desire to obtain a reduced number of parameters relevant for the dynamics of a financial market, we propose to take the idea of a sectorial matrix, which would still have a large number of parameters, to the extreme case of a real symmetric $2 \times 2$ matrix that conserves the desirable feature that the average correlation can be one of the parameters. This is achieved by choosing two subsets of stocks for rows and columns and averaging the correlation matrix over each of the resulting blocks. Averaging over these blocks retains the average of the correlation matrix. We use a random selection with two equal block sizes as well as two specific, potentially relevant selections that do not produce equal block sizes. The results show that one of the non-random choices has somewhat different properties, whose meaning remains to be analyzed from an economic point of view.
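The block averaging described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `extreme_coarse_grain` and `weighted_mean`, the synthetic returns, and the assignment of $x$, $y$, $z$ to the two diagonal blocks and the off-diagonal block are assumptions made for illustration, as is the choice to include the unit diagonal in the block averages. Under these assumptions, the average of the full correlation matrix is recovered as a size-weighted average of the $2 \times 2$ entries, which reduces to the plain average $(x + y + 2z)/4$ when the two blocks have equal size.

```python
import numpy as np

def extreme_coarse_grain(corr, idx_a, idx_b):
    """Average a correlation matrix over the four blocks defined by two index sets.

    Hypothetical helper: x and y are the means of the two diagonal blocks,
    z is the mean of the off-diagonal block (the labeling is an assumption).
    """
    a = np.asarray(idx_a)
    b = np.asarray(idx_b)
    x = corr[np.ix_(a, a)].mean()  # block A x A, unit diagonal included
    y = corr[np.ix_(b, b)].mean()  # block B x B, unit diagonal included
    z = corr[np.ix_(a, b)].mean()  # block A x B (equals B x A by symmetry)
    return np.array([[x, z], [z, y]])

def weighted_mean(ecg, n_a, n_b):
    """Average of the full matrix, recovered from the 2 x 2 block means."""
    w = np.array([n_a, n_b], dtype=float)
    return float(w @ ecg @ w) / (n_a + n_b) ** 2

# Synthetic data standing in for one epoch: 20 days, 8 stocks (not market data).
rng = np.random.default_rng(0)
returns = rng.standard_normal((20, 8))
corr = np.corrcoef(returns, rowvar=False)

# Random selection with two equal block sizes.
perm = rng.permutation(8)
idx_a, idx_b = perm[:4], perm[4:]
ecg = extreme_coarse_grain(corr, idx_a, idx_b)

# Unequal block sizes: the size-weighted average is needed.
ecg_unequal = extreme_coarse_grain(corr, perm[:3], perm[3:])
```

For equal block sizes, `ecg.mean()` coincides with `corr.mean()`; for the unequal split, `weighted_mean(ecg_unequal, 3, 5)` plays that role.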
