A Saddle Point Algorithm for Robust Data-Driven Factor Model Problems
Shabnam Khodakaramzadeh, Soroosh Shafiee, Gabriel de Albuquerque Gleizer, Peyman Mohajerin Esfahani
TL;DR
The paper tackles robust covariance factor modeling in high dimensions by casting the factor-model problem as a robust low-rank plus diagonal decomposition over a ball around the empirical covariance: it minimizes $\mathrm{Tr}(L)$ subject to $\Sigma = L + D$ lying in $\mathbb{B}_{\varepsilon}^{\mathrm{d}}(\widehat{\Sigma})$, with $L\succeq 0$ and $D \succeq 0$. It develops a saddle-point reformulation and a scalable first-order algorithm that relies on a linear minimization oracle (LMO) to solve the inner minimization, paired with Dykstra’s projection to enforce the conic constraints. The authors provide semi-closed-form LMO solutions for three distance measures (Frobenius, KL, and Gelbrich), together with explicit Lipschitz constants for the dual function, which underpin the convergence guarantees and enable efficient computation in high dimensions. Numerical experiments on synthetic and real data demonstrate fast convergence, improved estimation of the ground-truth covariance, and favorable execution times compared with MOSEK, particularly as the dimension grows. The work offers a scalable, robust approach to data-driven factor models, with practical relevance for covariance estimation and risk analytics in high-dimensional settings.
Abstract
We study the factor model problem, which aims to uncover low-dimensional structures in high-dimensional datasets. Adopting a robust data-driven approach, we formulate the problem as a saddle-point optimization. Our primary contribution is a first-order algorithm that solves this reformulation by leveraging a linear minimization oracle (LMO). We further develop semi-closed-form solutions (up to a scalar) for three specific LMOs, corresponding to the Frobenius norm, Kullback-Leibler divergence, and Gelbrich (aka Wasserstein) distance. The analysis includes explicit quantification of these LMOs' regularity conditions, notably the Lipschitz constants of the dual function, which govern the algorithm's convergence performance. Numerical experiments confirm our method's effectiveness in high-dimensional settings, outperforming standard off-the-shelf optimization solvers.
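To make the notion of a linear minimization oracle concrete, the following is a minimal sketch for the simplest case: minimizing a linear objective $\langle G, S\rangle$ over a Frobenius ball of radius $\varepsilon$ centered at the empirical covariance. It is an illustration under simplifying assumptions, not the authors' semi-closed-form solution: it ignores any positive-semidefiniteness constraint on $S$, and the names `frobenius_ball_lmo`, `grad`, `sigma_hat`, and `eps` are hypothetical. In this unconstrained-ball case the minimizer is the standard one, $S^\star = \widehat{\Sigma} - \varepsilon\, G / \|G\|_F$.

```python
import numpy as np


def frobenius_ball_lmo(grad, sigma_hat, eps):
    """Return argmin of <grad, S> over {S : ||S - sigma_hat||_F <= eps}.

    Simplifying assumption: no PSD constraint is imposed on S, so the
    minimizer is a step of length eps from the center along -grad.
    """
    norm = np.linalg.norm(grad, "fro")
    if norm == 0.0:
        # Every feasible point is optimal; return the center of the ball.
        return sigma_hat.copy()
    return sigma_hat - eps * grad / norm


# Small synthetic example (all data below is made up for illustration).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
sigma_hat = A @ A.T / 5            # a symmetric PSD "empirical covariance"
G = rng.standard_normal((5, 5))
G = (G + G.T) / 2                  # a symmetric linear objective
eps = 0.1

S = frobenius_ball_lmo(G, sigma_hat, eps)
```

The oracle output lies on the boundary of the ball, and no other feasible point attains a smaller value of the linear objective; a first-order method of the kind described above would call such an oracle once per iteration with the current gradient.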
