Scalable Mean-Variance Portfolio Optimization via Subspace Embeddings and GPU-Friendly Nesterov-Accelerated Projected Gradient

Yi-Shuai Niu, Yajuan Wang

Abstract

We develop a sketch-based factor reduction and a Nesterov-accelerated projected gradient algorithm (NPGA) with GPU acceleration, yielding a doubly accelerated solver for large-scale constrained mean-variance portfolio optimization. Starting from the sample covariance factor $L$, the method combines randomized subspace embedding, spectral truncation, and ridge stabilization to construct an effective factor $L_{\mathrm{eff}}$. It then solves the resulting constrained problem with a structured projection computed by scalar dual search and GPU-friendly matrix-vector kernels, giving one computational pipeline for the baseline, sketched, and Sketch-Truncate-Ridge (STR)-regularized models. We also establish approximation, conditioning, and stability guarantees for the sketching and STR models, including explicit $O(\varepsilon)$ bounds for the covariance approximation, the optimal value error, and the solution perturbation under $(\varepsilon,\delta)$-subspace embeddings. Experiments on synthetic and real equity-return data show that the method preserves objective accuracy while reducing runtime substantially. On a 5440-asset real-data benchmark with 48374 training periods, NPGA-GPU solves the unreduced full model in 2.80 seconds versus 64.84 seconds for Gurobi, while the optimized compressed GPU variants remain in the low-single-digit-second regime. These results show that the full dense model is already practical on modern GPUs and that, after compression, the remaining bottleneck is projection rather than matrix-vector multiplication.
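To make the construction concrete, the following minimal NumPy sketch illustrates the Sketch-Truncate-Ridge pipeline under stated assumptions: a Gaussian subspace embedding stands in for the randomized sketch (the paper also considers CountSketch), and the function name, argument names, and the convention $\Sigma_{\mathrm{STR}} \approx L_{\mathrm{eff}}L_{\mathrm{eff}}^\top + \gamma I$ are illustrative rather than the paper's exact implementation.

```python
import numpy as np

def str_factor(L, s, r, rng=None):
    """Illustrative Sketch-Truncate-Ridge (STR) effective factor.

    L : (n, T) sample covariance factor, so Sigma ~= L @ L.T
    s : sketch size (embedded dimension), s << T
    r : truncation rank kept after the SVD of the sketched factor
    Returns L_eff of shape (n, r); a ridge term gamma * I is then
    added at the objective level for stabilization.
    """
    rng = np.random.default_rng(rng)
    n, T = L.shape
    # Gaussian JL subspace embedding; scaling gives E[S @ S.T] = I_T.
    S = rng.standard_normal((T, s)) / np.sqrt(s)
    Ls = L @ S                          # (n, s) sketched factor
    # Spectral truncation: keep the r leading singular directions.
    U, sig, _ = np.linalg.svd(Ls, full_matrices=False)
    return U[:, :r] * sig[:r]           # rank-r factor, shape (n, r)
```

Since this sketch uses only dense linear algebra, swapping `numpy` for `cupy` would give a GPU variant; that substitution is an assumption about the implementation, not the paper's code.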

Paper Structure

This paper contains 52 sections, 9 theorems, 75 equations, 4 figures, 5 tables, 1 algorithm.

Key Result

Theorem 2

Let $f(x)=\|L_{\mathrm{eff}}^\top x\|_2^2+\gamma\|x\|_2^2$, where $L_{\mathrm{eff}}$ and $\gamma\ge0$ specify the Baseline, Sketch, or STR model. Let $F$ denote the feasible set of the constrained problem, and apply NPGA to $\min_{x\in F}f(x)$ with exact projected steps $x^{k+1}=\Pi_F\bigl(y^k-\alpha_k\nabla f(y^k)\bigr)$. Let $x^\star$ be any minimizer of $f$ over $F$. Then $f(x^k)-f(x^\star)=O(k^{-2})$ in the convex case $\gamma=0$, and the iterates converge at a linear rate when $\gamma>0$, consistent with the benchmarks reported in Figure 4.
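A minimal NumPy sketch of the accelerated scheme in Theorem 2 follows. The simplex feasible set $F=\{x\ge 0,\ \mathbf{1}^\top x=1\}$, the fixed step $\alpha_k=1/L_f$, and the sorting-based solve of the scalar dual variable inside the projection are illustrative assumptions; the paper's structured projection and constraint set may differ.

```python
import numpy as np

def project_simplex(v):
    """Exact projection onto {x >= 0, sum(x) = 1} via a scalar dual
    search (sorting-based here; a bisection on the same scalar is the
    GPU-friendly variant)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / np.arange(1, len(v) + 1))[0][-1]
    tau = css[rho] / (rho + 1.0)
    return np.maximum(v - tau, 0.0)

def npga(L_eff, gamma, x0, iters=500):
    """Nesterov-accelerated projected gradient for
    f(x) = ||L_eff.T @ x||^2 + gamma * ||x||^2 over the simplex."""
    # Lipschitz constant of grad f is 2 * (||L_eff||_2^2 + gamma).
    Lip = 2.0 * (np.linalg.norm(L_eff, 2) ** 2 + gamma)
    alpha = 1.0 / Lip
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        grad = 2.0 * (L_eff @ (L_eff.T @ y) + gamma * y)  # mat-vec kernels
        x_new = project_simplex(y - alpha * grad)         # exact step
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))  # momentum update
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```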

Figures (4)

  • Figure 1: Eigenvalue spectrum of the real-world covariance matrix. The spectrum decays rapidly for the first few dozen components before entering a long, slowly decaying bulk spanning several thousand dimensions.
  • Figure 2: Cumulative explained variance $E(r)$. Due to the extremely long noisy bulk, $80\%$ of the total variance is reached only after more than $2500$ eigenvalues, demonstrating the inadequacy of energy-based rank selection for real-world covariance matrices (a small computation sketch follows this list).
  • Figure 3: Approximation quality on synthetic instances as the sketch size grows. Both metrics improve as $s/\ell$ increases, and JL and CountSketch behave similarly once the retained energy level is fixed.
  • Figure 4: Synthetic solver behavior. The convex case follows the $k^{-2}$ benchmark, the ridge-stabilized case is close to the linear-rate benchmark, and GPU acceleration becomes substantial on large unreduced factors.
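The quantity in the Figure 2 caption can be computed directly from the eigenvalue spectrum. The sketch below assumes the standard definition $E(r)=\sum_{i\le r}\lambda_i\big/\sum_i\lambda_i$ with eigenvalues sorted in decreasing order; the paper's exact convention may differ.

```python
import numpy as np

def explained_variance(Sigma):
    """Cumulative explained variance E(r), assuming the standard
    definition: partial eigenvalue sums over the total, eigenvalues
    sorted in decreasing order."""
    lam = np.linalg.eigvalsh(Sigma)[::-1]   # ascending -> descending
    return np.cumsum(lam) / lam.sum()

# Smallest rank reaching 80% of total variance (cf. Figure 2):
# r80 = np.searchsorted(explained_variance(Sigma), 0.80) + 1
```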

Theorems & Definitions (15)

  • Remark 1: Power method for estimating $\|L_{\mathrm{eff}}\|_2$ (see the sketch after this list)
  • Theorem 2: Convergence of NPGA
  • Remark 3: Optimization error versus model error
  • Definition 4: Subspace embedding
  • Remark 5
  • Lemma 6: Johnson--Lindenstrauss case [Johnson & Lindenstrauss, 1984]
  • Theorem 7: General subspace embedding [Sarlós, 2006; Clarkson & Woodruff, 2017]
  • Corollary 8: Geometry preservation on $\mathrm{Im}(L^\top)$
  • Remark 9: Geometric interpretation
  • Corollary 10: Spectral error of the covariance approximation
  • ...and 5 more
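Remark 1 above refers to a power method for estimating $\|L_{\mathrm{eff}}\|_2$. The sketch below is one standard mat-vec-only realization of that idea under stated assumptions; the iteration count, the random start, and the function name are illustrative, not the paper's exact procedure.

```python
import numpy as np

def spectral_norm_power(L_eff, iters=50, rng=None):
    """Power-method estimate of ||L_eff||_2: iterate on
    A = L_eff @ L_eff.T using only matrix-vector products
    (GPU-friendly; A is never formed explicitly) and return the
    square root of the dominant-eigenvalue estimate."""
    rng = np.random.default_rng(rng)
    v = rng.standard_normal(L_eff.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = L_eff @ (L_eff.T @ v)       # apply A without forming it
        v = w / np.linalg.norm(w)
    return np.sqrt(v @ (L_eff @ (L_eff.T @ v)))
```

The resulting estimate is what the fixed step $\alpha_k = 1/L_f$ in the NPGA sketch above would use, since the gradient's Lipschitz constant is $2(\|L_{\mathrm{eff}}\|_2^2+\gamma)$.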