Sufficient dimension reduction for regression with metric space-valued responses
Abdul-Nasah Soale, Yuexiao Dong
TL;DR
This work tackles regression with non-Euclidean, metric-space responses by replacing kernel-based Fréchet embedding with a fast, kernel-free Euclidean embedding that uses pairwise distances. It introduces ensemble-based surrogate responses and develops surrogate-assisted OLS and SIR estimators to recover the Fréchet central space ${\mathcal S}_{Y|{\bf X}}$ without requiring the response space to embed isometrically into a Hilbert space. The authors demonstrate through simulations and real-data analyses (COVID-19 transmission distributions and brain connectivity) that the surrogate-assisted methods often outperform kernel Fréchet counterparts, offering robustness to outliers and heteroscedasticity and enabling richer, distribution- and network-level insights. The approach provides a practical, scalable framework for dimension reduction with complex responses and suggests avenues for extending SDR to nonlinear settings and other SDR families.
Abstract
Data visualization and dimension reduction for regression between a general metric space-valued response and Euclidean predictors are proposed. Current Fréchet dimension reduction methods require that the response metric space be continuously embeddable into a Hilbert space, which restricts the choice of metric and kernel. We relax this assumption by proposing a Euclidean embedding technique that avoids the use of kernels. Under this framework, classical dimension reduction methods such as ordinary least squares and sliced inverse regression are extended. An extensive simulation experiment demonstrates the superior performance of the proposed method on synthetic data compared to existing methods where applicable. Two real data analyses are also presented: the factors influencing the distribution of COVID-19 transmission in the U.S., and the association between BMI and structural brain connectivity of healthy individuals.
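To make the idea of distance-based surrogate responses concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that each column of the pairwise distance matrix $d(Y_i, Y_j)$ serves as one Euclidean surrogate response, and that the ensemble is formed by averaging the outer products of the per-surrogate OLS slopes; both choices, along with the function names `pairwise_distance_matrix` and `surrogate_ols_directions`, are illustrative assumptions. The toy responses are the distributions $N(\beta^\top x_i, 1)$, for which the 2-Wasserstein distance reduces to the absolute difference of the means.

```python
import numpy as np


def pairwise_distance_matrix(responses, metric):
    """Return the n x n matrix of distances d(Y_i, Y_j) for a user-supplied metric."""
    n = len(responses)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = metric(responses[i], responses[j])
    return D


def surrogate_ols_directions(X, D, d):
    """Hypothetical ensemble OLS: regress every column of D on X and aggregate.

    Column j of D is treated as a scalar surrogate response. The aggregation
    rule (mean of slope outer products) is an assumption made for this sketch.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)              # center predictors
    Dc = D - D.mean(axis=0)              # center each surrogate response
    Sigma = Xc.T @ Xc / n                # sample covariance of X
    B = np.linalg.solve(Sigma, Xc.T @ Dc / n)  # p x n matrix of OLS slopes
    M = B @ B.T / n                      # candidate matrix for the ensemble
    _, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :d]       # top-d eigenvectors


# Toy example: the response for observation i is the distribution N(beta'x_i, 1),
# stored by its mean. Between two such Gaussians with equal variance, the
# 2-Wasserstein distance is the absolute difference of their means.
rng = np.random.default_rng(0)
n, p = 300, 5
beta = np.zeros(p)
beta[0] = 1.0
X = rng.standard_normal((n, p))
means = X @ beta

D = pairwise_distance_matrix(means, lambda a, b: abs(a - b))
b_hat = surrogate_ols_directions(X, D, d=1)[:, 0]

# Absolute cosine between the estimated and true directions (both unit length).
print(abs(b_hat @ beta))
```

Only the distance matrix enters the estimator, so no kernel has to be chosen; replacing the lambda with any other metric on the response space leaves the rest of the sketch unchanged.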
