Adaptive Geometric Regression for High-Dimensional Structured Data

Pawel Gajer, Jacques Ravel

TL;DR

The paper addresses regression on high-dimensional data with low intrinsic dimension by shifting the analysis domain from the ambient space to a data-driven geometric object. The method builds and iteratively updates a triple $(K^{(j)}, g^{(j)}, \rho^{(j)})$ (a complex, its Riemannian structure, and a density). It advances the density by heat diffusion, $\rho_0^{(j+1)} = e^{-t L_0^{(j)}} \rho_0^{(j)}$, and smooths the response with $\hat{y}^{(j)} = (M_0^{(j)} + \eta L_0^{(j)})^{-1} M_0^{(j)} y$, where edge masses are modulated by outcome coherence. Key contributions include a wedge-Gram-based Riemannian structure on the nerve complex, diffusion-distance metrics, and an outcome-aware diffusion-barrier mechanism that concentrates mass in regions where the response is coherent and separates regions across response boundaries. Extensions cover higher-order simplices, density-based evolution of the complex, and joint feature-response smoothing, with uncertainty quantification via spectral methods and an open-source implementation in the gflow package.
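The two update formulas above can be tried on a toy graph. The sketch below is a minimal illustration under stated assumptions, not the gflow implementation: it assumes $L_0$ is the combinatorial Laplacian D - W of a small weighted graph (the paper's wedge-Gram Riemannian construction on the nerve complex is not reproduced), that $M_0$ is a diagonal matrix of vertex masses, and that vertex masses are taken from the diffused density purely for illustration. The function names graph_laplacian, heat_step and smooth_response are hypothetical.

```python
import numpy as np

def graph_laplacian(W):
    """Combinatorial Laplacian L = D - W of a symmetric, nonnegative weight matrix."""
    return np.diag(W.sum(axis=1)) - W

def heat_step(L, rho, t):
    """Advance a density by rho <- exp(-t L) rho.

    L is symmetric here, so exp(-t L) = V diag(exp(-t * lam)) V^T,
    computed from the eigendecomposition L = V diag(lam) V^T.
    """
    lam, V = np.linalg.eigh(L)
    return V @ (np.exp(-t * lam) * (V.T @ rho))

def smooth_response(M, L, y, eta):
    """Return y_hat = (M + eta L)^{-1} M y.

    Setting the gradient of (y - f)^T M (y - f) + eta f^T L f to zero
    gives (M + eta L) f = M y, so y_hat is that penalized minimizer.
    """
    return np.linalg.solve(M + eta * L, M @ y)

# Toy example: path graph on 5 vertices with unit edge weights.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = graph_laplacian(W)

rho = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # all mass starts on vertex 0
rho_next = heat_step(L, rho, t=0.5)          # mass spreads along the path

M = np.diag(rho_next)                        # hypothetical: vertex masses taken from the density
y = np.array([0.0, 0.1, 0.2, 1.0, 1.1])
y_hat = smooth_response(M, L, y, eta=0.5)
```

Because the rows and columns of L sum to zero, the heat step conserves total mass, and the smoother preserves the M-weighted sum of the response and passes constant responses through unchanged; eta = 0 returns y itself.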

Abstract

We present a geometric framework for regression on structured high-dimensional data that shifts the analysis from the ambient space to a geometric object capturing the data's intrinsic structure. The method addresses a fundamental challenge in analyzing datasets with high ambient dimension but low intrinsic dimension, such as microbiome compositions, where traditional approaches fail to capture the underlying geometric structure. Starting from a k-nearest neighbor covering of the feature space, the geometry evolves iteratively through heat diffusion and response-coherence modulation, concentrating mass within regions where the response varies smoothly while creating diffusion barriers where the response changes rapidly. This iterative refinement produces conditional expectation estimates that respect both the intrinsic geometry of the feature space and the structure of the response.
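The iterative loop described in the abstract (a k-nearest-neighbor graph, smoothing, then response-coherence modulation) can be sketched as follows. This is a hypothetical stand-in, not the paper's rule: edges are damped by a Gaussian factor exp(-(ŷ_i - ŷ_j)² / 2σ²) on the smoothed response jump, vertex masses are uniform, the kNN graph is built by brute force with a median-distance Gaussian bandwidth, and the names knn_weights, coherence_modulate and fit are illustrative only.

```python
import numpy as np

def knn_weights(X, k):
    """Symmetric k-nearest-neighbor graph with Gaussian edge weights (brute force)."""
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # pairwise squared distances
    D2_off = D2 + np.diag(np.full(n, np.inf))                  # exclude self-matches
    A = np.zeros((n, n))
    for i in range(n):
        A[i, np.argsort(D2_off[i])[:k]] = 1.0                  # k nearest neighbors of point i
    A = np.maximum(A, A.T)                                     # keep an edge if either end chose it
    bandwidth = np.median(D2[A > 0])                           # hypothetical bandwidth choice
    return A * np.exp(-D2 / bandwidth)

def coherence_modulate(W0, y_hat, sigma):
    """Hypothetical rule: damp each edge by how sharply the smoothed response jumps across it."""
    jump2 = (y_hat[:, None] - y_hat[None, :]) ** 2
    return W0 * np.exp(-jump2 / (2.0 * sigma ** 2))

def fit(X, y, k=3, eta=1.0, sigma=0.5, n_iter=3):
    """Alternate graph smoothing and coherence-based edge damping."""
    W0 = knn_weights(X, k)
    W = W0
    M = np.eye(len(y))                                         # assumption: uniform vertex masses
    for _ in range(n_iter):
        L = np.diag(W.sum(axis=1)) - W                         # Laplacian of the current weights
        y_hat = np.linalg.solve(M + eta * L, M @ y)
        W = coherence_modulate(W0, y_hat, sigma)               # re-weight the original kNN edges
    return y_hat, W

# Toy data: 20 points on a line with a step in the response at x = 0.5.
X = np.linspace(0.0, 1.0, 20)[:, None]
y = (X[:, 0] > 0.5).astype(float)
y_hat, W = fit(X, y)
```

Edges that straddle the step end up weaker than edges inside either half, which mimics the diffusion-barrier behavior described above; the paper's actual modulation of edge masses and its density evolution are not reproduced here.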

Paper Structure

This paper contains 1 section, 63 equations, 1 table.

Table of Contents

  1. Introduction