Extension of coupling via the Projection of Optimal Transport

Jakwang Kim, Young-Heon Kim, Chan Park

Abstract

In many statistical settings, two types of data are available: coupled data, which preserve the joint structure among variables but are limited in size due to cost or privacy constraints, and marginal data, which are available at larger scales but lack joint structure. Since standard methods require coupled data, marginal information is often discarded. We propose a fully nonparametric procedure that integrates decoupled marginal data with a limited amount of coupled data to improve downstream analysis. The approach can be understood as an extension of coupling via projection in optimal transport. Specifically, the estimator solves an optimal transport projection problem over the space of probability measures, which gives it a natural geometric interpretation. We establish its stability and derive its sample complexity using recent advances in statistical optimal transport. In addition, we present an explicit formula for the estimator based on the "shadow," a notion introduced by Eckstein and Nutz. Furthermore, the estimator can be approximated in almost linear time and in parallel by the entropic shadow, which demonstrates both the theoretical and practical strengths of our method. Lastly, we present experiments with real and synthetic data to assess the performance of our method.

Paper Structure

This paper contains 20 sections, 15 theorems, 68 equations, 5 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $\nu^i_m$ be the $i$-th marginal of $\pi^0_m$ for $i=1, \dots, K$, and $\bm{\nu}_m:=(\nu^1_m, \dots, \nu^K_m)$. Then

Theorems & Definitions (36)

  • Theorem 3.1: Stability
  • proof
  • Remark 3.2: Statistical interpretation of the stability bounds
  • Corollary 3.3: Sample complexity
  • Remark 3.4: Curse of dimensionality
  • Remark 3.5
  • Theorem 3.6: Shadow (Eckstein and Nutz, 2022)
  • proof
  • Remark 3.7
  • Remark 3.8
  • ...and 26 more