Adaptive Estimation of Multivariate Binary Distributions under Sparse Generalized Correlation Structures

Alexandre Belloni, Yan Chen, Matthew Harding

TL;DR

This work addresses estimation of the joint distribution of an $M$-dimensional binary vector under sparse generalized correlation structures, using Bahadur's polynomial expansion to connect sparsity with conditional independence. The authors cast the task as a high-dimensional estimation problem with nuisance parameters and develop a two-stage plug-in approach, followed by a computationally tractable regularized adversarial estimator that attains the optimal rate of convergence. They extend the framework to covariates via localized estimators and establish rates of convergence for both the no-covariate and covariate settings, including results for the marginal-probability estimators. The methodology is applied to causal inference with multiple binary treatments, where it shows finite-sample improvements over direct probability estimation and offers a route to estimating generalized propensity scores and average treatment effects (ATEs) in high-dimensional treatment spaces. Numerical simulations support the theoretical claims and illustrate practical gains in estimating ATEs under complex treatment structures.

Abstract

We consider the problem of estimating the joint distribution of an $M$-dimensional binary vector, which involves exponentially many parameters without additional assumptions. Using the representation of Bahadur (1959), we relate the sparsity of its parameters to conditional independence among components. The maximum likelihood estimator is computationally infeasible and prone to overfitting. We reformulate the problem as estimating a high-dimensional vector of generalized correlation coefficients, quantifying interaction effects among all component subsets, together with low- or moderate-dimensional nuisance parameters corresponding to the marginal probabilities. Since the marginal probabilities can be consistently estimated, we propose a two-stage procedure that first estimates the marginal probabilities and then applies an $\ell_1$-regularized estimator for the generalized correlations, exploiting sparsity arising from potential independence structures. While computationally efficient, this estimator is not rate-optimal. We therefore further develop a regularized adversarial estimator that attains the optimal rate under standard regularity conditions while remaining tractable. The framework naturally extends to settings with covariates. We apply the proposed estimators to causal inference with multiple binary treatments and demonstrate substantial finite-sample improvements over non-adaptive estimators that estimate all probabilities directly. Simulation studies corroborate the theoretical results.
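The two-stage idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact $\ell_1$-regularized objective, so the second stage below substitutes a simpler stand-in: soft-thresholded sample moments of products of standardized components. The function name `two_stage_sketch`, the penalty level `lam`, and the standardization $z_j = (y_j - \alpha_j)/\sqrt{\alpha_j(1-\alpha_j)}$ are assumptions made for illustration, not the authors' estimator.

```python
import itertools
import numpy as np


def two_stage_sketch(Y, lam):
    """Illustrative two-stage estimate (not the paper's exact estimator).

    Y   : (n, M) array of 0/1 outcomes; every column must contain both values.
    lam : soft-threshold level standing in for the l1 penalty (hypothetical).
    """
    Y = np.asarray(Y, dtype=float)
    n, M = Y.shape

    # Stage 1: estimate marginal probabilities by column means.
    alpha_hat = Y.mean(axis=0)

    # Standardize each component using the stage-1 estimates (assumed form of z).
    Z = (Y - alpha_hat) / np.sqrt(alpha_hat * (1.0 - alpha_hat))

    # Stage 2: for every subset of size >= 2, take the sample moment of the
    # product of standardized components, then soft-threshold it. Soft
    # thresholding solves an l1-penalized least-squares problem coordinate-wise.
    r_hat = {}
    for k in range(2, M + 1):
        for subset in itertools.combinations(range(M), k):
            moment = np.prod(Z[:, list(subset)], axis=1).mean()
            r_hat[subset] = float(np.sign(moment) * max(abs(moment) - lam, 0.0))
    return alpha_hat, r_hat
```

With four components where only the first two are dependent, a sketch like this would be expected to keep the pair `(0, 1)` and shrink the remaining ten coefficients to zero, which is the kind of sparsity the abstract refers to.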

Paper Structure

This paper contains 47 sections, 32 theorems, 417 equations, 2 figures, 4 tables.

Key Result

Theorem 1

For any $y \in \{0,1\}^M$ and $x \in \mathcal{X}$, where $f(y,\alpha_0(x),r_0(x)):= 1 + \sum_{k = 1}^M \sum_{\ell \in [M]^k : \left|\ell\right|\geq2} r_{0\ell}(x)\prod_{\ell_m\in\ell} z_{\ell_m}(y,\alpha_0(x))$.
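As a sketch of what this factor does in the covariate-free case, the code below assumes the classical Bahadur form, in which the joint probability equals the product of independent Bernoulli marginals multiplied by $f$, with $z_j(y,\alpha) = (y_j - \alpha_j)/\sqrt{\alpha_j(1-\alpha_j)}$ and $r_\ell = E\big[\prod_{j\in\ell} z_j\big]$. Neither the definition of $z$ nor the full display is spelled out above, so both are assumptions here, and the helper names are hypothetical.

```python
import itertools
import numpy as np


def all_outcomes(M):
    # Every y in {0,1}^M, in itertools.product order, as a (2^M, M) float array.
    return np.array(list(itertools.product([0, 1], repeat=M)), dtype=float)


def standardize(ys, alpha):
    # Assumed form: z_j(y, alpha) = (y_j - alpha_j) / sqrt(alpha_j (1 - alpha_j)).
    return (ys - alpha) / np.sqrt(alpha * (1.0 - alpha))


def generalized_correlations(pmf, alpha):
    # r_l = E[prod_{j in l} z_j(Y, alpha)] for every subset l with |l| >= 2.
    M = len(alpha)
    z = standardize(all_outcomes(M), alpha)
    subsets = [s for k in range(2, M + 1)
               for s in itertools.combinations(range(M), k)]
    return {s: float(np.sum(pmf * np.prod(z[:, list(s)], axis=1)))
            for s in subsets}


def bahadur_pmf(alpha, r):
    # Independent-Bernoulli product times f = 1 + sum_l r_l prod_{j in l} z_j.
    M = len(alpha)
    ys = all_outcomes(M)
    z = standardize(ys, alpha)
    indep = np.prod(alpha ** ys * (1.0 - alpha) ** (1.0 - ys), axis=1)
    f = 1.0 + sum(r_l * np.prod(z[:, list(s)], axis=1) for s, r_l in r.items())
    return indep * f
```

Passing an empty dictionary for `r` gives the fully independent model, and dropping individual entries of `r` corresponds to the sparsity the paper exploits. Under the stated assumptions, computing `generalized_correlations` from any strictly positive pmf and feeding the result to `bahadur_pmf` should recover that pmf up to floating-point error.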

Figures (2)

  • Figure 1: Histograms illustrating the distributions of $1+W_i^\prime r_0$ and $1+W_i(\alpha_0(X_i),Y_i)^\prime r_0(X_i)$ for different values of $s$.
  • Figure 2: Coverage ratios of true ATEs by propensity score estimators, for all combinations of sparsity $s$ and total sample size $N$, and binary treatment vector dimension $M=4$ ($16$ treatment combinations, with $(0,0,0,0)$ being the control level).

Theorems & Definitions (49)

  • Theorem 1
  • Remark 1: Non-negativity probability constraints in $\mathbb{K}$
  • Remark 2: Higher Order Approximations
  • Theorem 2
  • Corollary 1
  • Theorem 3
  • Corollary 2
  • Remark 3
  • Theorem 4
  • Remark 4
  • ...and 39 more