Copula Discrepancy: Benchmarking Dependence Structure
Agnideep Aich, Ashit Baran Aich
TL;DR
This paper introduces Copula Discrepancy (CD), a lightweight, dependence-focused statistic for benchmarking how well a sample preserves a fixed target bivariate copula (Clayton or Gumbel), independent of the marginals. CD is the absolute difference between the target Kendall's tau and the Kendall's tau implied by a parameter fitted to the sample within the target family. The paper gives two estimation paths: a fast moment-based estimator and a more powerful MLE-based estimator. It also defines two information-theoretic summaries, a copula KL divergence (CKL) and a copula entropy gap (CED), with plug-in estimators. For the moment-based CD, the authors prove consistency, asymptotic normality, and robustness results under i.i.d. sampling; for the CKL and CED plug-in estimators, they establish basic consistency and central limit results. In controlled experiments, CD separates on-target from off-target copulas even when Kendall's tau is matched, gives a dependence-aware signal for tuning SGLD step sizes where Effective Sample Size favors overly aggressive (and biased) settings, and remains nonzero under deliberate tail-dependence mismatch where a naive tau-based diagnostic fails. Timing benchmarks show millisecond-level overhead over the tested range and near-linear empirical scaling in sample size, making CD an interpretable complement to quadratic-cost omnibus discrepancies such as the Kernel Stein Discrepancy (KSD).
Abstract
We study a simple statistic for benchmarking how well a sample preserves a known bivariate dependence structure. Given a target copula family (Clayton or Gumbel) and parameter $θ_P$, the Copula Discrepancy (CD) compares the target Kendall's tau $τ(θ_P)$ with the Kendall's tau implied by a parameter $\hat{θ}$ fitted to the sample within the target family, i.e., $|τ(θ_P)-τ(\hat{θ})|$. We develop a moment-based version, prove consistency, asymptotic normality, and robustness results under i.i.d. sampling, and use an MLE-based version empirically for greater power against tail-structure misspecification. Building on this, we define two information-theoretic copula summaries, a copula KL divergence (CKL) and a copula entropy gap (CED), and establish basic consistency and central limit results for their plug-in estimators. In controlled experiments, CD reliably separates on-target and off-target copulas with matched Kendall's $τ$, provides a dependence-aware signal for tuning SGLD step sizes where Effective Sample Size favors overly aggressive (and biased) settings, and remains stably nonzero under deliberate tail-dependence mismatch where a naive $τ$-based diagnostic fails; CKL and CED offer a complementary Shannon-style view that echoes these findings. Timing benchmarks show that both CD variants incur only millisecond-level overhead over the tested range and exhibit near-linear empirical scaling in sample size, providing a lightweight, dependence-focused complement to quadratic-cost omnibus discrepancies such as the Kernel Stein Discrepancy (KSD).
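As an illustration of the definition $|τ(θ_P)-τ(\hat{θ})|$, the following is a minimal Python sketch of a moment-based CD, not the authors' implementation. It assumes the standard closed-form relations $τ = θ/(θ+2)$ for Clayton and $τ = 1 - 1/θ$ for Gumbel, fits $\hat{θ}$ by inverting the sample Kendall's tau, and clips the sample tau into the family's admissible range. All function names, the clipping constant, and the conditional-inversion Clayton sampler used for the demo are hypothetical choices made for this example; the MLE-based variant is not covered.

```python
import random

def kendall_tau(x, y):
    """Sample Kendall's tau by direct O(n^2) pair counting (assumes no ties)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                s += 1      # concordant pair
            elif prod < 0:
                s -= 1      # discordant pair
    return 2.0 * s / (n * (n - 1))

def tau_of_theta(family, theta):
    """Closed-form Kendall's tau for the two families named in the abstract."""
    if family == "clayton":
        return theta / (theta + 2.0)
    if family == "gumbel":
        return 1.0 - 1.0 / theta
    raise ValueError("family must be 'clayton' or 'gumbel'")

def theta_of_tau(family, tau):
    """Inverse of tau_of_theta (the moment-based fit)."""
    if family == "clayton":
        return 2.0 * tau / (1.0 - tau)
    if family == "gumbel":
        return 1.0 / (1.0 - tau)
    raise ValueError("family must be 'clayton' or 'gumbel'")

def copula_discrepancy(x, y, family, theta_target, eps=1e-6):
    """Moment-based CD sketch: |tau(theta_target) - tau(theta_hat)|.

    eps is an assumed clipping constant that keeps the sample tau inside
    (0, 1) so that theta_hat stays in the family's parameter range.
    """
    tau_hat = min(max(kendall_tau(x, y), eps), 1.0 - eps)
    theta_hat = theta_of_tau(family, tau_hat)
    return abs(tau_of_theta(family, theta_target) - tau_of_theta(family, theta_hat))

def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion (demo only)."""
    xs, ys = [], []
    for _ in range(n):
        u = 1.0 - rng.random()   # in (0, 1], avoids u == 0
        w = 1.0 - rng.random()
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        xs.append(u)
        ys.append(v)
    return xs, ys

rng = random.Random(0)
x, y = sample_clayton(400, 2.0, rng)                 # true tau = 0.5
cd_on = copula_discrepancy(x, y, "clayton", 2.0)     # target matches the sample
cd_off = copula_discrepancy(x, y, "clayton", 8.0)    # target tau = 0.8, mismatched
print(f"on-target CD = {cd_on:.3f}, off-target CD = {cd_off:.3f}")
```

Because Kendall's tau is invariant under monotone transformations of each margin, the same statistic could be applied to raw samples without first converting them to uniform margins. In this sketch the within-family fit maps the sample tau to $\hat{θ}$ and back, so the moment-based CD only detects mismatches in tau itself.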
