Copula Discrepancy: Benchmarking Dependence Structure

Agnideep Aich, Ashit Baran Aich

TL;DR

This paper introduces Copula Discrepancy (CD), a lightweight, dependence-focused statistic for benchmarking how well a sample preserves a fixed target copula, independent of the marginals. It defines two estimation paths, a fast moment-based estimator and a more powerful MLE-based estimator, along with two information-theoretic extensions, a copula KL divergence (CKL) and a copula entropy gap (CED), which complement the Kendall's-tau-based CD with Shannon-style summaries of the copula density. The authors establish consistency, asymptotic normality (for the moment-based CD), and robustness properties, and demonstrate that the MLE-based CD can reliably distinguish on-target from off-target dependence even when Kendall's tau is matched, including under tail-structure mismatches. In three controlled experiments, CD proves useful for dependence-aware tuning of biased samplers (e.g., SGLD), for diagnosing tail-dependence mismatch, and as a complementary perspective to KSD. Both CD variants add only millisecond-level overhead over the tested range, offering a principled, interpretable supplement to omnibus discrepancy measures for dependence analysis in Bayesian computation and generative modeling.

Abstract

We study a simple statistic for benchmarking how well a sample preserves a known bivariate dependence structure. Given a target copula family (Clayton or Gumbel) and parameter $\theta_P$, the Copula Discrepancy (CD) compares the target Kendall's tau $\tau(\theta_P)$ with the Kendall's tau implied by a parameter $\hat{\theta}$ fitted to the sample within the target family, i.e., $|\tau(\theta_P)-\tau(\hat{\theta})|$. We develop a moment-based version, prove consistency, asymptotic normality, and robustness results under i.i.d. sampling, and use an MLE-based version empirically for greater power against tail-structure misspecification. Building on this, we define two information-theoretic copula summaries, a copula KL divergence (CKL) and a copula entropy gap (CED), and establish basic consistency and central limit results for their plug-in estimators. In controlled experiments, CD reliably separates on-target and off-target copulas with matched Kendall's $\tau$, provides a dependence-aware signal for tuning SGLD step sizes where Effective Sample Size favors overly aggressive (and biased) settings, and remains stably nonzero under deliberate tail-dependence mismatch where a naive $\tau$-based diagnostic fails; CKL and CED offer a complementary Shannon-style view that echoes these findings. Timing benchmarks show that both CD variants incur only millisecond-level overhead over the tested range and exhibit near-linear empirical scaling in sample size, providing a lightweight, dependence-focused complement to quadratic-cost omnibus discrepancies such as the Kernel Stein Discrepancy (KSD).
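The moment-based variant described in the abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names (`kendall_tau`, `tau_of_theta`, `theta_of_tau`, `moment_cd`, `sample_clayton`) are hypothetical, the tau maps are the standard closed forms for Clayton and Gumbel copulas, the naive $O(n^2)$ Kendall's tau is used only to keep the sketch dependency-free, and clamping $\hat{\tau}$ to the family's valid range is an assumption made here.

```python
import random


def kendall_tau(pairs):
    """Naive O(n^2) sample Kendall's tau (tau-a); adequate for small n."""
    n = len(pairs)
    s = 0
    for i in range(n):
        ui, vi = pairs[i]
        for j in range(i + 1, n):
            uj, vj = pairs[j]
            prod = (ui - uj) * (vi - vj)
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def tau_of_theta(theta, family):
    """Standard closed-form Kendall's tau for the two target families."""
    if family == "clayton":
        return theta / (theta + 2.0)
    if family == "gumbel":
        return 1.0 - 1.0 / theta
    raise ValueError(f"unknown family: {family}")


def theta_of_tau(tau, family):
    """Inverse of tau_of_theta (the moment-based fit)."""
    if family == "clayton":
        return 2.0 * tau / (1.0 - tau)
    if family == "gumbel":
        return 1.0 / (1.0 - tau)
    raise ValueError(f"unknown family: {family}")


def moment_cd(pairs, theta_p, family):
    """|tau(theta_P) - tau(theta_hat)| with theta_hat = tau^{-1}(tau_hat)."""
    tau_hat = kendall_tau(pairs)
    # Assumption of this sketch: clamp tau_hat so the inverse map is defined.
    tau_hat = min(max(tau_hat, 0.0), 1.0 - 1e-9)
    theta_hat = theta_of_tau(tau_hat, family)
    return abs(tau_of_theta(theta_p, family) - tau_of_theta(theta_hat, family))


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids u == 0
        w = 1.0 - rng.random()
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        out.append((u, v))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    on_target = sample_clayton(400, 3.0, rng)  # theta = 3 gives tau = 0.6
    print("moment-based CD:", moment_cd(on_target, 3.0, "clayton"))
```

Because the moment-based fit inverts $\tau$ directly, this version reduces to $|\tau(\theta_P)-\hat{\tau}_n|$ whenever $\hat{\tau}_n$ lies in the family's valid range, so it cannot by itself separate copulas with matched Kendall's tau; that is the role the abstract assigns to the MLE-based version.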

Paper Structure

This paper contains 55 sections, 12 theorems, 110 equations, 7 figures, 7 tables, 3 algorithms.

Key Result

Theorem 1

Let $\{(U_i,V_i)\}_{i=1}^n$ be i.i.d. pseudo-observations from a copula $C_{\theta_P}$ in the target family. Let $\hat{\tau}_n$ be the sample Kendall's tau and $\hat{\theta}_n^{(M)}=\tau^{-1}(\hat{\tau}_n)$ be the moment-based estimator. Under the paper's regularity assumption (assump:regularity):

Figures (7)

  • Figure 1: MLE-based Copula Discrepancy (CD) for samples drawn from the on-target Gumbel copula (blue) and the off-target Clayton copula (orange), both constructed to have the same population Kendall’s tau $\tau = 0.6$. Each point is the mean CD over 100 replications; the shaded bands show the 95% confidence interval for the mean. Even though rank correlation is matched, the on-target CD decays rapidly toward zero as $n$ increases, while the off-target CD remains substantially larger (around $8\times 10^{-2}$ to $10^{-1}$) and varies only mildly across sample sizes.
  • Figure 2: Information-theoretic diagnostics for Experiment 1. Left: Copula KL-based discrepancy (CKL) evaluated against a Kendall’s-tau-matched Gumbel reference copula. Right: Copula entropy gap (CED) between the fitted copula and the Gumbel target. In both panels, blue corresponds to on-target Gumbel samples and orange to off-target Clayton samples; lines show means over 100 replications and shaded bands show 95% confidence intervals. CKL and CED reproduce the same qualitative ordering as CD: fits to on-target samples trend toward lower discrepancy as $n$ grows, while fits to off-target samples remain consistently higher.
  • Figure 3: Hyperparameter selection for SGLD targeting a bimodal Gaussian mixture. Left: Mean Copula Discrepancy (CD, solid line; left y-axis, log$_{10}$ scale) and Mean Effective Sample Size (ESS, dashed line; right y-axis) as functions of the SGLD step-size $\epsilon$. Shaded bands show 95% confidence intervals for the means over 100 replications (computed on the original scale and displayed after transformation for CD). ESS is ultimately maximized at the largest step-size ($\epsilon = 10^{-1}$), but is not strictly monotone over the full grid, with a small dip in the smallest-$\epsilon$ regime. In contrast, the CD curve attains its minimum at an intermediate step-size ($\epsilon = 4.64\times 10^{-3}$), indicating improved agreement with the target dependence structure at this setting. Right: SGLD samples for the ESS-selected step-size ($\epsilon = 10^{-1}$, blue triangles) and the CD-selected step-size ($\epsilon = 4.64\times 10^{-3}$, orange circles). The ESS-selected sample cloud is visibly more dispersed around each mode, while the CD-selected configuration yields a tighter concentration around the two modal regions.
  • Figure 4: Shannon-type dependence diagnostics for the SGLD hyperparameter study. Left: Copula KL (CKL, forward KL) estimate versus step-size $\epsilon$, with mean and 95% confidence interval over 100 replications. Right: Copula Entropy Gap (CED) versus $\epsilon$, again with mean and 95% confidence interval. Both summaries vary with $\epsilon$ and agree with CD in preferring an intermediate regime; in particular, both CKL and CED attain their lowest means at $\epsilon = 4.64\times 10^{-3}$. However, their uncertainty bands remain wide relative to the scale of changes across $\epsilon$, so the CD plot in Figure 3 remains the most directly interpretable diagnostic for selecting a step-size in this setting.
  • Figure 5: Experiment 3: diagnostics for a tail-dependence mismatch. The Naive Tau Discrepancy (orange, left axis) shrinks toward zero as $n$ increases and would incorrectly suggest that the Gumbel sample is compatible with the Clayton target, since their Kendall's taus agree by construction. In contrast, the MLE-based Copula Discrepancy (CD, blue circles, left axis) remains bounded away from zero across all sample sizes, indicating a persistent structural mismatch driven by tail dependence. The KSD (blue triangles, right axis) also detects the mismatch and increases with sample size, but it lives on a different scale and is therefore not directly comparable in magnitude to CD. Both axes are shown on logarithmic scales. Shaded regions show 95% confidence intervals for the mean over 100 replications.
  • ...and 2 more figures
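
As a worked illustration of why matched Kendall's tau does not imply matched dependence (the setting of Figure 1), the standard closed forms for the two families can be evaluated at $\tau = 0.6$. These are textbook Archimedean-copula identities rather than formulas quoted from the paper, and the tail-dependence coefficients $\lambda_L$, $\lambda_U$ and the subscripts $G$, $C$ are notation introduced here only for this example.

```latex
\begin{align*}
\tau_{\text{Gumbel}}(\theta) &= 1 - \frac{1}{\theta} = 0.6
  &&\Rightarrow\ \theta_G = 2.5, \\
\tau_{\text{Clayton}}(\theta) &= \frac{\theta}{\theta + 2} = 0.6
  &&\Rightarrow\ \theta_C = 3, \\
\lambda_U^{\text{Gumbel}} &= 2 - 2^{1/\theta_G} = 2 - 2^{0.4} \approx 0.680,
  &&\lambda_L^{\text{Gumbel}} = 0, \\
\lambda_L^{\text{Clayton}} &= 2^{-1/\theta_C} = 2^{-1/3} \approx 0.794,
  &&\lambda_U^{\text{Clayton}} = 0.
\end{align*}
```

The two copulas share the same rank correlation but concentrate their dependence in opposite tails, which is the kind of structural mismatch that the MLE-based CD in Figures 1 and 5 is reported to detect while a tau-only comparison does not.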

Theorems & Definitions (22)

  • Remark 3.1: Moment-based CD as a baseline
  • Definition 4.1: Archimedean copulas
  • Definition 4.2: Copula Discrepancy
  • Remark 4.1: Intuition under misspecification
  • Theorem 1: Consistency of Moment-Based Estimator
  • Theorem 2: Asymptotic Distribution of Moment-Based CD
  • Corollary 4.1: Asymptotic level-$\alpha$ test based on moment-based CD
  • Definition 4.3: Copula equivalence test
  • Theorem 3: Asymptotic test and rejection rule
  • Theorem 4: Bounded influence for the moment-based CD
  • ...and 12 more