Dependence structure estimation using Copula Recursive Trees

Oskar Laverny, Esterina Masiello, Véronique Maume-Deschamps, Didier Rullière

TL;DR

The Copula Recursive Tree (CORT) estimator is a flexible, consistent, piecewise linear estimator of a copula, leveraging the patchwork copula formalization and various piecewise constant density estimators.

Abstract

We construct the COpula Recursive Tree (CORT) estimator: a flexible, consistent, piecewise linear estimator of a copula, leveraging the patchwork copula formalization and various piecewise constant density estimators. While the patchwork structure imposes a grid, the CORT estimator is data-driven and constructs the (possibly irregular) grid recursively from the data, minimizing a chosen distance on the copula space. The addition of the copula constraints makes usual density estimators unusable, whereas the CORT estimator is only concerned with dependence and guarantees the uniformity of margins. Refinements such as localized dimension reduction and bagging are developed, analyzed, and tested through simulated data.
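
To make the patchwork idea and the role of the copula constraints concrete, here is a minimal sketch of a much simpler relative of the estimator: a bivariate piecewise constant copula density on a regular grid, fitted by counting rank-transformed observations in each box. It is not the CORT algorithm, which builds an irregular grid recursively and optimizes the weights under constraints. The function names `pseudo_observations`, `checkerboard_weights` and `checkerboard_density` are hypothetical, and the data are assumed to have no ties.

```python
import numpy as np

def pseudo_observations(x):
    """Rank-transform each column to (0, 1) as rank / (n + 1); assumes no ties."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # argsort of argsort yields 0-based ranks within each column
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (n + 1)

def checkerboard_weights(u, m):
    """Fraction of bivariate pseudo-observations falling in each box of an m x m grid."""
    idx = np.minimum((u * m).astype(int), m - 1)
    w = np.zeros((m, m))
    np.add.at(w, (idx[:, 0], idx[:, 1]), 1.0)  # unbuffered accumulation of counts
    return w / u.shape[0]

def checkerboard_density(w, point):
    """Piecewise constant density: box weight divided by box volume (1 / m^2)."""
    m = w.shape[0]
    i, j = np.minimum((np.asarray(point) * m).astype(int), m - 1)
    return w[i, j] * m * m

rng = np.random.default_rng(0)
u = pseudo_observations(rng.normal(size=(100, 2)))

# With m dividing n, every row and column of boxes holds mass 1/m,
# so the fitted density has uniform margins.
w4 = checkerboard_weights(u, 4)
# With m = 3 and n = 100 the counts cannot all equal 100/3,
# so the raw empirical weights violate the copula constraints.
w3 = checkerboard_weights(u, 3)
```

On a regular grid the margins come out uniform only when the grid happens to align with the ranks. On a data-driven, irregular grid nothing guarantees this, which is why the weights have to be estimated under explicit copula constraints.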

Paper Structure

This paper contains 13 sections, 8 theorems, 42 equations, 10 figures, 7 tables, 2 algorithms.

Key Result

Proposition 1

Let $C_{\bm{p},\mathcal{L}}$ be a piecewise linear copula. Its Kendall $\tau$ and Spearman $\rho$ admit closed-form expressions, written with $\ell = (\bm{a},\bm{b}]$ and $k = (\bm{c},\bm{d}]$, where $\wedge$ and $\vee$ denote the minimum and maximum operators, respectively.
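
The expressions of Proposition 1 are not reproduced here, so the following sketch should not be read as the paper's formula. It only illustrates why a closed form is available in the bivariate case, using the standard identity $\rho = 12\,\mathbb{E}[UV] - 3$ for a copula with uniform margins. If the density is constant on each box $\ell = (\bm{a},\bm{b}]$ with weight $p_\ell$, the two coordinates are independent and uniform inside the box, so $\mathbb{E}[UV] = \sum_\ell p_\ell \,\frac{a_1+b_1}{2}\,\frac{a_2+b_2}{2}$. The function name `spearman_rho_piecewise` and the array layout of `boxes` are assumptions made for this example.

```python
import numpy as np

def spearman_rho_piecewise(boxes, weights):
    """Spearman's rho of a bivariate copula with piecewise constant density.

    boxes:   array of shape (L, 2, 2), boxes[l] = [[a1, a2], [b1, b2]]
             (lower and upper corners of box l; layout is an assumption).
    weights: array of shape (L,), probability mass of each box, summing to 1.
    Assumes the weights satisfy the copula constraints (uniform margins).
    """
    boxes = np.asarray(boxes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    lower, upper = boxes[:, 0, :], boxes[:, 1, :]
    mid = (lower + upper) / 2.0  # conditional mean of (U, V) inside each box
    e_uv = np.sum(weights * mid[:, 0] * mid[:, 1])
    return 12.0 * e_uv - 3.0

# Independence copula: a single box covering the unit square.
rho_indep = spearman_rho_piecewise([[[0, 0], [1, 1]]], [1.0])

# Mass split evenly between the two diagonal boxes of a 2 x 2 grid.
diag_boxes = [[[0.0, 0.0], [0.5, 0.5]], [[0.5, 0.5], [1.0, 1.0]]]
rho_diag = spearman_rho_piecewise(diag_boxes, [0.5, 0.5])
```

The independence case gives $\rho = 0$, and the diagonal 2 x 2 case gives $\rho = 0.75$, since $\mathbb{E}[UV] = 0.5 \cdot 0.25^2 + 0.5 \cdot 0.75^2 = 0.3125$.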

Figures (10)

  • Figure 1: (Dataset \ref{data:recoveryourself}) Running example. Data points are in red, and darker zones indicate boxes with higher weights (black is the maximum possible in each leaf). On the left, the model after the first split; in the middle, the model after the next round of splits; on the right, the model after the third round.
  • Figure 2: (Dataset \ref{data:recoveryourself}) (a) The CORT estimator: in black, lower left, the input data; in red, upper right, a simulation from the estimated tree. (b) Statistics of the forest: on the left, $\hat{K}$ and $\hat{J}$ as functions of the number of trees; on the right, the Integrated Constraint Influence and squared norm of each tree plotted against the weight of the tree in the forest.
  • Figure 3: (Dataset \ref{data:impossible}) (a) The estimated tree: in black, lower left, the input data; in red, upper right, a simulation from the estimated tree. (b) The estimated forest: in black, lower left, the input data; in red, upper right, a simulation from the estimated forest.
  • Figure 4: (Dataset \ref{data:clayton}) (a) Representation from the tree: in black, lower left, the input data; in red, upper right, a simulation from the estimated tree. (b) Forest statistics: on the left, $\hat{K}$ and $\hat{J}$ as functions of the number of trees; on the right, the Integrated Constraint Influence and squared norm of each tree plotted against the weight of the tree in the forest.
  • Figure 5: (Dataset \ref{data:funcdep}) (a) Representation from the tree: in black, lower left, the input data; in red, upper right, a simulation from the estimated tree. (b) Forest statistics: on the left, $\hat{K}$ and $\hat{J}$ as functions of the number of trees; on the right, the Integrated Constraint Influence and squared norm of each tree plotted against the weight of the tree in the forest.
  • ...and 5 more figures

Theorems & Definitions (22)

  • Definition 1: Piecewise linear copula
  • Remark 1: Existence
  • Definition 2: Hyper-rectangles and suitable partitions
  • Proposition 1: Common dependence measures
  • Definition 3: Empirical Integrated Square Error
  • Definition 4: Copula constraints
  • Proposition 2: Quadratic program
  • Proposition 3: Independence of surrogate loss
  • Definition 5: Simple split and splitting dimensions
  • Remark 2: Degrees of freedom
  • ...and 12 more