Tree-based conditional copula estimation

Francesco Bonacina; Olivier Lopez; Maud Thomas

Abstract

This paper proposes a regression tree procedure for estimating conditional copulas. The associated algorithm partitions the observations into classes based on covariate values and fits a simple parametric copula model on each class. The association parameter changes from one class to another, which allows for non-linearity in the modeling of the dependence structure. The procedure also defines classes of observations on which the so-called "simplifying assumption" [see Derumigny and Fermanian, 2017] holds reasonably well: according to our model, when the observations belonging to a given class are considered separately, the association parameter no longer depends on the covariates. We derive asymptotic consistency results for the regression tree procedure and show that the proposed pruning methodology, that is, the model selection technique that chooses the appropriate number of classes, is optimal in some sense. Simulations provide finite-sample results, and an analysis of human influenza case data illustrates the practical behavior of the procedure.
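The core idea of the abstract (partition on a covariate, then fit one association parameter per class) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Clayton family, a single scalar covariate, a single split chosen among empirical deciles, a grid search standing in for a proper likelihood maximizer, and all function names. The paper's procedure builds a maximal tree and then prunes it; this sketch only shows one split step.

```python
import math
import random

# Candidate association parameters for the grid search (an assumption of this sketch).
THETA_GRID = [0.2 * k for k in range(1, 41)]  # 0.2, 0.4, ..., 8.0


def clayton_loglik(theta, pairs):
    """Log-likelihood of a Clayton copula (theta > 0) on pairs (u, v) in (0, 1)^2."""
    total = 0.0
    for u, v in pairs:
        s = u ** (-theta) + v ** (-theta) - 1.0  # always >= 1 on (0, 1)^2
        total += (math.log1p(theta)
                  - (1.0 + theta) * (math.log(u) + math.log(v))
                  - (2.0 + 1.0 / theta) * math.log(s))
    return total


def fit_leaf(pairs):
    """Return (theta_hat, loglik) maximizing the log-likelihood over THETA_GRID."""
    loglik, theta = max((clayton_loglik(t, pairs), t) for t in THETA_GRID)
    return theta, loglik


def best_split(x, pairs, min_leaf=50, n_candidates=9):
    """Try empirical quantiles of x as thresholds; keep the split whose two
    leaves have the largest summed log-likelihood.
    Returns (loglik, threshold, theta_left, theta_right) or None."""
    xs = sorted(x)
    best = None
    for k in range(1, n_candidates + 1):
        thr = xs[k * len(xs) // (n_candidates + 1)]
        left = [p for xi, p in zip(x, pairs) if xi <= thr]
        right = [p for xi, p in zip(x, pairs) if xi > thr]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        theta_l, ll_l = fit_leaf(left)
        theta_r, ll_r = fit_leaf(right)
        if best is None or ll_l + ll_r > best[0]:
            best = (ll_l + ll_r, thr, theta_l, theta_r)
    return best


def sample_clayton(theta, rng):
    """Draw one (u, v) pair from a Clayton copula by conditional inversion."""
    eps = 1e-9  # keep values strictly inside (0, 1)
    u = min(max(rng.random(), eps), 1.0 - eps)
    w = min(max(rng.random(), eps), 1.0 - eps)
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, min(max(v, eps), 1.0 - eps)


def simulate(n, rng):
    """Toy data: the association parameter jumps from 1 to 5 at x = 0.5."""
    x, pairs = [], []
    for _ in range(n):
        xi = rng.random()
        x.append(xi)
        pairs.append(sample_clayton(1.0 if xi <= 0.5 else 5.0, rng))
    return x, pairs


if __name__ == "__main__":
    x, pairs = simulate(600, random.Random(0))
    print("no split:", fit_leaf(pairs))
    print("best split:", best_split(x, pairs))
```

Within each resulting class the fitted parameter is constant, which is the sense in which the simplifying assumption holds class by class. A full implementation would recurse on each leaf and then select a subtree with a penalized criterion, which this sketch does not attempt.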

Paper Structure (37 sections, 6 theorems, 87 equations, 3 figures, 1 algorithm)

Introduction
Regression trees for conditional copula analysis
Model and notations
Regression tree estimation of the dependence structure
Construction of the maximal tree
Pruning step
Estimation of the margins
Consistency results
Conditions and assumptions
Asymptotic theory for a single tree
Oracle property for the pruning step
Empirical evidence
Simulation study
The regression framework
Definition of different scenarios
...and 22 more sections

Key Result

Proposition 1

Under Assumptions a:pseudo to assum:llk, and if $n[K\log K]^{-1}\rightarrow \infty,$

Figures (3)

Figure 1: Results of simulations. Results for the Clayton, Frank, and Gumbel copulas are depicted on the different rows. For each copula, the results for the three types of covariate dependence are reported on the x-axis. The six colors identify different models: red, orange, and green are for the conditional copula model fitted on the observations $\mathbf{U}$ and on the pseudo-observations $\mathbf{V}$ and $\mathbf{W}$, respectively, while cyan, blue and magenta are for the benchmark model. In the first two columns, we show results in terms of MSE for the $\tau$ estimates and the cumulative copula estimates, in the third column in terms of log-likelihood. In the fourth column, we report the distributions of the number of splits, i.e. the number of leaves minus 1, identified by the regression trees of the conditional copula models. Each boxplot represents the results for 500 datasets of 1000 points each.
Figure 2: Optimal tree identified by the Frank conditional copula model applied to data on the relative abundances of influenza subtypes across countries and regions. The data cover 800 country-years (corresponding to the 800 points in the top ternary plot). Similarly, for each node of the tree, a simplex represents the subtype relative abundances of the country-years clustered in that node. We use a ternary color code to distinguish country-years dominated by A/H1N1pdm (cyan), A/H3N2 (pink), and B (yellow). For each split, the condition used to partition the observations is indicated. From top-left to bottom-right, the number of observations in each leaf is 630, 120, 16, 20 and 14. In the same order, Kendall's $\tau$ coefficients equal -0.06, -0.09, 0.3, -0.03, and 0.25.
Figure 3: Optimal trees for the estimation of the margins. Countries and Influenza Transmission Zones are classified by the regression trees to approximate the response variables $Y^{(1)}$ (plot A) and $Y^{(2)}$ (plot B). The coefficients of determination of the two fits are 0.29 and 0.5, respectively. For each node, the average value of the response variable and the percentage of observations included are indicated. In the top-right corner, a legend explains the abbreviations used for the Influenza Transmission Zones.

Theorems & Definitions (11)

Remark 1
Proposition 1
Theorem 2
Theorem 3
Lemma 4
proof
Lemma 5
proof
Remark 2
Proposition 6
...and 1 more
