
Conditional Copula models using loss-based Bayesian Additive Regression Trees

Tathagata Basu, Fabrizio Leisen, Cristiano Villa, Kevin Wilson

Abstract

The study of dependence between random variables under external influences is a challenging problem in multivariate analysis. We address this by proposing a novel semi-parametric approach for conditional copula models using Bayesian additive regression trees (BART). BART is becoming a popular approach in statistical modelling due to its simple ensemble-type formulation, complemented by its ability to provide inferential insights. Although BART allows us to model complex functional relationships, it tends to suffer from overfitting. In this article, we exploit a loss-based prior for the tree topology that is designed to reduce the tree complexity. In addition, we propose a novel adaptive Reversible Jump Markov Chain Monte Carlo algorithm that is ergodic in nature and requires very few assumptions, allowing us to model complex and non-smooth likelihood functions with ease. Moreover, we show that our method can efficiently recover the true tree structure and approximate a complex conditional copula parameter, and that our adaptive routine can explore the true likelihood region under a sub-optimal proposal variance. Lastly, we provide case studies concerning the effect of gross domestic product on the dependence between the life expectancies and literacy rates of the male and female populations of different countries.
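To make the model structure concrete, the sketch below evaluates a conditional copula log-likelihood in which the copula parameter varies with a single covariate through a sum of regression trees. It is an illustration under stated assumptions, not the authors' implementation: the Clayton family, the exponential link mapping the tree sum to $\theta(x) > 0$, and the names `StumpTree`, `theta_of_x` and `conditional_log_lik` are all hypothetical choices made here, and each tree is reduced to a piecewise-constant function of one covariate (such as log-GDP).

```python
import bisect
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StumpTree:
    """Hypothetical single-covariate tree: sorted split points, one value per leaf."""
    splits: List[float]  # sorted thresholds on the covariate x
    leaves: List[float]  # len(leaves) == len(splits) + 1

    def predict(self, x: float) -> float:
        # bisect_right returns the index of the interval (leaf) containing x
        return self.leaves[bisect.bisect_right(self.splits, x)]


def theta_of_x(trees: Sequence[StumpTree], x: float) -> float:
    # Sum of tree outputs on the log scale, mapped to theta > 0 (assumed exp link)
    return math.exp(sum(t.predict(x) for t in trees))


def clayton_log_density(u: float, v: float, theta: float) -> float:
    """Log-density of the Clayton copula, valid for theta > 0 and u, v in (0, 1)."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (math.log1p(theta)
            - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def conditional_log_lik(trees: Sequence[StumpTree], u: Sequence[float],
                        v: Sequence[float], x: Sequence[float]) -> float:
    """Sum of log c(u_i, v_i; theta(x_i)) over all observations."""
    return sum(clayton_log_density(ui, vi, theta_of_x(trees, xi))
               for ui, vi, xi in zip(u, v, x))
```

With `trees = [StumpTree([0.0], [-0.5, 0.5]), StumpTree([], [0.1])]`, an observation with $x < 0$ gets $\theta = e^{-0.5+0.1}$ and one with $x \ge 0$ gets $\theta = e^{0.5+0.1}$, so the strength of dependence changes across the split.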

Paper Structure

This paper contains 29 sections, 4 theorems, 48 equations, 25 figures, 8 tables, 3 algorithms.

Key Result

Theorem 4.1

If Assumptions ass:bounded:cop to ass:bounded:tree are satisfied and the $\gamma$ adaptation strategy is defined by Algorithm alg:ada:prop, then the following two conditions hold: […] where $\pi(\cdot\mid u_1, u_2)$ denotes the target distribution defined on $\mathcal{S}$.
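Theorem 4.1 concerns the ergodicity of an adaptive sampler. As a generic illustration of the kind of adaptation such results govern, the sketch below tunes the proposal scale of a one-dimensional random-walk Metropolis sampler with a Robbins-Monro update whose step size $\gamma_n = (n+1)^{-0.6}$ shrinks to zero, a standard way to obtain diminishing adaptation. This is a swapped-in textbook scheme, not alg:ada:prop: the step-size sequence, the target acceptance rate of 0.44 and the name `adaptive_rw_metropolis` are assumptions, and the paper's sampler also uses reversible-jump moves over tree structures, which are omitted here.

```python
import math
import random
from typing import Callable, List, Tuple


def adaptive_rw_metropolis(log_target: Callable[[float], float],
                           x0: float,
                           n_iter: int,
                           log_scale0: float = 0.0,
                           target_accept: float = 0.44,
                           seed: int = 0) -> Tuple[List[float], float]:
    """Random-walk Metropolis whose log proposal scale follows a Robbins-Monro
    update with step gamma_n = (n + 1) ** -0.6, so adaptation vanishes as n grows.
    Returns the chain and the final proposal scale."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    log_scale = log_scale0
    samples: List[float] = []
    for n in range(n_iter):
        prop = x + math.exp(log_scale) * rng.gauss(0.0, 1.0)
        lp_prop = log_target(prop)
        # Metropolis acceptance probability for a symmetric proposal
        accept_prob = math.exp(min(0.0, lp_prop - lp))
        if rng.random() < accept_prob:
            x, lp = prop, lp_prop
        # Diminishing adaptation: push acceptance toward target_accept
        gamma_n = (n + 1) ** -0.6
        log_scale += gamma_n * (accept_prob - target_accept)
        samples.append(x)
    return samples, math.exp(log_scale)
```

Starting from a deliberately poor scale (for example `log_scale0=math.log(50.0)` with a standard normal target), the adapted scale settles where roughly 44% of proposals are accepted, which loosely mirrors the abstract's point about exploring the target under a sub-optimal proposal variance.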

Figures (25)

  • Figure 1: Scatterplot of male life expectancy against log-GDP (left); female life expectancy against log-GDP (middle); and pseudo-observations of female and male life expectancies (right).
  • Figure 2: Estimated dependence between male life expectancy and female life expectancy conditional on per capita log-GDP. For modelling, we use 5 trees and run 4 parallel chains of 50000 iterations. For posterior inference, we discard the first 5000 samples.
  • Figure 3: Trace plots of the log-likelihood obtained from our analyses with life expectancies of the male and female populations. The plots are obtained by running 4 parallel chains, each with 50000 MCMC iterations and 5 trees. The left column shows analyses with C-BART and the right column shows analyses with A-C-BART.
  • Figure 4: Scatterplot of male literacy against log-GDP (left); scatterplot of female literacy against log-GDP (middle); and pseudo-observations of female and male literacy rates (right).
  • Figure 5: Estimated dependence between male literacy and female literacy, conditional on log-GDP. For modelling, we use 5 trees and run 4 parallel chains of 50000 iterations. For posterior inference, we discard the first 5000 samples.
  • ...and 20 more figures
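The right-hand panels of Figures 1 and 4 show pseudo-observations. This summary does not say how they were formed; a common choice, assumed here, is the rescaled empirical rank $\hat u_i = R_i/(n+1)$, which maps each margin into $(0, 1)$ without fitting a parametric marginal model. The function name `pseudo_observations` and the average-rank rule for ties are illustrative assumptions.

```python
from typing import List, Sequence


def pseudo_observations(x: Sequence[float]) -> List[float]:
    """Rescaled ranks R_i / (n + 1); tied values share their average rank."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # indices in ascending order of x
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the block of values tied with x[order[i]]
        while j + 1 < n and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n + 1) for r in ranks]
```

Applying it separately to the male and female series and pairing the results gives points $(\hat u_{1i}, \hat u_{2i})$ of the kind plotted in those panels.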

Theorems & Definitions (9)

  • Definition 4.1: Value at observation
  • Theorem 4.1
  • Proof
  • Lemma A.1
  • Proof
  • Lemma A.2
  • Proof
  • Theorem 4.1
  • Proof