Mean and quantile regression in the copula setting: properties, sharp bounds and a note on estimation

Henrik Kaiser, Wolfgang Trutschnig

TL;DR

The paper investigates how uniform marginals on $[0,1]$ constrain mean and quantile regression in a copula framework, deriving sharp, dimension-aware bounds for the $L^p$-deviation of the mean regression from $\tfrac{1}{2}$ and for the distribution of large deviations. It extends these results to quantile regression, proving tight bounds for the average quantile function and establishing a corresponding $D_{A,p}$ metric that governs regression convergence. Key findings include that the maximal $L^p$-deviation of mean regression is $\tfrac{1}{2}(p+1)^{-1/p}$, attained by completely dependent copulas, and that the average quantile satisfies $\int Q_C^\tau(\mathbf{x}) \, d\mu_A(\mathbf{x}) \in [\tfrac{\tau}{2}, \tfrac{\tau+1}{2}]$ with sharp bounds. The paper also proves strong consistency of the empirical checkerboard estimator for both mean and quantile regression in the bivariate setting, providing practical guarantees for nonparametric copula-based regression estimation and highlighting caveats for simplifying assumptions in pair copula constructions.
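As a hedged illustration of where the constant $\tfrac{1}{2}(p+1)^{-1/p}$ comes from, the following worked computation treats the bivariate case and assumes $p \geq 1$. It also assumes that a completely dependent copula $C$ corresponds to $Y = h(X)$ almost surely for some Lebesgue-measure-preserving map $h$ (the symbols $h$ and $r_C$ are used here for illustration). The Jensen step is one standard way to see the upper bound and is not necessarily the argument used in the paper.

```latex
% Complete dependence: r_C(x) = E(Y | X = x) = h(x), and h preserves Lebesgue measure.
\begin{aligned}
\int_0^1 \bigl| r_C(x) - \tfrac{1}{2} \bigr|^p \, dx
  &= \int_0^1 \bigl| h(x) - \tfrac{1}{2} \bigr|^p \, dx
   = \int_0^1 \bigl| u - \tfrac{1}{2} \bigr|^p \, du \\
  &= 2 \int_0^{1/2} t^p \, dt
   = \frac{2^{-p}}{p+1},
\qquad\text{hence}\qquad
\bigl\| r_C - \tfrac{1}{2} \bigr\|_p = \tfrac{1}{2}\,(p+1)^{-1/p}.
\end{aligned}

% Upper bound for an arbitrary copula (conditional Jensen, p >= 1, Y uniform on [0,1]):
\mathbb{E}\,\bigl| \mathbb{E}(Y \mid X) - \tfrac{1}{2} \bigr|^p
  \;\leq\; \mathbb{E}\,\bigl| Y - \tfrac{1}{2} \bigr|^p
  \;=\; \frac{2^{-p}}{p+1}.
```

For example, $p = 1$ gives $\tfrac{1}{4}$ and $p = 2$ gives $\tfrac{1}{2\sqrt{3}} \approx 0.289$.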

Abstract

Driven by the interest in how uniformity of marginal distributions propagates to properties of regression functions, in this contribution we tackle the following questions: Given a $(d-1)$-dimensional random vector $\textbf{X}$ and a random variable $Y$ such that all univariate marginals of $(\textbf{X},Y)$ are uniformly distributed on $[0,1]$, how large can the average absolute deviation of the mean and the quantile regression function of $Y$ given $\textbf{X}$ from the value $\frac{1}{2}$ be, and how much mass may sets with large deviation have? We answer these questions by deriving sharp inequalities, both in the mean and in the quantile setting, and sketch some cautionary consequences for the currently popular pair copula constructions involving the so-called simplifying assumption. Rounding off our results, working with the so-called empirical checkerboard estimator in the bivariate setting, we show strong consistency for both regression types and illustrate the speed of convergence by means of a simulation study.
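The checkerboard-based regression estimators mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `checkerboard_masses`, `mean_regression` and `quantile_regression` are hypothetical, the sample is assumed to have no ties, and the empirical copula is taken to spread each observation uniformly over a square of side $1/n$ located by its ranks. The paper's exact definition of the estimator may differ in such details.

```python
import numpy as np


def checkerboard_masses(x, y, N):
    """Cell masses of an N x N checkerboard aggregation of the empirical copula.

    Assumption (not from the paper): no ties, and each observation is spread
    uniformly over the square of side 1/n determined by its ranks.
    """
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    rx = np.argsort(np.argsort(x))  # 0-based ranks of x
    ry = np.argsort(np.argsort(y))  # 0-based ranks of y
    edges = np.linspace(0.0, 1.0, N + 1)

    def overlap(r):
        # Length of [r/n, (r+1)/n] intersected with each of the N grid intervals.
        lo, hi = r[:, None] / n, (r[:, None] + 1) / n
        return np.clip(np.minimum(hi, edges[1:]) - np.maximum(lo, edges[:-1]), 0.0, None)

    # Entry [i, j]: mass of the cell with x in strip i and y in strip j.
    return n * overlap(rx).T @ overlap(ry)


def mean_regression(masses):
    """Values of the step function x -> E[Y | X = x] on the N vertical strips."""
    N = masses.shape[0]
    mids = (np.arange(N) + 0.5) / N  # midpoints of the y-intervals
    return N * masses @ mids


def quantile_regression(masses, tau):
    """Values of the conditional tau-quantile on the N strips, for 0 < tau < 1."""
    N = masses.shape[0]
    # Conditional CDF of Y at the grid edges; it is linear in between.
    cdf = np.concatenate([np.zeros((N, 1)), N * np.cumsum(masses, axis=1)], axis=1)
    out = np.empty(N)
    for i in range(N):
        # First y-cell whose right edge has conditional CDF >= tau.
        j = min(np.searchsorted(cdf[i, 1:], tau, side="left"), N - 1)
        out[i] = (j + (tau - cdf[i, j]) / (cdf[i, j + 1] - cdf[i, j])) / N
    return out
```

Because every strip of the checkerboard carries mass $1/N$, the values returned by `mean_regression` always average to $\tfrac{1}{2}$, which mirrors the role of $\tfrac{1}{2}$ as the reference value in the bounds above.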

Paper Structure

This paper contains 9 sections, 19 theorems, 122 equations, 4 figures.

Key Result

Lemma 1

Suppose that $d \geq 3$, that $C \in \mathcal{C}^d$ and let $F \in \mathcal{B}(\mathbb{I})$ be arbitrary but fixed. Then, for $\lambda_{d-2}$-a.e. $\mathbf{x}_{1:d-2} \in \mathbb{I}^{d-2}$, we have

Figures (4)

  • Figure 1: Support of the ordinal sum $O_2$ considered in Example \ref{Ex20250721} (gray) and regression function $r_{O_2}$ (dashed blue line).
  • Figure 2: Empirical $N=63$ checkerboard density for a sample of size $n = 10{,}000$ from the Marshall-Olkin copula with parameters $(\alpha, \beta) = (0.35, 0.65)$, true mean and median regression functions (solid black and gray line, respectively), and corresponding estimators $r_{\mathfrak{Cb}_N(E_n)}$ and $Q_{\mathfrak{Cb}_N(E_n)}^{0.2}$ (black and gray step functions).
  • Figure 3: Empirical $N=63$ checkerboard density for a sample of size $n = 10{,}000$ from the Clayton copula with parameter $\theta=2$, true mean and median regression functions (solid black and gray line, respectively), and the corresponding estimators $r_{\mathfrak{Cb}_N(E_n)}$ and $Q_{\mathfrak{Cb}_N(E_n)}^{0.2}$ (black and gray step functions).
  • Figure 4: Boxplot summarizing the $L^1$-distances between the estimated and the true mean regression as well as the estimated and the true median regression function, respectively. For each sample size $n$ (on the $x$-axis) a total of $R=500$ runs were performed.

Theorems & Definitions (34)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • proof
  • Lemma 4: complete dependence
  • proof
  • Theorem 5: upper bound
  • proof
  • Corollary 6
  • proof
  • ...and 24 more