Mixture Quantiles Estimated by Constrained Linear Regression

Cheng Peng; Yizhou Li; Stan Uryasev

TL;DR

The paper introduces a mixture-quantiles model that expresses the quantile function as a linear combination of basis quantiles, yielding a convex estimation problem via constrained linear regression. It proves that, with an $L_q$ loss and a full set of probabilities, the estimator is asymptotically equivalent to a minimum $q$-Wasserstein distance estimator and is asymptotically normal under standard regularity conditions. The approach offers practical regularization and constraints (nonnegativity, cardinality, $L_1$/$L_2$ penalties, and P-spline smoothing) to improve finite-sample performance and tail accuracy. Through extensive simulations and real-data case studies (Gaussian mixtures, electricity prices, and financial drawdowns), the method demonstrates superior tail fitting and substantial computational efficiency relative to benchmark approaches while preserving good global fit. Overall, this framework provides a scalable, interpretable, and tail-aware alternative for univariate distribution modeling with broad applicability in risk management and uncertainty quantification.

Abstract

We study the problem of modeling univariate distributions via their quantile functions. We introduce a flexible family of distributions whose quantile function is a linear combination of basis quantiles. Because the model is linear in its parameters, estimation reduces to constrained linear regression, yielding a convex optimization problem that readily accommodates cardinality constraints as well as $L_1$ or smoothness regularization. For $L_q$-type objectives we show the estimator is asymptotically equivalent to a minimum $q$-Wasserstein distance estimator and establish asymptotic normality. Experiments on simulated and real-world datasets demonstrate that the proposed method accurately captures both the central body and extreme tails of distributions while requiring substantially less computation than standard benchmark approaches.
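To make the idea concrete, the following is a minimal sketch of fitting a mixture-quantiles model by nonnegativity-constrained least squares. It is not the authors' implementation. The three basis quantiles, the probabilities $p_n = n/(N+1)$, the function name `fit_mixture_quantiles`, and the use of projected gradient descent in place of a dedicated convex solver are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical basis quantile functions, chosen for illustration only.
BASIS = [
    lambda p: 1.0,                      # constant term (location shift, unconstrained)
    lambda p: NormalDist().inv_cdf(p),  # standard normal quantile
    lambda p: -math.log(1.0 - p),       # unit exponential quantile (right tail)
]

def fit_mixture_quantiles(sample, basis=BASIS, n_iter=10000):
    """Least-squares fit of Q(p) = sum_j theta_j * Q_j(p) to the order statistics.

    Coefficients of the non-constant basis quantiles are kept nonnegative so
    that the fitted Q is nondecreasing. Solved by projected gradient descent,
    a simple stand-in for a dedicated convex solver.
    """
    y = sorted(sample)
    n = len(y)
    probs = [(i + 1) / (n + 1) for i in range(n)]  # assumed probabilities n/(N+1)
    X = [[q(p) for q in basis] for p in probs]
    k = len(basis)
    # Pieces of the normal equations: G = X'X / n and b = X'y / n.
    G = [[sum(r[a] * r[c] for r in X) / n for c in range(k)] for a in range(k)]
    b = [sum(r[a] * yi for r, yi in zip(X, y)) / n for a in range(k)]
    # Step size 1/trace(G) is no larger than 1/lambda_max(G), so the descent is stable.
    step = 1.0 / sum(G[a][a] for a in range(k))
    theta = [0.0] * k
    for _ in range(n_iter):
        grad = [sum(G[a][c] * theta[c] for c in range(k)) - b[a] for a in range(k)]
        theta = [theta[a] - step * grad[a] for a in range(k)]
        # Project onto the constraint set: theta_j >= 0 for j >= 1.
        theta = [theta[0]] + [max(0.0, t) for t in theta[1:]]
    return theta

# Noise-free check: the "data" are exact quantiles of a known mixture-quantiles model.
true_theta = [1.0, 2.0, 0.5]
grid = [(i + 1) / 201 for i in range(200)]
data = [sum(t * q(p) for t, q in zip(true_theta, BASIS)) for p in grid]
theta_hat = fit_mixture_quantiles(data)  # should be close to true_theta
```

Because the model is linear in `theta`, the objective is a convex quadratic and the nonnegativity constraint is a simple projection; penalties or a cardinality constraint would need a more capable solver than this sketch.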

Paper Structure (40 sections, 4 theorems, 34 equations, 5 figures, 3 tables)

Introduction
Mixture Quantiles Model
Model formulation
Parameter Estimation by Constrained Linear Regression
Optimization problem statement
Constraints and penalties
Estimation of probabilities
Q-Q Plot
Minimum Wasserstein Distance Estimator
Asymptotic Normality
Discussion of Estimation with Weighted Least Squares Regression
Equivariance
Model with Single Basis Function
Case Study
Comparison with Well-Specified Gaussian Mixture Model in Simulation
...and 25 more sections

Key Result

Proposition 1

If (i) $\widehat{\bm{\theta}}_0$ is a unique minimizer of $f(\bm{\theta})$; (ii) $\widehat{\bm{\theta}}_0$ is an element in the interior of a convex set $\bm{\Theta}$, then $\widehat{\bm{\theta}}_N \overset{p}{\rightarrow} \widehat{\bm{\theta}}_0$. If the model is correctly specified, then $\wideha

Figures (5)

Figure 1: Comparison of the density function, the quantile function, and the empirical data in the last fold of cross-validation.
Figure 2: Comparison of the density function and the Zipf plot in the last fold of cross-validation.
Figure 3: Q-Q plots of models fitted by least squares regression with cardinality constraint $C=1,2$ and coefficient $\lambda=0.6,12$ of $L_1$ penalty. MLE is included as a benchmark. $\{(x_n,y_n)\}_{n=1}^N$ = black points, $x_n=$ $n$-th sample order statistic, $y_n=$ quantile with confidence level $\frac{n}{N+1}$ of the model.
Figure 4: Q-Q plots of models fitted by least absolute deviation regression with cardinality constraint $C=1,2$ and coefficient $\lambda=1.1,1.9$ of $L_1$ penalty. MLE is included as a benchmark. $\{(x_n,y_n)\}_{n=1}^N$ = black points, $x_n=$ $n$-th sample order statistic, $y_n=$ quantile with confidence level $\frac{n}{N+1}$ of the model.
Figure 5: Convergence of error and weighted 2-Wasserstein distance obtained by weighted least squares regression. Lower (black) line = objective (error) of the optimization problem of weighted least squares regression. Upper (red) line = Wasserstein distance between the estimated quantile function and the true quantile function. Wide (grey) band = $90$% confidence band of the error obtained by $100$ repeated experiments. Thin (red) band = $90$% confidence band of the distance obtained by $100$ repeated experiments. The horizontal axis = sample size in log scale.
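
The Q-Q points in the captions of Figures 3 and 4 and the distance tracked in Figure 5 can both be computed directly from quantile functions. The sketch below is illustrative only: the function names `qq_pairs` and `wasserstein_q` are hypothetical, and the distance is the plain (unweighted) $q$-Wasserstein distance, $W_q^q = \int_0^1 |Q_a(p) - Q_b(p)|^q \, dp$, approximated by a midpoint rule rather than the weighted version shown in Figure 5.

```python
from statistics import NormalDist

def qq_pairs(sample, model_quantile):
    """Return (x_n, y_n): the n-th order statistic and the model quantile at n/(N+1)."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, model_quantile((i + 1) / (n + 1))) for i, x in enumerate(xs)]

def wasserstein_q(quantile_a, quantile_b, q=2, grid_size=1000):
    """Approximate W_q between two univariate laws from their quantile functions.

    Uses W_q^q = integral over (0, 1) of |Q_a(p) - Q_b(p)|^q dp, evaluated by
    the midpoint rule (unweighted, for illustration).
    """
    total = 0.0
    for i in range(grid_size):
        p = (i + 0.5) / grid_size  # midpoint of the i-th subinterval of (0, 1)
        total += abs(quantile_a(p) - quantile_b(p)) ** q
    return (total / grid_size) ** (1.0 / q)

# Toy usage: a three-point sample against the uniform quantile Q(p) = p,
# and two unit-variance normal laws whose means differ by 1.
pairs = qq_pairs([3.0, 1.0, 2.0], lambda p: p)
shift_distance = wasserstein_q(NormalDist(0, 1).inv_cdf, NormalDist(1, 1).inv_cdf)
```

For two laws that differ only by a location shift, the quantile functions differ by a constant, so the distance equals the size of the shift for every $q$.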

Theorems & Definitions (4)

Proposition 1
Proposition 2
Proposition 3
Proposition 4
