New methods to compute the generalized chi-square distribution

Abhranil Das

TL;DR

This work tackles the challenge of computing the generalized chi-square distribution in all tail regimes by introducing two exact (ray-tracing and inverse Fourier) and two approximate (ellipse and infinite-tail) methods, complemented by open-source Matlab software. It builds a canonical mapping from generalized chi-square parameters to a quadratic form of a standard normal vector, enabling flexible sampling and multiple computational strategies. The author provides accuracy and speed comparisons against established methods (Ruben, Imhof), derives tail-specific asymptotics, and validates performance with random parameter draws and discriminability measurements between equal-covariance multinormals. The framework delivers tail probabilities down to extreme levels (down to $10^{-10^{308}}$ in double precision via log-variance techniques) and offers practical guidance on selecting the best method for a given tail regime, thereby improving reliability in statistics, ML, and physics applications that rely on generalized quadratic forms.
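
To make the object concrete, here is a minimal sampling sketch. It assumes the common parameterization of a generalized chi-square variable as a weighted sum of noncentral chi-squares plus a normal term and a constant, $\sum_i w_i \chi'^2_{k_i,\lambda_i} + s z + m$. The names `w`, `k`, `lam`, `s`, `m` and all function names are illustrative assumptions, not the API of the accompanying Matlab software, and plain Monte Carlo is used here only as a baseline; it is not one of the four methods.

```python
import math
import random

def sample_gx2(w, k, lam, s, m, rng):
    """Draw one sample of sum_i w[i]*ncx2(k[i], lam[i]) + s*z + m.

    Assumed parameterization: weights w, integer degrees of freedom k >= 1,
    noncentralities lam >= 0, normal-term sd s, constant offset m.
    """
    total = m + s * rng.gauss(0.0, 1.0)
    for wi, ki, li in zip(w, k, lam):
        # Noncentral chi-square: put the whole offset sqrt(li) on one normal.
        first = rng.gauss(math.sqrt(li), 1.0) ** 2
        rest = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(ki - 1))
        total += wi * (first + rest)
    return total

def mc_cdf(x, w, k, lam, s, m, n=20000, seed=0):
    """Monte Carlo estimate of P(X <= x); useless far into the tails."""
    rng = random.Random(seed)
    hits = sum(sample_gx2(w, k, lam, s, m, rng) <= x for _ in range(n))
    return hits / n

def gx2_mean(w, k, lam, m):
    """Exact mean, using E[ncx2(k, lam)] = k + lam."""
    return sum(wi * (ki + li) for wi, ki, li in zip(w, k, lam)) + m
```

With `n` samples, such an estimate cannot resolve probabilities much below `1/n`, which is why dedicated tail methods are needed.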

Abstract

We present four new mathematical methods, two exact and two approximate, along with open-source software, to compute the cdf, pdf and inverse cdf of the generalized chi-square distribution. Some methods are geared for speed, while others are designed to be accurate far into the tails, using which we can also measure large values of the discriminability index $d'$ between multivariate normal distributions. We compare the accuracy and speed of these and previous methods, characterize their advantages and limitations, and identify the best methods to use in different cases.
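
The link between tail accuracy and large $d'$ can be illustrated with the textbook equal-covariance case. Assuming equal priors, the optimal classification error between two normals with a common covariance is $\Phi(-d'/2)$, where $d'$ is the Mahalanobis distance between the means, so $d' = -2\,\Phi^{-1}(\text{error})$. The sketch below only shows this scalar relation; the function names are hypothetical and the paper's own computation of the error rate is not reproduced here.

```python
import math
from statistics import NormalDist

def bayes_error_from_dprime(d):
    """Equal-prior optimal error for two equal-covariance normals: Phi(-d/2).

    Uses erfc so that small tail probabilities keep their relative accuracy.
    """
    return 0.5 * math.erfc(d / (2.0 * math.sqrt(2.0)))

def dprime_from_bayes_error(err):
    """Invert the relation above: d' = -2 * Phi^{-1}(err), for 0 < err < 1."""
    return -2.0 * NormalDist().inv_cdf(err)

for d in (1.0, 5.0, 10.0):
    err = bayes_error_from_dprime(d)
    print(d, err, dprime_from_bayes_error(err))
```

Because the error shrinks roughly like $e^{-d'^2/8}$, recovering a large $d'$ from an error rate requires that error to be computed accurately at very small probabilities.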

Paper Structure (22 sections, 43 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Mapping from the generalized chi-square parameters to the quadratic function of a multinormal, and integrating using ray-tracing. a. A generalized chi-square cdf at one of several points (left, larger dot) is the pdf integrated up to that point (right, green area). b. This integrated probability is the standard multinormal (blue blob) probability over a domain (green area) that belongs to a family of quadratics (colours in this family correspond across plots a-b). c. The lower (finite) and upper (infinite) tail cdfs of a generalized chi-square that is a non-central ellipse are the standard multinormal (blue blob) probabilities inside the tiny ellipse (purple area) and outside the large ellipse (green area).
  • Figure 2: Ray-tracing method to compute the pdf of a general function $f(\bm{x})$ of a normal vector $\bm{x}$. a. The arrow is a ray through the mean (dot) of a standard multinormal along $\bm{n}$. Blue pdf $\phi^{\text{ray}}_d$ is the standard multinormal density along the ray. $\tilde{f}_{\bm{n}}(z)$ is the value of the standardized function $\tilde{f}(\bm{z})$ along the ray. Intervals where $\tilde{f}_{\bm{n}}$ lies between $c$ and $c+dc$ are the blue widths at $z_1$, $z_2$ and $z_3$. b. The pdf of a cubic function of a 4-dimensional correlated normal vector, computed by three methods in the same computation time. Missing dots are where a method wrongly computes the pdf as 0.
  • Figure 3: The infinite-tail approximation. a. Contour-integral derivation of the pdf. Black dots are singularities of the characteristic function. b. Comparing the infinite-tail (called simply 'tail') method with several other methods in the far tails of the distribution.
  • Figure 4: Comparing the ellipse approximation with Ruben's method for computing the cdf (left) and pdf (right) in the finite tail. Error-bands represent $10^{35}$ times the error for the ellipse cdf estimate, and $10^{45}$ times the error for the ellipse pdf estimate.
  • Figure 5: Computing the generalized chi-square cdf. a. Cdf of a generalized chi-square with a lower finite and upper infinite tail, computed with several methods. We plot the lower tail probability (cdf) up to the median (dashed vertical line), and the upper tail probability (ccdf) beyond it. The middle of the distribution, with probabilities ${>}0.001$ (area highlighted pink), is in linear axes, and the tail regions (areas highlighted grey) are in double log axes. $\texttt{realmin}=10^{-308}$ is the double-precision limit. The table below shows the orders of the smallest lower and upper tail probabilities reached by the methods here, and their computation times per point. 'dp' and 'vp' mean double and variable precision. b. Cdf of a generalized chi-square with two infinite tails computed with several methods. The middle area of probability ${>}0.01$ is in linear axes, and the tail areas are in double log axes. Inset: Imhof integrand at $x=-2000$. Similar table below.
  • ...and 3 more figures
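
The idea in the captions of Figures 1 and 2, that the cdf is a standard multinormal probability over a quadratic domain and can be integrated along rays through the mean, can be sketched in two dimensions. This is a simplified illustration under stated assumptions, not the paper's implementation: it handles only a quadratic $q(\bm{z}) = \bm{z}'A\bm{z} + \bm{b}'\bm{z} + c_0$ of a standard bivariate normal, uses the closed-form cdf of the signed radius along a ray (density $|r|e^{-r^2/2}/2$ in 2D), and averages over ray angles with a midpoint rule. All function names are hypothetical.

```python
import math

def ray_cdf_2d(r):
    """Cdf of the signed radius along a ray through the origin of a standard
    bivariate normal; its density is |r| * exp(-r**2 / 2) / 2."""
    tail = 0.5 * math.exp(-0.5 * r * r)
    return tail if r < 0 else 1.0 - tail

def prob_on_ray(a2, a1, a0):
    """Ray-density mass of the set {r : a2*r**2 + a1*r + a0 < 0}."""
    if abs(a2) < 1e-12:                      # (near-)linear along this ray
        if abs(a1) < 1e-12:
            return 1.0 if a0 < 0 else 0.0
        root = -a0 / a1
        return ray_cdf_2d(root) if a1 > 0 else 1.0 - ray_cdf_2d(root)
    disc = a1 * a1 - 4.0 * a2 * a0
    if disc <= 0:                            # no sign change along the ray
        return 1.0 if a2 < 0 else 0.0
    sq = math.sqrt(disc)
    lo, hi = sorted(((-a1 - sq) / (2 * a2), (-a1 + sq) / (2 * a2)))
    inside = ray_cdf_2d(hi) - ray_cdf_2d(lo)
    return inside if a2 > 0 else 1.0 - inside

def quad_cdf_2d(c, A, b, c0, n_rays=2000):
    """P(z'Az + b'z + c0 < c) for z ~ N(0, I_2), averaging over ray angles.

    A is a symmetric 2x2 matrix given as ((a11, a12), (a12, a22)).
    """
    (a11, a12), (_, a22) = A
    total = 0.0
    for i in range(n_rays):
        t = math.pi * (i + 0.5) / n_rays     # midpoint rule on [0, pi)
        nx, ny = math.cos(t), math.sin(t)
        a2 = a11 * nx * nx + 2 * a12 * nx * ny + a22 * ny * ny
        a1 = b[0] * nx + b[1] * ny
        total += prob_on_ray(a2, a1, c0 - c)
    return total / n_rays
```

For example, `quad_cdf_2d(2.0, ((1, 0), (0, 1)), (0, 0), 0.0)` integrates over a disc and should agree with the chi-square cdf with 2 degrees of freedom, $1 - e^{-1}$. Since the integration along each ray is analytic, small tail probabilities are not lost to sampling noise as they would be in Monte Carlo.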