A Numerical Rosenblatt Method for Forced Variable Independence

Radek Vavřička, Tomáš Sýkora

TL;DR

The paper addresses the problem of achieving quasi-independence between observables used in data-driven particle physics analyses, where the ABCD method requires the two variables to be independent for the background. It proposes a Rosenblatt-inspired framework that transforms one observable into a classifier $\gamma$ that is independent of the other observable for background-like data, while preserving marginal distributions. Two numerical implementations, IRGI and KDE, construct $\gamma$ from finite samples: IRGI uses irregular-grid binning to produce $\gamma_d$, and KDE uses Gaussian kernel smoothing to produce $\gamma_{\sigma_r}$. Both come with explicit formulas and are tuned by the parameters $d$ and $\sigma_r$, respectively. Across abstract blob cases, image classification tasks, and a high-energy physics dataset (LHC Olympics), the methods substantially reduce the distance correlation ($\text{DCC}$) while maintaining discriminative power (AUC), enabling robust ABCD-based signal estimation and improved classifier independence in practice.
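
The core of the Rosenblatt transform is the conditional probability integral transform: if $F(x \mid y)$ is the conditional CDF of $x$ given $y$, then $\gamma = F(x \mid y)$ is uniform on $[0,1]$ and independent of $y$. The sketch below estimates $F(x \mid y)$ from a finite "defining" sample using Gaussian weights of width $\sigma_r$ in $y$, applies it to a separate "testing" sample, and compares the distance correlation before and after. It is an illustration under stated assumptions (a toy correlated Gaussian background, a step function in $x$, kernel smoothing only in $y$), not the paper's exact $\gamma_{\sigma_r}$ formula; all function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; a simplified KDE-style sketch, not the paper's code.


def distance_correlation(u, v):
    """Sample distance correlation (biased V-statistic form); 0 means no detected dependence."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    a = np.abs(u[:, None] - u[None, :])
    b = np.abs(v[:, None] - v[None, :])
    # Double-centre both pairwise distance matrices.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar2 = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(max(dcov2, 0.0) / dvar2)) if dvar2 > 0 else 0.0


def gamma_kde(x, y, x_ref, y_ref, sigma_r):
    """Gaussian-kernel estimate of the conditional CDF F(x | y) from a reference sample."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d2 = ((y[:, None] - y_ref[None, :]) / sigma_r) ** 2
    # Subtract each row's minimum before exponentiating so the nearest reference
    # point always gets weight 1 (avoids all-zero rows when sigma_r is small).
    w = np.exp(-0.5 * (d2 - d2.min(axis=1, keepdims=True)))
    below = x_ref[None, :] <= x[:, None]
    # Weighted fraction of reference x values at or below each query x.
    return (w * below).sum(axis=1) / w.sum(axis=1)


def sample_background(rng, n, rho=0.8):
    """Toy background: (x, y) standard bivariate normal with correlation rho."""
    y = rng.standard_normal(n)
    x = rho * y + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    return x, y


rng = np.random.default_rng(0)
x_ref, y_ref = sample_background(rng, 2000)    # "defining" sample used to build gamma
x_test, y_test = sample_background(rng, 500)   # independent "testing" sample
gamma = gamma_kde(x_test, y_test, x_ref, y_ref, sigma_r=0.2)

dcor_before = distance_correlation(x_test, y_test)
dcor_after = distance_correlation(gamma, y_test)
print(f"dCor(x, y)     = {dcor_before:.3f}")
print(f"dCor(gamma, y) = {dcor_after:.3f}")
```

Because any strictly increasing function of $\gamma$ stays independent of $y$, $\gamma$ could be mapped back through the inverse marginal CDF of $x$ if the original marginal shape is wanted; that step is omitted here. Smaller $\sigma_r$ reduces the residual dependence caused by smoothing across $y$, at the cost of fewer effective reference events per query.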

Abstract

A novel numerical technique is presented to transform one random variable within a system toward statistical quasi-independence from any other random variable in the system. The method's applicability is demonstrated through a particle physics example where a classifier is rendered quasi-independent from an observable quantity.
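
The simplest finite-sample version of such a transformation bins the conditioning variable and replaces each value by its empirical CDF within its bin. The sketch below uses $d$ equal-population bins in $y$; it is a simplified stand-in that does not reproduce the paper's irregular grid (IRGI) construction or its interpolation between vertices. The toy background and all names are hypothetical.

```python
import numpy as np

# Hypothetical sketch: equal-population binning in y, not the paper's IRGI grid.


def gamma_binned(x, y, x_ref, y_ref, d):
    """Within-bin empirical CDF of x, using d equal-population bins in y (assumes continuous y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.quantile(y_ref, np.linspace(0.0, 1.0, d + 1))
    # Bin index 0..d-1; values outside the reference range fall into the edge bins.
    ref_bin = np.clip(np.searchsorted(edges, y_ref, side="right") - 1, 0, d - 1)
    query_bin = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, d - 1)
    gamma = np.empty(len(x))
    for k in range(d):
        ref_x = np.sort(x_ref[ref_bin == k])
        sel = query_bin == k
        # Fraction of same-bin reference x values that are <= each query x.
        gamma[sel] = np.searchsorted(ref_x, x[sel], side="right") / len(ref_x)
    return gamma


def sample_background(rng, n, rho=0.8):
    """Toy background: (x, y) standard bivariate normal with correlation rho."""
    y = rng.standard_normal(n)
    x = rho * y + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    return x, y


rng = np.random.default_rng(0)
x_ref, y_ref = sample_background(rng, 5000)
x_test, y_test = sample_background(rng, 2000)
gamma = gamma_binned(x_test, y_test, x_ref, y_ref, d=20)

# Quick linear check only; a full independence test would use e.g. distance correlation.
corr_before = np.corrcoef(x_test, y_test)[0, 1]
corr_after = np.corrcoef(gamma, y_test)[0, 1]
print(f"corr(x, y)     = {corr_before:.3f}")
print(f"corr(gamma, y) = {corr_after:.3f}")
```

The bin count $d$ plays the role of a resolution parameter: more bins shrink the spread of $y$ inside each bin (less residual dependence) but leave fewer reference events per bin (a coarser, noisier CDF).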

Paper Structure

This paper contains 16 sections, 21 equations, 31 figures.

Figures (31)

  • Figure 1: Events in the $xy$ phase plane separated into regions ABCD.
  • Figure 2: Variable transformation toward independence through coordinate rotation.
  • Figure 3: Defining and testing samples drawn from identical underlying probability densities.
  • Figure 4: Irregular grid construction (black lines) and interpolation vertices (red) around defining background sample, applied to testing samples.
  • Figure 5: Irregular grid construction (black lines) and interpolation vertices (red) around defining background sample, applied to testing samples.
  • ...and 26 more figures
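
The ABCD method behind Figure 1 predicts the background count in the signal region from the three control regions, which is valid only when the two variables are independent for background. The sketch below assumes the labeling A = (x > x_cut, y > y_cut), B = (x > x_cut, y ≤ y_cut), C = (x ≤ x_cut, y > y_cut), D = (x ≤ x_cut, y ≤ y_cut), so that independence gives $N_A \approx N_B N_C / N_D$; the paper's region labels may differ, and the helper names and toy samples are hypothetical.

```python
import numpy as np

# Hypothetical helpers; region labels are an assumed convention.


def abcd_counts(x, y, x_cut, y_cut):
    """Count events in the four regions defined by one cut on x and one on y."""
    pass_x = np.asarray(x) > x_cut
    pass_y = np.asarray(y) > y_cut
    return {
        "A": int(np.sum(pass_x & pass_y)),
        "B": int(np.sum(pass_x & ~pass_y)),
        "C": int(np.sum(~pass_x & pass_y)),
        "D": int(np.sum(~pass_x & ~pass_y)),
    }


def abcd_prediction(n):
    """Predicted region-A count, valid when x and y are independent."""
    return n["B"] * n["C"] / n["D"]


rng = np.random.default_rng(1)
n_events = 100_000
u = rng.uniform(size=n_events)
v = rng.uniform(size=n_events)

indep = abcd_counts(u, v, 0.8, 0.8)            # independent background
corr = abcd_counts((u + v) / 2, u, 0.8, 0.8)   # x built partly from y: correlated

print(indep["A"], round(abcd_prediction(indep)))  # prediction close to observed
print(corr["A"], round(abcd_prediction(corr)))    # prediction far too low
```

In the correlated case the prediction underestimates region A by more than an order of magnitude, which is the failure mode that motivates transforming one variable toward independence before applying ABCD.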