A Numerical Rosenblatt Method for Forced Variable Independence
Radek Vavřička, Tomáš Sýkora
TL;DR
The paper addresses the problem of achieving quasi-independence between observables in data-driven particle physics analyses, where the ABCD method requires the two observables to be independent for background events. It proposes a Rosenblatt-inspired framework that transforms one observable into a classifier $\gamma$ that is independent of the other observable for background-like data while preserving marginal distributions. Two numerical implementations construct $\gamma$ from finite samples: IRGI uses irregular grid binning to produce $\gamma_d$, and KDE uses Gaussian kernel smoothing to produce $\gamma_{\sigma_r}$; both come with explicit formulas and a tunable parameter, $d$ and $\sigma_r$ respectively. Across abstract blob cases, image classification tasks, and a high-energy physics dataset (LHC Olympics), the methods substantially reduce the distance correlation $\text{DCC}$ while maintaining discriminative power (AUC), enabling robust ABCD-based signal estimation and improved classifier independence in practice.
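To make the Rosenblatt idea concrete, the sketch below replaces an observable by its empirical conditional CDF within quantile bins of a second observable. This is not the paper's IRGI or KDE construction: the function name `conditional_cdf_transform`, the parameter `n_bins`, and the toy Gaussian data are assumptions chosen only for illustration, and the regular quantile binning stands in for whatever grid the authors actually use.

```python
import numpy as np


def conditional_cdf_transform(x, y, n_bins=10):
    """Hypothetical helper: rank-transform x inside quantile bins of y.

    Within each y-bin, x is replaced by its empirical CDF value, so the
    output is approximately uniform on (0, 1) in every bin and therefore
    approximately independent of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Quantile edges so that each y-bin holds roughly the same number of points.
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    # Interior edges only, so bin indices run from 0 to n_bins - 1.
    idx = np.searchsorted(edges[1:-1], y, side="right")
    gamma = np.empty_like(x)
    for b in range(n_bins):
        mask = idx == b
        m = int(mask.sum())
        if m == 0:
            continue
        # Mid-rank empirical CDF of x restricted to this bin.
        ranks = np.argsort(np.argsort(x[mask]))
        gamma[mask] = (ranks + 0.5) / m
    return gamma


# Toy background sample (assumed): x depends strongly on y.
rng = np.random.default_rng(0)
y = rng.normal(size=5000)
x = y + rng.normal(size=5000)
gamma = conditional_cdf_transform(x, y, n_bins=20)

print("corr(x, y)     =", np.corrcoef(x, y)[0, 1])
print("corr(gamma, y) =", np.corrcoef(gamma, y)[0, 1])
```

The output of this sketch has a uniform marginal rather than the original marginal of `x`. One possible way to restore the original marginal is to pass the scores through the empirical quantile function of `x`, for example with `np.quantile(x, gamma)`; whether this matches the paper's construction is not stated here.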
Abstract
A novel numerical technique is presented to transform one random variable within a system toward statistical quasi-independence from any other random variable in the system. The method's applicability is demonstrated through a particle physics example where a classifier is rendered quasi-independent from an observable quantity.
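Quasi-independence of this kind is usually checked with a dependence measure, and the summary above reports distance correlation for that purpose. The following is a minimal sketch of the standard (biased) sample distance correlation for two one-dimensional samples; the function name `distance_correlation` and the toy data are assumptions, and the paper's exact estimator may differ.

```python
import numpy as np


def distance_correlation(a, b):
    """Biased sample distance correlation of two 1-D samples (illustrative)."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    # Pairwise absolute-distance matrices.
    A = np.abs(a - a.T)
    B = np.abs(b - b.T)
    # Double centering: subtract row and column means, add back the grand mean.
    A = A - A.mean(axis=0, keepdims=True) - A.mean(axis=1, keepdims=True) + A.mean()
    B = B - B.mean(axis=0, keepdims=True) - B.mean(axis=1, keepdims=True) + B.mean()
    dcov2 = (A * B).mean()   # squared distance covariance
    dvar_a = (A * A).mean()  # squared distance variance of a
    dvar_b = (B * B).mean()  # squared distance variance of b
    denom = np.sqrt(dvar_a * dvar_b)
    if denom <= 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Toy check (assumed data): a linear relation versus an unrelated sample.
rng = np.random.default_rng(1)
u = rng.normal(size=500)
v = rng.normal(size=500)
dcor_linear = distance_correlation(u, 2.0 * u + 1.0)
dcor_indep = distance_correlation(u, v)
print("linear relation:", dcor_linear)
print("independent    :", dcor_indep)
```

Unlike the Pearson coefficient, distance correlation vanishes in the population limit only when the two variables are independent, which is why it suits a check of forced independence. The pairwise matrices make this sketch quadratic in memory, so large samples would need subsampling or a faster estimator.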
