Redistributor: Transforming Empirical Data Distributions
Pavol Harar, Dennis Elbrächter, Monika Dörfler, Kory D. Johnson
TL;DR
Redistributor addresses the problem of transforming one empirical distribution into another, or into a known target, by composing the inverse of the target CDF with the source CDF ($R=F_T^{-1}\circ F_S$). It provides practical estimators (KDE-based and linearly interpolated eCDFs), robust handling of duplicates and boundaries, and an efficient Python implementation compatible with scikit-learn. It also provides a theoretical framework based on Hadamard differentiability that guarantees consistency and asymptotic normality. The paper demonstrates applications in image processing (color correction, photorealistic style transfer, photomosaics), data augmentation, and ML preprocessing, and shows that the method compares favorably with model-based and neural approaches in content fidelity and computational efficiency. Overall, Redistributor offers a principled, scalable, and interpretable tool for distribution matching in vision, signal processing, and machine learning pipelines.
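To make the composition $R=F_T^{-1}\circ F_S$ concrete, here is a minimal NumPy/SciPy sketch. It is not the Redistributor API. The helper names `fit_ecdf` and `redistribute` are hypothetical, and the plotting positions $i/(n+1)$ and the tie-averaging rule are assumptions chosen only to keep $F_S$ strictly inside $(0,1)$ and the interpolation knots strictly increasing. The target here is a known distribution (standard normal), so $F_T^{-1}$ is its quantile function `scipy.stats.norm.ppf`.

```python
import numpy as np
from scipy import stats

def fit_ecdf(samples):
    """Knots (x, p) of a linearly interpolated eCDF whose values lie strictly in (0, 1)."""
    x = np.sort(np.asarray(samples, dtype=float))
    p = np.arange(1, x.size + 1) / (x.size + 1)  # assumed positions i/(n+1): never exactly 0 or 1
    # Collapse ties: each distinct value takes the mean position of its copies,
    # so the knots are strictly increasing, as np.interp expects.
    knots, groups = np.unique(x, return_inverse=True)
    probs = np.bincount(groups, weights=p) / np.bincount(groups)
    return knots, probs

def redistribute(samples, target_ppf):
    """Estimate R = F_T^{-1} o F_S from the samples and apply it to them."""
    knots, probs = fit_ecdf(samples)
    u = np.interp(samples, knots, probs)  # estimated F_S, clamped to [probs[0], probs[-1]]
    return target_ppf(u)                  # known target: inverse CDF (quantile function)

rng = np.random.default_rng(0)
s = rng.exponential(scale=2.0, size=10_000)  # skewed source sample
t = redistribute(s, stats.norm.ppf)          # target T: standard normal
print(f"mean={t.mean():.3f}, std={t.std():.3f}")
```

Because $F_S$ is estimated by a monotone interpolant and $F_T^{-1}$ is monotone, the sketch preserves the ordering of the samples while reshaping their distribution. Keeping the estimated CDF away from 0 and 1 avoids infinite quantiles for unbounded targets such as the normal.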
Abstract
We present an algorithm and package, Redistributor, which forces a collection of scalar samples to follow a desired distribution. When given independent and identically distributed samples of some random variable $S$ and the continuous cumulative distribution function of some desired target $T$, it provably produces a consistent estimator of the transformation $R$ which satisfies $R(S)=T$ in distribution. As the distribution of $S$ or $T$ may be unknown, we also include algorithms for efficiently estimating these distributions from samples. This allows for various interesting use cases in image processing, where Redistributor serves as a remarkably simple and easy-to-use tool that is capable of producing visually appealing results. For color correction it outperforms other model-based methods and excels in achieving photorealistic style transfer, surpassing deep learning methods in content preservation. The package is implemented in Python and is optimized to efficiently handle large datasets, making it also suitable as a preprocessing step in machine learning. The source code is available at https://github.com/paloha/redistributor.
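When the target is known only through samples, as in color correction, $F_T^{-1}$ must be estimated as well. The sketch below is plain per-channel quantile matching, not the package's API or its exact estimator. `match_empirical` and `transfer_colors` are hypothetical names. Average ranks scaled by $n+1$ stand in for $\hat F_S$. Linearly interpolated empirical quantiles (`np.quantile`) stand in for $\hat F_T^{-1}$. Treating color channels independently is an assumption made for brevity.

```python
import numpy as np
from scipy.stats import rankdata

def match_empirical(source, target):
    """Map source values onto the empirical distribution of target samples."""
    source = np.asarray(source, dtype=float)
    # Estimated F_S: average ranks scaled into (0, 1); tied inputs share one value
    u = rankdata(source, method="average") / (source.size + 1)
    # Estimated F_T^{-1}: linearly interpolated empirical quantiles of the target
    return np.quantile(np.asarray(target, dtype=float), u)

def transfer_colors(image, reference):
    """Match each channel of an H x W x C image to the same channel of a reference."""
    out = np.empty(image.shape, dtype=float)
    for c in range(image.shape[-1]):
        matched = match_empirical(image[..., c].ravel(), reference[..., c].ravel())
        out[..., c] = matched.reshape(image.shape[:-1])
    return out

rng = np.random.default_rng(1)
image = rng.uniform(0.0, 1.0, size=(32, 32, 3))     # synthetic stand-in for a photo
reference = rng.normal(0.5, 0.1, size=(48, 48, 3))  # synthetic stand-in for a style image
result = transfer_colors(image, reference)
```

The source and reference may have different sizes, because each side enters only through its estimated CDF or quantile function. Within each channel, the relative order of pixel values is preserved, which keeps image content intact while the value distribution follows the reference.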
