Optimal Binning for Small-Angle Neutron Scattering Data Using the Freedman-Diaconis Rule
Jessie E. An, Chi-Huan Tung, Changwoo Do, Wei-Ren Chen
TL;DR
The paper addresses the inefficiency of fixed-width binning in SANS data reduction by introducing a statistically grounded binning strategy based on the Freedman-Diaconis (FD) rule, $h = 2\, \mathrm{IQR}\, n^{-1/3}$, where $\mathrm{IQR}$ is the interquartile range of the data and $n$ is the number of samples. It derives the competing error terms for counting noise and binning distortion, shows that the optimal bin width scales as $h_{\mathrm{opt}} \propto N_{\mathrm{total}}^{-1/3}$, and argues that the classical FD formula serves as a practical adaptive estimator of that width. Using synthetic data generated from the Debye scattering function of a Gaussian polymer chain, the authors demonstrate faithful reproduction of the curvature of $I(Q)$ while suppressing random fluctuations, with an optimal bin count $K_{\mathrm{opt}}$ of about 38 in the tested scenario. The approach offers a physics-informed, adaptive histogramming framework that connects statistics, instrument resolution, and data representation, with pathways to higher-dimensional extensions and to the treatment of correlations.
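To make the FD rule concrete, here is a minimal Python sketch, not the authors' code. It assumes that detected-neutron $Q$ values can be drawn with a density proportional to the Debye function itself, ignoring detector geometry and resolution. The helper names (`debye`, `sample_q`, `fd_bin_width`), the radius of gyration, and the $Q$ range are all hypothetical choices made for illustration.

```python
import numpy as np

def debye(q, rg):
    """Debye form factor of a Gaussian chain, normalized so that P(0) = 1."""
    x = (q * rg) ** 2
    return 2.0 * (np.exp(-x) + x - 1.0) / x**2

def fd_bin_width(samples):
    """Freedman-Diaconis bin width h = 2 * IQR * n**(-1/3)."""
    q25, q75 = np.percentile(samples, [25, 75])
    return 2.0 * (q75 - q25) * samples.size ** (-1.0 / 3.0)

def sample_q(n, rg, q_min, q_max, rng):
    """Draw n Q values with density proportional to debye(Q) on [q_min, q_max].

    Uses inverse-CDF sampling on a fine grid (an illustrative assumption,
    not a model of a real SANS detector).
    """
    grid = np.linspace(q_min, q_max, 4001)
    cdf = np.cumsum(debye(grid, rg))
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])  # rescale to [0, 1]
    return np.interp(rng.random(n), cdf, grid)

rng = np.random.default_rng(0)
widths = {}
for n in (1_000, 8_000, 64_000):
    # Hypothetical parameters: Rg = 50 (inverse units of Q), Q in [0.003, 0.3]
    q = sample_q(n, rg=50.0, q_min=0.003, q_max=0.3, rng=rng)
    h = fd_bin_width(q)
    k = int(np.ceil((q.max() - q.min()) / h))  # implied number of bins
    widths[n] = h
    print(f"n = {n:6d}  h = {h:.5f}  bins = {k}")
```

Because the IQR of the sampled distribution is roughly independent of $n$, each eightfold increase in the number of samples should roughly halve the printed width, which is the $n^{-1/3}$ scaling quoted above. The bin counts printed here depend on the assumed parameters and are not meant to reproduce the value of $K_{\mathrm{opt}}$ reported in the paper.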
Abstract
Small-Angle Neutron Scattering (SANS) data analysis often relies on fixed-width binning schemes that overlook variations in signal strength and structural complexity. We introduce a statistically grounded approach based on the Freedman-Diaconis (FD) rule, which minimizes the mean integrated squared error between the histogram estimate and the true intensity distribution. By deriving the competing scaling relations for counting noise ($\propto h^{-1}$) and binning distortion ($\propto h^{2}$), we establish an optimal bin width that balances statistical precision and structural resolution. Application to synthetic data from the Debye scattering function of a Gaussian polymer chain demonstrates that the FD criterion quantitatively determines the most efficient binning, faithfully reproducing the curvature of $I(Q)$ while minimizing random error. The optimal width follows the expected scaling $h_{\mathrm{opt}} \propto N_{\mathrm{total}}^{-1/3}$, delineating the transition between noise- and resolution-limited regimes. This framework provides a unified, physics-informed basis for adaptive, statistically efficient binning in neutron scattering experiments.
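The $N_{\mathrm{total}}^{-1/3}$ scaling follows directly from the two competing terms quoted in the abstract. As a brief sketch, write the total error as a noise term proportional to $h^{-1}$ plus a distortion term proportional to $h^{2}$, with $A$ and $B$ introduced here as generic positive constants (they are labels for this illustration, not symbols taken from the paper):

$$
E(h) \approx \frac{A}{N_{\mathrm{total}}\, h} + B\, h^{2},
\qquad
\frac{dE}{dh} = -\frac{A}{N_{\mathrm{total}}\, h^{2}} + 2 B\, h = 0
\;\;\Longrightarrow\;\;
h_{\mathrm{opt}} = \left( \frac{A}{2 B\, N_{\mathrm{total}}} \right)^{1/3} \propto N_{\mathrm{total}}^{-1/3}.
$$

For $h < h_{\mathrm{opt}}$ the noise term dominates (the noise-limited regime), and for $h > h_{\mathrm{opt}}$ the distortion term dominates (the resolution-limited regime). In the classical histogram setting for a probability density $f$, the constants are $A = 1$ and $B = \tfrac{1}{12} \int f'(x)^{2}\, dx$; the FD rule can be read as replacing the unknown curvature-dependent prefactor with a robust estimate based on the IQR.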
