Statistical Modeling of Univariate Multimodal Data
Paraskevi Chasani, Aristidis Likas
TL;DR
The paper addresses univariate multimodal data by introducing UniSplit, a non-parametric valley-detection procedure that operates on the empirical cumulative distribution function (ecdf) through its greatest convex minorant (gcm) and least concave majorant (lcm) points and partitions the data into unimodal subsets. Each unimodal subset is modeled with a Uniform Mixture Model (UMM), and the mixture of these UMMs forms the Unimodal Mixture Model (UDMM): a hierarchical, hyperparameter-free density representation in which the number of components is determined automatically. The authors report that the approach is competitive with or superior to Gaussian mixtures and KDE-based methods on synthetic and real data, and that it extends to image segmentation and Naive Bayes classification with robust performance and minimal parameter tuning. The results point to practical benefits for density estimation and clustering when the number of components is not fixed in advance, with potential extensions to multidimensional data via projections or via recursive splitting into interpretable decision trees.
Abstract
Unimodality is a key property indicating that the data group around a single mode of their density. We propose a method that partitions univariate data into unimodal subsets through recursive splitting around valley points of the data density. For valley point detection, we introduce properties of critical points on the convex hull of the empirical cumulative distribution function (ecdf) plot that indicate the existence of density valleys. Next, we apply a unimodal data modeling approach that provides a statistical model for each obtained unimodal subset in the form of a Uniform Mixture Model (UMM). Consequently, a hierarchical statistical model of the initial dataset is obtained in the form of a mixture of UMMs, named the Unimodal Mixture Model (UDMM). The proposed method is non-parametric and hyperparameter-free, automatically estimates the number of unimodal subsets, and provides accurate statistical models, as indicated by experimental results on clustering and density estimation tasks.
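To make the geometric objects concrete, the following is a minimal sketch, not the authors' implementation, of how the ecdf and the two halves of its convex hull can be computed: the lower hull, known as the greatest convex minorant (gcm), and the upper hull, known as the least concave majorant (lcm). The function names `ecdf_points` and `hull_chain` are hypothetical, and the sketch stops at the hull points; the paper's criteria for deciding which of them indicate a density valley are not reproduced here.

```python
import numpy as np


def ecdf_points(x):
    """Return the sorted unique values of x and the ecdf height at each one."""
    xs, counts = np.unique(np.asarray(x, dtype=float), return_counts=True)
    return xs, np.cumsum(counts) / len(x)


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_chain(xs, F, lower=True):
    """Indices of the ecdf points on the lower hull (gcm) or upper hull (lcm).

    Uses a monotone-chain scan over points already sorted by x.
    Collinear interior points are dropped.
    """
    pts = list(zip(xs, F))
    chain = []
    for i, p in enumerate(pts):
        while len(chain) >= 2:
            c = _cross(pts[chain[-2]], pts[chain[-1]], p)
            # The lower hull keeps only counter-clockwise turns,
            # the upper hull keeps only clockwise turns.
            drop = (c <= 0) if lower else (c >= 0)
            if not drop:
                break
            chain.pop()
        chain.append(i)
    return chain


# Toy bimodal sample (an illustrative assumption, not data from the paper).
data = [0, 1, 2, 3, 10, 11, 12, 13]
xs, F = ecdf_points(data)
gcm_idx = hull_chain(xs, F, lower=True)
lcm_idx = hull_chain(xs, F, lower=False)
print("gcm points:", xs[gcm_idx])
print("lcm points:", xs[lcm_idx])
```

On this toy sample the ecdf is flat across the gap between 3 and 10, and both hulls bend at the edges of that gap, which is the kind of geometric signature a valley detector working on the ecdf can exploit.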
