
Statistical Modeling of Univariate Multimodal Data

Paraskevi Chasani, Aristidis Likas

TL;DR

The paper addresses univariate multimodal data by introducing UniSplit, a non-parametric valley-detection procedure. UniSplit operates on the empirical cumulative distribution function (ecdf), using points of its greatest convex minorant (gcm) and least concave majorant (lcm) to partition the data into unimodal subsets. Each unimodal subset is modeled with a Uniform Mixture Model (UMM), and the collection of UMMs forms a Unimodal Mixture Model (UDMM): a hierarchical, hyperparameter-free density representation in which the number of unimodal subsets is estimated automatically. The approach is reported to be competitive with or superior to Gaussian mixture models and KDE-based methods on synthetic and real data, and it extends to image segmentation and Naive Bayes classification. The results point to practical benefits for density estimation and clustering when the number of components is not fixed in advance, with potential extensions to multidimensional data via projections or recursive splitting into interpretable decision trees.

Abstract

Unimodality constitutes a key property indicating grouping behavior of the data around a single mode of its density. We propose a method that partitions univariate data into unimodal subsets through recursive splitting around valley points of the data density. For valley point detection, we introduce properties of critical points on the convex hull of the empirical cumulative distribution function (ecdf) plot that indicate the existence of density valleys. Next, we apply a unimodal data modeling approach that provides a statistical model for each obtained unimodal subset in the form of a Uniform Mixture Model (UMM). Consequently, a hierarchical statistical model of the initial dataset is obtained in the form of a mixture of UMMs, named the Unimodal Mixture Model (UDMM). The proposed method is non-parametric and hyperparameter-free, automatically estimates the number of unimodal subsets, and provides accurate statistical models, as indicated by experimental results on clustering and density estimation tasks.
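To make the hierarchical structure concrete, the following is a minimal sketch of how a density of this form could be evaluated once the unimodal subsets and their UMMs are available. It is not the authors' implementation. It assumes that each UMM consists of uniform components on consecutive, non-overlapping intervals, and that the top-level mixture weights are given as priors. The function names `umm_pdf` and `udmm_pdf` and all numbers in `components` are hypothetical and chosen only for illustration.

```python
import numpy as np

def umm_pdf(x, edges, weights):
    """Density of a uniform mixture (assumed form): component i is
    uniform on [edges[i], edges[i+1]) and has mixing weight weights[i]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    heights = weights / np.diff(edges)            # constant density per interval
    idx = np.searchsorted(edges, x, side="right") - 1
    inside = (x >= edges[0]) & (x < edges[-1])    # zero density outside the support
    out = np.zeros_like(x)
    out[inside] = heights[idx[inside]]
    return out

def udmm_pdf(x, components):
    """Mixture of UMMs: components is a list of (prior, edges, weights)."""
    return sum(prior * umm_pdf(x, edges, w) for prior, edges, w in components)

# Hypothetical example: two unimodal subsets separated by a valley on [4, 6).
components = [
    (0.6, [0.0, 1.0, 3.0, 4.0], [0.2, 0.6, 0.2]),    # first unimodal subset
    (0.4, [6.0, 7.0, 8.0, 9.0], [0.25, 0.5, 0.25]),  # second unimodal subset
]
values = udmm_pdf([2.0, 5.0, 7.5], components)
```

The resulting density is piecewise constant, and it is zero in the gap between the two subsets, which is where a valley point would separate them.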

Paper Structure

This paper contains 19 sections, 3 equations, 16 figures, 6 tables, 2 algorithms.

Figures (16)

  • Figure 1: Histogram: gcm ($AB$ part) and lcm ($CD$ part) correspond to increasing and decreasing parts, respectively. Ecdf: $AB$, $BC$ and $CD$ correspond to the convex, intermediate and concave part, respectively.
  • Figure 2: Histogram and ecdf of a bimodal dataset. The non-uniform and unimodal $X(x_A,x_B)$ indicates a density valley between $A$ and $B$. $MD$ is a point close to the valley. $vp$ is the valley point.
  • Figure 3: Histogram and ecdf plot of a multimodal dataset with its best splitting intervals, processed recursively until a non-uniform and unimodal interval containing a single valley point is detected.
  • Figure 5: (a) Bimodal dataset with two valley points computed by UniSplit. (b) Omitting $vp_1$ leads to a multimodal set $X_1 \cup X_2$, thus $vp_1$ is necessary. (c) Merging $X_2$ and $X_3$ (omitting $vp_2$) leads to a unimodal set, thus $vp_2$ can be deleted. $vp_1$ is the final valley point.
  • Figure 6: Examples of statistical model fitting results on several datasets using GMM and UDMM.
  • ...and 11 more figures
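
The gcm and lcm referred to in the Figure 1 caption can be illustrated with a short sketch. It treats the ecdf as the point set $(x_i, i/n)$ over the sorted data and takes the gcm as the lower convex hull and the lcm as the upper convex hull of those points, computed with a monotone-chain scan. This is an assumed, simplified reading and does not reproduce UniSplit's valley-detection rules. The helper names `ecdf_points` and `hull_indices` and the synthetic bimodal sample are hypothetical.

```python
import numpy as np

def ecdf_points(data):
    """Sorted values x_i and ecdf heights i/n (assumes mostly distinct values)."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

def hull_indices(x, y, lower=True):
    """Indices of hull vertices for points sorted by x.
    lower=True  -> lower convex hull (greatest convex minorant, gcm)
    lower=False -> upper convex hull (least concave majorant, lcm)"""
    sign = 1.0 if lower else -1.0
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            # Cross product of (o -> a) and (o -> i); its sign gives the turn direction.
            cross = (x[a] - x[o]) * (y[i] - y[o]) - (y[a] - y[o]) * (x[i] - x[o])
            if sign * cross > 0:
                break          # turn is consistent with the hull, keep vertex a
            hull.pop()         # vertex a lies on the wrong side, discard it
        hull.append(i)
    return hull

# Hypothetical bimodal sample: two well-separated Gaussian groups.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(2.0, 0.5, 200)])
x, y = ecdf_points(data)
gcm = hull_indices(x, y, lower=True)    # vertices where the gcm touches the ecdf
lcm = hull_indices(x, y, lower=False)   # vertices where the lcm touches the ecdf
```

Both hulls always contain the first and last ecdf points. How the remaining hull vertices are used to locate a valley is defined by the paper and is not reproduced here.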