Bayesian taut splines for estimating the number of modes
José E. Chacón, Javier Fernández Serrano
TL;DR
This paper tackles the challenge of estimating the number of modes in univariate densities by introducing Bayesian taut splines (BTS), a framework that blends kernel density estimation with compositional splines under Bayes spaces to yield structured, probabilistic modality inferences. BTS proceeds through four stages (exploration, analysis, selection, and testing), enabling soft, data-driven decisions about the number of modes $k$ while incorporating expert judgment. A one-parameter SFPCA-based model summarizes the PDF ensemble, and excess-mass-based testing via Savage-Dickey Bayes factors assesses the local significance of each mode, producing a holistic view that combines global model structure with local evidence. The approach is demonstrated on the Hidalgo stamp data and MLB pitching speeds, with a comprehensive simulation study showing that BTS often outperforms traditional modality methods and yields interpretable intermediate results such as mode trees and posterior medians. These contributions offer a practical, interpretable, and robust framework for modality assessment, with potential applicability to PDF estimation for bounded data and to exploratory data analysis.
Abstract
The number of modes in a probability density function is representative of the complexity of a model and can also be viewed as the number of subpopulations. Despite its relevance, there has been limited research in this area. A novel approach to estimating the number of modes in the univariate setting is presented, focusing on prediction accuracy and inspired by some overlooked aspects of the problem: the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view that blends local and global density properties. The technique combines flexible kernel estimators and parsimonious compositional splines in the Bayesian inference paradigm, providing soft solutions and incorporating expert judgment. The procedure includes feature exploration, model selection, and mode testing, illustrated in a sports analytics case study showcasing multiple companion visualisation tools. A thorough simulation study also demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, the new method emerges as a top-tier alternative, offering innovative solutions for analysts.
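To make the feature-exploration idea concrete, the following minimal sketch counts the local maxima of a Gaussian kernel density estimate at several bandwidths. It is not the BTS procedure itself; it only illustrates how the apparent number of modes depends on the amount of smoothing. The function names (`kde_on_grid`, `count_modes`, `modes_by_bandwidth`), the grid size, the bandwidth values, and the synthetic two-component sample are all assumptions chosen for illustration.

```python
import numpy as np


def kde_on_grid(sample, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at every grid point."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    norm = sample.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


def count_modes(density):
    """Count interior local maxima of a density sampled on a grid."""
    slope = np.diff(density)
    # A mode sits where the slope switches from positive to non-positive.
    return int(np.sum((slope[:-1] > 0) & (slope[1:] <= 0)))


def modes_by_bandwidth(sample, bandwidths, grid_size=2048):
    """Map each bandwidth to the number of modes of the corresponding KDE."""
    pad = 3.0 * max(bandwidths)  # extend the grid so the tails are covered
    grid = np.linspace(sample.min() - pad, sample.max() + pad, grid_size)
    return {
        float(h): count_modes(kde_on_grid(sample, grid, h))
        for h in bandwidths
    }


# Hypothetical data: an equal mixture of two well-separated normal components.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

table = modes_by_bandwidth(sample, [0.1, 0.5, 1.0, 5.0])
for h, k in table.items():
    print(f"bandwidth={h:4.1f}  modes={k}")
```

Small bandwidths tend to produce many spurious modes and large ones merge genuine modes, which is why a single kernel estimate gives no structured answer about the number of modes and why the method described above adds model selection and mode testing on top of exploration.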
