A novel characterization of structures in smooth regression curves: from a viewpoint of persistent homology
Satish Kumar, Subhra Sankar Dhar
TL;DR
This work introduces a topology-based framework for inferring local geometric features of smooth regression curves by analyzing the persistent homology of the super-level sets of the first derivative $m_1$. The authors develop a TDA-consistency-based procedure to obtain a robust estimator $\widehat{PH}^{\epsilon}_k(m_1)$ of the persistent homology of $m_1$, using a kernel-estimated derivative $\hat{m}_{1,n}$ derived from the Nadaraya–Watson estimator. A central result gives a high-probability consistency bound, $\delta_B(\widehat{PH}^{\epsilon}_k(m_1), PH_k(m_1)) \le 5\epsilon$, under $h(n) \to 0$ and $n h^6 \to \infty$, together with a principled way to assess the statistical significance of observed local structures via a bottleneck-distance-based confidence set. The methodology is illustrated through simulations on monotonicity, convexity, and modality, and through real-data analyses (the cars and motorcycle datasets) that are compared with SiZer, highlighting the practical value of topology-based shape inference in regression. Extensions to higher derivatives and to settings with multiple regressors are discussed as future directions.
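To make the kernel-estimated derivative concrete, here is a minimal sketch that differentiates a Nadaraya–Watson fit analytically. It assumes a Gaussian kernel and a fixed bandwidth `h`. The function name `nw_derivative` and the simulated data are illustrative assumptions, not taken from the paper, whose exact estimator $\hat{m}_{1,n}$ may differ in its kernel and bandwidth choices.

```python
import numpy as np

def nw_derivative(x_grid, X, Y, h):
    """Derivative of a Gaussian-kernel Nadaraya-Watson fit (illustrative sketch).

    The fit is m_hat(x) = S1(x) / S0(x), with S0 = sum_i K(u_i),
    S1 = sum_i K(u_i) * Y_i and u_i = (x - X_i) / h. The quotient rule
    gives m_hat'(x) = (S1' * S0 - S1 * S0') / S0**2.
    """
    x_grid = np.asarray(x_grid, dtype=float)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    u = (x_grid[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2)   # normalising constant cancels in the ratio
    dK = -u * K / h           # d/dx of K((x - X_i) / h) for the Gaussian kernel
    S0 = K.sum(axis=1)
    S1 = K @ Y
    dS0 = dK.sum(axis=1)
    dS1 = dK @ Y
    return (dS1 * S0 - S1 * dS0) / S0**2

# Usage on simulated data (assumed setup: noisy sine curve on [0, 1]).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 1.0, 400))
Y = np.sin(2 * np.pi * X) + 0.1 * rng.normal(size=X.size)
grid = np.linspace(0.1, 0.9, 81)   # stay away from the boundary
m1_hat = nw_derivative(grid, X, Y, h=0.08)
```

The evaluation grid is kept in the interior because Nadaraya–Watson estimates are known to be biased near the boundary of the design, and that bias carries over to the derivative.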
Abstract
We characterize structures such as monotonicity, convexity, and modality in smooth regression curves using persistent homology. Persistent homology is a key tool in topological data analysis that detects topological features of various dimensions, such as connected components and holes (cycles or loops), in the data. In other words, persistent homology is a multiscale version of homology that characterizes sets based on their connected components and holes. We use super-level sets of functions to extract geometric features via persistent homology. Specifically, we explore structures in regression curves via the persistent homology of the super-level sets of the first derivative of the regression function. In the course of this study, we extend an existing procedure for estimating persistent homology to the first derivative of a regression function and establish its consistency. Moreover, as an application of the proposed methodology, we demonstrate that the persistent homology of the derivative of a function can reveal hidden structures in the function that are not visible from the persistent homology of the function itself. In particular, we characterize structures such as monotonicity, convexity, and modality, and propose a measure of statistical significance to infer these structures in practice. Finally, we conduct an empirical study that applies the proposed methodology to simulated and real data sets and compares the results with those of an existing methodology.
