An Algorithm for Optimal Partitioning of Data on an Interval
Brad Jackson, Jeffrey D. Scargle, David Barnes, Sundararajan Arabhi, Alina Alt, Peter Gioumousis, Elyus Gwin, Paungkaew Sangtrakulcharoen, Linda Tan, Tun Tao Tsai
TL;DR
The paper reframes many signal-processing tasks as finding an optimal partition of an interval using an additive block fitness $V({\bf P})=\sum_m g(B_m)$ over discretized data cells, and presents an $O(N^2)$ dynamic-programming algorithm proved to yield the exact global optimum while automatically determining the number of segments. By exploiting the principle of optimality, the method computes $\text{opt}(n+1)=\max_j\{\text{opt}(j-1)+\text{end}(j,n+1)\}$ with $\text{end}(j,n+1)=g(B_{j,n+1})$, then backtracks using $\text{lastchange}$ to recover block boundaries. The approach applies to a wide range of 1D segmented models (e.g., piecewise-constant Poisson histograms, density estimation, signal detection) and extends to higher dimensions, offering real-time change-point detection and automatic model-order selection without explicit smoothing. These properties of exact optimality, real-time operation, and flexibility make it a principled tool for detection, segmentation, and clustering tasks across signal-processing and data-mining applications.
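The recurrence above can be sketched as a short dynamic program. This is a minimal illustration, not the paper's implementation: the block fitness `g` here (negative within-block squared error minus a fixed per-block penalty) is a hypothetical choice standing in for whichever additive fitness the application calls for, and the penalty constant is an assumed value that controls the number of segments.

```python
from math import inf

def optimal_partition(x, g):
    """O(N^2) dynamic program over all partitions of the sequence x.

    opt[n] holds the best total fitness of x[:n]; lastchange[n] holds the
    start index of the final block in that best partition, which is then
    used to backtrack the block boundaries.
    """
    N = len(x)
    opt = [0.0] * (N + 1)        # opt[0] = 0: empty prefix has zero fitness
    lastchange = [0] * (N + 1)
    for n in range(1, N + 1):
        best, arg = -inf, 0
        for j in range(n):       # x[j:n] is the candidate final block
            v = opt[j] + g(x[j:n])
            if v > best:
                best, arg = v, j
        opt[n], lastchange[n] = best, arg
    # Backtrack from the end to recover the block start indices.
    starts, n = [], N
    while n > 0:
        n = lastchange[n]
        starts.append(n)
    return opt[N], starts[::-1]

def g(block):
    """Hypothetical fitness: reward low within-block variance, pay a
    fixed penalty per block so that over-segmentation is discouraged."""
    m = sum(block) / len(block)
    return -sum((v - m) ** 2 for v in block) - 1.0

x = [0, 0, 0, 5, 5, 5, 5, 1, 1]
fitness, starts = optimal_partition(x, g)
# starts → [0, 3, 7]: the three constant runs are recovered exactly
```

Because the inner loop considers every possible start `j` of the final block and `opt[j]` is already optimal for the prefix, the returned partition is the exact global maximizer of the additive fitness, and the number of blocks falls out of the optimization rather than being fixed in advance.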
Abstract
Many signal processing problems can be solved by maximizing the fitness of a segmented model over all possible partitions of the data interval. This letter describes a simple but powerful algorithm that searches the exponentially large space of partitions of $N$ data points in time $O(N^2)$. The algorithm is guaranteed to find the exact global optimum, automatically determines the model order (the number of segments), has a convenient real-time mode, can be extended to higher dimensional data spaces, and solves a surprising variety of problems in signal detection and characterization, density estimation, cluster analysis and classification.
