
Improved identification of breakpoints in piecewise regression and its applications

Taehyeong Kim, Hyungu Lee, Hayoung Choi

TL;DR

This work tackles breakpoint identification in continuous piecewise polynomial regression by introducing a greedy algorithm that iteratively updates breakpoint locations within a finite candidate set to minimize the mean squared error ($MSE$). For a fixed number of breakpoints $k$, each breakpoint is tested at three local positions by solving small constrained least-squares problems and moved to the best one, which makes the updates fast and parallelizable. To determine the optimal number of breakpoints, the method starts from a superset and progressively removes the least impactful breakpoint using $MSE$-based criteria controlled by a tolerance $\tau$ and a cap $p$. Across synthetic and real datasets, the proposed approach achieves higher accuracy (higher $R^2$, lower $MSE$ and $RAE$) with a parsimonious breakpoint count compared with baselines such as PR, SR, SVR, DT, GB, RF, the $\ell_1$ trend filter, APLR, and PELT. These results suggest practical benefits for reliable data fitting and interpretability in applications requiring adaptive segmentation, with potential extensions to reinforcement-learning-based breakpoint detection.
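
A minimal sketch of the fixed-$k$ update, not the authors' implementation. It assumes degree-1 fits (continuous piecewise linear), uses the data's $x$-values as the candidate set, and the names `hinge_design`, `fit_mse` and `greedy_update` are hypothetical. Continuity is enforced through a hinge (truncated power) basis instead of explicit constraints, and the whole model is refit for every trial rather than solving small local problems. Breakpoints are updated one after another; the two neighbouring trials for a given breakpoint are independent and could be evaluated in parallel.

```python
import numpy as np

def hinge_design(x, breakpoints):
    # Columns 1, x and (x - b)_+ for each breakpoint b. Every column is continuous,
    # so the fitted piecewise-linear curve is continuous at each b by construction.
    cols = [np.ones_like(x), x] + [np.maximum(x - b, 0.0) for b in breakpoints]
    return np.column_stack(cols)

def fit_mse(x, y, breakpoints):
    # Ordinary least squares on the hinge basis; returns the mean squared error.
    A = hinge_design(x, breakpoints)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ coef - y) ** 2))

def greedy_update(x, y, candidates, idx, max_sweeps=1000, eps=1e-12):
    """Greedy local search over breakpoint positions for a fixed k (sketch).

    candidates: sorted 1-D array of admissible breakpoint locations.
    idx: indices into `candidates` of the k initial breakpoints.
    Returns the final indices and the corresponding MSE.
    """
    candidates = np.asarray(candidates, dtype=float)
    idx = sorted(int(i) for i in idx)
    best = fit_mse(x, y, candidates[idx])
    for _ in range(max_sweeps):
        moved = False
        for j in range(len(idx)):
            # Stay strictly between the neighbouring breakpoints (ordered, distinct).
            lo = idx[j - 1] + 1 if j > 0 else 0
            hi = idx[j + 1] - 1 if j + 1 < len(idx) else len(candidates) - 1
            best_i = idx[j]  # current position xi_j; its MSE is already `best`
            for i in (idx[j] - 1, idx[j] + 1):  # positions xi_j^- and xi_j^+
                if lo <= i <= hi:
                    trial = idx[:j] + [i] + idx[j + 1:]
                    err = fit_mse(x, y, candidates[trial])
                    if err < best - eps:  # accept strict improvements only
                        best, best_i = err, i
            if best_i != idx[j]:
                idx[j] = best_i
                moved = True
        if not moved:  # no breakpoint moved during a full sweep: local optimum
            break
    return idx, best

# Example: noise-free data with true kinks at x = 3 and x = 7.
x = np.linspace(0.0, 10.0, 201)
y = x - 1.5 * np.maximum(x - 3.0, 0.0) + 2.0 * np.maximum(x - 7.0, 0.0)
idx, mse = greedy_update(x, y, candidates=x, idx=[50, 150])
print(x[idx], mse)
```

Because a move is accepted only when it strictly lowers the MSE, the search cannot cycle and stops at a point where no single breakpoint can improve by shifting one candidate position.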

Abstract

Identifying breakpoints in piecewise regression is critical for enhancing the reliability and interpretability of data fitting. In this paper, we propose novel greedy algorithms to accurately and efficiently identify breakpoints in piecewise polynomial regression. The algorithm updates the breakpoints to minimize the error by exploring the neighborhood of each breakpoint. It converges quickly and stably to optimal breakpoints. Moreover, it can determine the optimal number of breakpoints. Computational results on real and synthetic data show that its accuracy is better than that of the existing methods it is compared with. Experiments on real-world datasets further demonstrate that the breakpoints found by the proposed algorithm provide valuable information about the data.
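
Determining the number of breakpoints can be sketched as backward elimination from a superset. This is a hedged illustration under assumptions: the exact roles of the tolerance $\tau$ and the cap $p$ are not given here, so the rule below is hypothetical (while more than $p$ breakpoints remain, always delete one; afterwards, delete one only if the $MSE$ grows by at most a factor $1 + \tau$). The sketch also uses degree-1 fits and does not re-optimize the remaining positions after each deletion, which the full method may do. The helper names are hypothetical.

```python
import numpy as np

def hinge_design(x, breakpoints):
    # Continuous piecewise-linear basis: 1, x, and (x - b)_+ per breakpoint.
    cols = [np.ones_like(x), x] + [np.maximum(x - b, 0.0) for b in breakpoints]
    return np.column_stack(cols)

def fit_mse(x, y, breakpoints):
    # Least-squares fit on the hinge basis; returns the mean squared error.
    A = hinge_design(x, breakpoints)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ coef - y) ** 2))

def prune_breakpoints(x, y, breakpoints, tau=0.05, p=None, eps=1e-12):
    """Backward elimination from a superset of breakpoints (hypothetical rule).

    While more than p breakpoints remain, always drop the least harmful one;
    afterwards, drop one only if the MSE grows by at most a factor (1 + tau).
    """
    bps = sorted(float(b) for b in breakpoints)
    mse = fit_mse(x, y, bps)
    while bps:
        # MSE after deleting each breakpoint in turn; pick the least harmful deletion.
        new_mse, j = min((fit_mse(x, y, bps[:j] + bps[j + 1:]), j)
                         for j in range(len(bps)))
        over_cap = p is not None and len(bps) > p
        if not over_cap and new_mse > (1.0 + tau) * mse + eps:
            break  # every remaining deletion costs too much accuracy
        bps = bps[:j] + bps[j + 1:]
        mse = new_mse
    return bps, mse

# Example: true kinks at 3 and 7, hidden in a superset of five candidates.
x = np.linspace(0.0, 10.0, 201)
y = x - 1.5 * np.maximum(x - 3.0, 0.0) + 2.0 * np.maximum(x - 7.0, 0.0)
kept, mse = prune_breakpoints(x, y, [2.0, 3.0, 5.0, 7.0, 8.5])
print(kept, mse)
```

On this noise-free example the spurious breakpoints can be removed without changing the fit, while removing either true kink raises the $MSE$ well above the tolerance, so elimination stops with the two true breakpoints.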

Paper Structure

This paper contains 9 sections, 38 equations, 6 figures, 3 tables, and 4 algorithms.

Figures (6)

  • Figure 1: Update breakpoints among $\xi_{j}^{-},\xi_{j},\xi_{j}^{+}$.
  • Figure 2: Comparing MSE as the number of breakpoints is reduced from $7$ to $4$.
  • Figure 3: The optimal-breakpoint algorithm can avoid local minima well.
  • Figure 4: Results of each regression model.
  • Figure 5: Comparison of $\ell_1$ trend filter, APLR, PELT and proposed method for S&P 500 data.
  • ...and 1 more figure
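
The update step in Figure 1 can be written as follows. This is a sketch that assumes $\xi_j^{-}$ and $\xi_j^{+}$ denote the candidates immediately before and after $\xi_j$ in the finite candidate set, and that $\mathcal{S}_d(\boldsymbol{\xi})$ (our notation) is the space of continuous degree-$d$ piecewise polynomials with breakpoints $\boldsymbol{\xi} = (\xi_1, \dots, \xi_k)$:

$$
\xi_j \leftarrow \arg\min_{\eta \in \{\xi_j^{-},\, \xi_j,\, \xi_j^{+}\}} \mathrm{MSE}(\xi_1, \dots, \xi_{j-1}, \eta, \xi_{j+1}, \dots, \xi_k),
\qquad
\mathrm{MSE}(\boldsymbol{\xi}) = \min_{f \in \mathcal{S}_d(\boldsymbol{\xi})} \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 .
$$

Each evaluation of $\mathrm{MSE}$ is a least-squares fit with continuity constraints at the breakpoints, so the three trials for a given $j$ are independent of one another and can be solved in parallel.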