Degrees-of-freedom penalized piecewise regression

Stefan Volz; Martin Storath; Andreas Weinmann

TL;DR

The paper introduces degrees-of-freedom penalized piecewise regression (DofPPR), a framework that penalizes the sum of per-segment degrees of freedom rather than simply counting segments, enabling heterogeneous segment models (e.g., mixed-degree polynomials). It establishes almost-sure uniqueness of the discrete minimizer in least-squares settings (excluding interpolating parts) and develops a fast algorithm to compute the entire regularization path with exact hyperparameter selection via rolling cross-validation and the one-standard-error rule. The authors provide a complete implementation (Rust core with Python bindings) and demonstrate improved performance on simulated data and real TCPD changepoint benchmark tasks, including state-of-the-art results under constrained DOF budgets. The approach supports optional domain knowledge and yields an interpretable, automatable model selection workflow suitable for exploratory data analysis and changepoint detection, with theoretical guarantees and scalable computation. The work contributes to the changepoint and piecewise regression literature by enabling flexible, data-adaptive modeling across segments while offering practical, exact hyperparameter tuning and robust performance metrics.

Abstract

Many popular piecewise regression models rely on minimizing a cost function on the model fit with a linear penalty on the number of segments. However, this penalty does not take into account varying complexities of the model functions on the segments, potentially leading to overfitting when models with varying complexities, such as polynomials of different degrees, are used. In this work, we improve on this approach by instead using a penalty on the sum of the degrees of freedom over all segments, called degrees-of-freedom penalized piecewise regression (DofPPR). We show that the solutions of the resulting minimization problem are unique for almost all input data in a least squares setting. We develop a fast algorithm which not only computes a minimizer but also determines an optimal hyperparameter, in the sense of rolling cross-validation with the one-standard-error rule, exactly. This eliminates manual hyperparameter selection. Our method supports optional user parameters for incorporating domain knowledge. We provide open-source Python/Rust code for the piecewise polynomial least squares case which can be extended to further models. We demonstrate the practical utility through a simulation study and by applications to real data. A constrained variant of the proposed method gives state-of-the-art results in the Turing benchmark for unsupervised changepoint detection.
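The penalty described in the abstract can be illustrated for a single, fixed $\gamma$ with a small dynamic program. The sketch below is not the authors' Rust/Python implementation: it neither computes the regularization path nor selects $\gamma$. It assumes polynomial least squares segment models with at most `max_dof` coefficients per segment, and the names `segment_cost` and `dofppr_fixed_gamma` are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly


def segment_cost(t, y, dof):
    """Residual sum of squares of the least squares polynomial with `dof` coefficients."""
    coeffs = npoly.polyfit(t, y, dof - 1)
    resid = y - npoly.polyval(t, coeffs)
    return float(resid @ resid)


def dofppr_fixed_gamma(t, y, gamma, max_dof=3):
    """Minimize sum of segment costs + gamma * (sum of segment dofs) for one gamma.

    Returns (segments, energy), where each segment is a half-open index
    range with its dof: (left, right, dof).
    """
    n = len(y)
    best = np.full(n + 1, np.inf)  # best[r]: optimal energy on samples 0..r-1
    best[0] = 0.0
    back = [None] * (n + 1)        # back[r]: (left boundary, dof) of the last segment
    for r in range(1, n + 1):
        for l in range(r):
            # Assumption: never use more dofs than samples on a segment.
            for dof in range(1, min(max_dof, r - l) + 1):
                cand = best[l] + segment_cost(t[l:r], y[l:r], dof) + gamma * dof
                if cand < best[r]:
                    best[r] = cand
                    back[r] = (l, dof)
    # Backtrack from the right end to recover the partition and dof sequence.
    segments = []
    r = n
    while r > 0:
        l, dof = back[r]
        segments.append((l, r, dof))
        r = l
    return segments[::-1], float(best[n])


# Noise-free step signal: two constant pieces of ten samples each.
t = np.linspace(0.0, 1.0, 20)
y = np.concatenate([np.zeros(10), np.ones(10)])
segments, energy = dofppr_fixed_gamma(t, y, gamma=0.01)
print(segments, energy)
```

On this toy signal, two constant segments cost $2\gamma = 0.02$ in total, which is cheaper than a single constant or linear fit, so the sketch recovers the break at index 10. The triple loop refits every segment from scratch and is only meant for small inputs.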

Paper Structure (41 sections, 12 theorems, 37 equations, 19 figures, 2 tables, 1 algorithm)

This paper contains 41 sections, 12 theorems, 37 equations, 19 figures, 2 tables, 1 algorithm.

Main Text
Introduction
Proposed method and contributions
Prior and related work
Preliminaries and notation
Organization of the paper
Uniqueness results for degrees-of-freedom penalized piecewise least squares regression
Fast algorithm for computing the regularization paths and model selection
Computing the regularization paths
Obtaining minimizers by backtracking and resolving ambiguities
Parameter selection by rolling cross-validation with OSE-rule
Complete algorithm and analysis of the computational complexity
Implementation, simulation study and applications
Implementation and experimental setup
Results on simulated data
...and 26 more sections

Key Result

Lemma 2

Let $P, Q$ with $P \neq Q$ be two partitions, $\lambda = (\lambda_I)_{I \in P}$, $\mu = (\mu_J)_{J \in Q}$. Then either (i) the set or (ii)

Figures (19)

Figure 1: Comparison of the partition penalized model and the degrees-of-freedom penalized model for piecewise polynomial least squares regression: A realization (bottom left) consists of the piecewise polynomial ground truth signal (top left) sampled at 100 randomly (uniformly in $[0,1]$) chosen data sites, corrupted by i.i.d. Gaussian noise with standard deviation $\sigma = 0.01$. The top central, top right and bottom central tiles show the results of the partition penalized model \ref{main:eq:partition_penalized_model} with constant, linear and quadratic least squares polynomial fit, respectively. The solid lines represent the estimate for the depicted sample realization, and the shadings depict the $0.025$ to $0.975$ quantiles over 100 realizations. The last tile shows the result of the degrees-of-freedom penalized model (Equations \ref{main:eq:proposed_model_intro}, \ref{main:eq:proposed_model_intro_poly}), with least squares polynomial fit up to degree two. The estimated breakpoints are depicted as a histogram below each tile, along with the true breakpoints (dashed). The penalty $\gamma = 0.001$ was chosen for the DofPPR manually by visual inspection. The parameters for the partition penalized model were chosen such that the penalties on the segments match the respective penalty of the DofPPR model for constant, linear and quadratic polynomials, so $\gamma = 0.001, 0.002, 0.003$ for top central, top right, and bottom central, respectively.
Figure 2: Illustration of the notation used in this paper: A time series $(t_i, y_i)$, with $i=1,...,30$, is represented by orange dots. $I = 1:27$ represents the first $27$ indices and $y_I$ the corresponding part of the signal. The plot shows five different piecewise polynomial functions for the data on $I$ which all have exactly six degrees of freedom (dofs). We first look at the brown line. Its ordered partition is given by $P = (1:8,~9:21,~22:27)$, and the associated dof sequence $\lambda_{\text{brown}}$ is given by $(2, 3, 1)$; its entries sum up to $\nu = 2 + 3 + 1 = 6$. The brown line represents the least squares fit to the data on the three segments using polynomial functions with the indicated dofs, so linear, quadratic, and constant. The cyan and the red candidates have the same partition and the same total degrees of freedom ($\nu = 6$), but different dof sequences $\lambda_{\text{cyan}} = (2, 2, 2)$ and $\lambda_{\text{red}} = (1, 2, 3)$, respectively. (In Section \ref{main:sec:fast_algorithm} we use the notation $\Lambda^6(P)$ for all valid dof sequences for the partition $P$ that sum up to $6$.) The green and blue candidates have different partitions, namely $(1:16,17:21,22:27)$ and $(1:21,22:27)$, and the dof sequences $\lambda_{\text{green}} = (1, 1, 4)$ and $\lambda_{\text{blue}} = (1, 5)$, respectively. The shown candidates share the last segment but "spend" between one and five dofs on that segment, so that only five to one dofs remain for the remaining data with indices 1 to 21. A crucial part of the proposed method is that it computes the solution of the model \ref{main:eq:proposed_model_intro_poly} for all regularization parameters $\gamma \geq 0$ and for all "active" segments $I = 1:r$, with $r = 1, \ldots, n$, efficiently.
Figure 3: Computing the regularization paths of \ref{main:eq:proposed_model_intro} for all possible values of $\gamma$ amounts to finding the pointwise minimum of a collection of affine linear functions and in particular their critical values. The $\gamma$-parameters between two critical values ($S$ in the graphic) provide the same minimizers of \ref{main:eq:proposed_model_intro}. Hence, the mapping from the set of hyperparameters to the optimal solution $\gamma \mapsto (P^*_\gamma, \lambda^*_\gamma)$ is a piecewise constant function.
Figure 4: A visualization of the dynamic program for solving the $\nu$-degree of freedom partition problem \ref{main:eq:degree_of_freedom_partition_problem}. Candidates for the optimal solution on data $1:r+1$ with $\nu$ degrees of freedom are composed of the best (continuous) approximation on data $l+1:r+1$ with $p_R$ degrees of freedom and the best (piecewise) solution on data $1:l$ with $\nu - p_R$ degrees of freedom.
Figure 5: Visualization of the proposed parameter selection. Left: The piecewise constant CV scoring function for the sample realization of Figure \ref{main:fig:comparison_pcw_DofPPR} (solid line) and the $2.5 \%$ to $97.5 \%$ quantiles over all realizations (shaded). Middle and right: The corresponding results for the parameter choices $\gamma_{\text{CV}}$ and $\gamma_{\text{OSE}}$, respectively. The legend is identical to that of Figure \ref{main:fig:comparison_pcw_DofPPR}. The histograms of the breakpoints indicate a slightly better localization of the breakpoints near the true breaks.
...and 14 more figures
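
The caption of Figure 3 describes the regularization path as the pointwise minimum of affine functions of $\gamma$. A minimal sketch of that idea follows, assuming the minimal data cost $E_\nu$ for each total number of degrees of freedom $\nu$ is already known, so that each candidate contributes the line $\gamma \mapsto E_\nu + \gamma \nu$. The function name `regularization_path` and the energy values are hypothetical, and the quadratic-time scan is chosen for readability; it is not the authors' algorithm.

```python
def regularization_path(energies):
    """Lower envelope of the lines gamma -> E_nu + gamma * nu for gamma >= 0.

    `energies` maps a total dof count nu to its minimal data cost E_nu.
    Returns a list of (critical_gamma, nu): nu is optimal from critical_gamma
    up to the next critical value (or forever, for the last entry).
    """
    nus = sorted(energies)
    # At gamma = 0 only the data cost matters; prefer fewer dofs on ties.
    cur = min(nus, key=lambda nu: (energies[nu], nu))
    path = [(0.0, cur)]
    while True:
        # Only lines with a smaller slope (fewer dofs) can take over as gamma grows.
        crossings = [
            ((energies[nu] - energies[cur]) / (cur - nu), nu)
            for nu in nus
            if nu < cur
        ]
        if not crossings:
            break
        # Earliest crossing wins; among ties, the smallest nu has the flattest line.
        gamma, cur = min(crossings)
        path.append((gamma, cur))
    return path


# Hypothetical minimal data costs for total dofs 1..4.
energies = {1: 5.0, 2: 1.0, 3: 0.9, 4: 0.0}
path = regularization_path(energies)
print(path)  # [(0.0, 4), (0.5, 2), (4.0, 1)]
```

In this toy example $\nu = 3$ never appears on the path because its line lies above the envelope for every $\gamma \geq 0$. Between consecutive critical values the minimizer does not change, which is the piecewise constant behavior stated in the caption.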

Theorems & Definitions (27)

Example 1
Lemma 2
Theorem 3
Example 4
Lemma 5
Remark 1
Lemma 6
Theorem 7
Corollary 8
Corollary 9
...and 17 more
