
Multiscale Quantile Regression with Local Error Control

Zhi Liu, Housen Li

TL;DR

The paper tackles robust change-point detection in time series by proposing MUSCLE, a multiscale quantile segmentation method with local error control. It combines a multiscale testing framework on segments, a variational estimator, and a wavelet-tree-based dynamic programming implementation to achieve strong finite-sample guarantees, including FDR and OER control via a single tuning parameter. The authors establish consistency and localization rates under mild signal-to-noise conditions, and demonstrate competitive performance against a wide range of methods on simulations and on real data from electrophysiology and geophysics. Extensions to distributional changes (M-MUSCLE) and dependent data (D-MUSCLE) broaden its applicability, while the splitting-merging variant MUSCLE-S offers scalable computation for large datasets. Overall, MUSCLE provides robust, powerful change-point detection with interpretable results, and it is implemented in the R package muscle.
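
The wavelet tree matters because a local quantile test on a segment $[l, r)$ needs the number of observations lying at or below a candidate level, and this count is needed for many segments. A wavelet tree answers such range-rank queries in $O(\log \sigma)$ time after $O(n \log \sigma)$ preprocessing, where $\sigma$ is the number of distinct values. The following is a minimal, generic textbook sketch over integer ranks. The class `WaveletTree` and method `count_leq` are hypothetical names, and this is not the muscle package's implementation.

```python
class WaveletTree:
    """Textbook wavelet tree over integers in [lo, hi] (hypothetical helper)."""

    def __init__(self, values, lo=None, hi=None):
        values = list(values)
        self.lo = min(values) if lo is None else lo
        self.hi = max(values) if hi is None else hi
        self.left = self.right = None
        if self.lo == self.hi or not values:
            return  # leaf node or empty subtree: no bitvector needed
        mid = (self.lo + self.hi) // 2
        # prefix[i] = how many of the first i values go to the left child (value <= mid)
        self.prefix = [0]
        for v in values:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in values if v <= mid], self.lo, mid)
        self.right = WaveletTree([v for v in values if v > mid], mid + 1, self.hi)

    def count_leq(self, l, r, x):
        """Number of values at positions l..r-1 that are <= x."""
        if l >= r or x < self.lo:
            return 0
        if x >= self.hi:
            return r - l
        mid = (self.lo + self.hi) // 2
        l_left, r_left = self.prefix[l], self.prefix[r]
        if x <= mid:
            return self.left.count_leq(l_left, r_left, x)
        # x > mid: every value routed left is <= x; recurse into the right child
        return (r_left - l_left) + self.right.count_leq(l - l_left, r - r_left, x)


# Usage: replace real-valued data by their ranks, then count points <= a level.
y = [0.3, -1.2, 2.5, 0.1, 0.7]
order = sorted(y)
ranks = [order.index(v) for v in y]  # ties share the smallest rank
wt = WaveletTree(ranks)
print(wt.count_leq(1, 4, order.index(0.3)))  # points of y[1:4] that are <= 0.3; prints 2
```

Working on ranks keeps the tree depth at about $\log_2 n$ regardless of the scale of the data, which fits the paper's quantile-based setting: only the ordering of observations enters a count like this.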

Abstract

For robust and efficient detection of change points, we introduce a novel methodology, MUSCLE (Multiscale qUantile Segmentation Controlling Local Error), that partitions serial data into multiple segments, each sharing a common quantile. It combines multiple tests for quantile changes over different scales and locations with variational estimation. Unlike the commonly adopted global error control, MUSCLE focuses on local errors defined on individual segments, which significantly improves its power to detect change points. Meanwhile, owing to its built-in model complexity penalty, it enjoys the finite-sample guarantee that its false discovery rate (i.e., the expected proportion of falsely detected change points) is upper bounded by its single tuning parameter. Further, we obtain consistency and localization error rates for estimating change points under mild signal-to-noise-ratio conditions. Both match (up to log factors) the minimax optimal results in the Gaussian setup. All theoretical results require serial independence as the only distributional assumption. Incorporating the wavelet tree data structure, we develop an efficient dynamic programming algorithm for computing MUSCLE. Extensive simulations as well as real data applications in electrophysiology and geophysics demonstrate its competitiveness and effectiveness. An implementation is available on GitHub as the R package muscle.
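
To illustrate the general shape of an estimator that combines local multiscale tests with variational estimation, here is a minimal sketch of a minimum-number-of-segments dynamic program. It assumes a simplified stand-in for the local test: a segment is accepted if some level q makes the count of observations at or below q, on every sub-interval, compatible with a Binomial(m, β) draw under a fixed normal-approximation threshold `z`. The names `segment_ok` and `min_segmentation` and the parameter `z` are hypothetical. The paper's actual test statistic, its calibration through α, and its wavelet-tree-accelerated dynamic program are not reproduced here.

```python
import math


def segment_ok(seg, beta=0.5, z=2.5):
    """Simplified local check (hypothetical): is there a level q such that on every
    sub-interval of seg the count of points <= q is close to m * beta?"""
    n = len(seg)
    for q in sorted(set(seg)):  # counts only change at observed values
        prefix = [0]  # prefix[k] = number of the first k points that are <= q
        for v in seg:
            prefix.append(prefix[-1] + (v <= q))
        ok = True
        for i in range(n):
            for j in range(i + 1, n + 1):
                m = j - i
                w = prefix[j] - prefix[i]
                stat = abs(w - m * beta) / math.sqrt(m * beta * (1 - beta))
                if stat > z:
                    ok = False
                    break
            if not ok:
                break
        if ok:
            return True
    return False


def min_segmentation(y, beta=0.5, z=2.5):
    """Dynamic program: fewest segments such that every segment passes segment_ok."""
    n = len(y)
    best = [0] + [math.inf] * n  # best[e] = min number of segments for y[:e]
    back = [0] * (n + 1)         # start of the last segment in an optimal split of y[:e]
    for e in range(1, n + 1):
        for s in range(e):
            # test the (expensive) segment only if it could improve best[e]
            if best[s] + 1 < best[e] and segment_ok(y[s:e], beta, z):
                best[e] = best[s] + 1
                back[e] = s
    bounds, e = [], n
    while e > 0:  # backtrack the segment boundaries
        bounds.append((back[e], e))
        e = back[e]
    return bounds[::-1]


pattern = [0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5, 11]  # interleaved ranks, no trend
y = [p / 100 for p in pattern] + [10 + p / 100 for p in pattern]
print(len(min_segmentation(y)))  # prints 2: the level shift forces one change point
```

With z ≥ 1 every single observation passes the check, so a feasible segmentation always exists. The exhaustive search shown here is very slow (roughly $O(n^5)$ in the worst case), which is why fast range counting and pruning matter in practice.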

Paper Structure

This paper contains 29 sections, 12 theorems, 87 equations, 11 figures, and 4 tables.

Key Result

Theorem 1 (Underestimation bound)

Assume Model QSR and let $\widehat{K}$, defined in equation (hat_K), be the number of change points estimated by MUSCLE with $\alpha \in (0,1)$. Then it holds [...] where [...]. In particular, together with equation (LTO), it implies that [...].

Figures (11)

  • Figure 1: Influence of data windowing on segmentation methods. The top panel shows the true signal (line) and data (crosses). The lower panels display the segmentation results using the first 80 observations (dashed line) and using all 400 observations (solid line) for the proposed MUSCLE, MQS jula2022multiscale, NOT-HT baranowski2019narrowest, KSD madrid2021optimal, ED-PELT haynes2017computationally, RNSP fryzlewicz2024robust and R-FPOP fearnhead2019changepoint. The vertical dashed line marks the 80th observation.
  • Figure 2: Comparison of the proposed MUSCLE and MQS jula2022multiscale. The true signal (black line), which is a teeth function, and the data (gray line; $2000$ samples) are shown in the top panel. The estimates by MUSCLE ($\alpha = 0.3$), MQS ($\tilde{\alpha} = 0.3$) and MQS ($\tilde{\alpha} = 0.999$) are plotted in the lower panels. The performance of each method over $200$ repetitions in estimating the number of change points is visualised by the histograms on the right-hand side. The true number of change points, $K = 80$, is marked by a vertical dashed line.
  • Figure 3: Recovery of the underlying signal in simulation setting (i:gauss). In each panel, the overall performance over 200 repetitions is summarized as a boxplot, and individual repetitions are shown as jittered, low-intensity dots. In the bottom left panel, the theoretical upper bound $\alpha = 0.3$ on the FDR of MUSCLE is marked by a red dashed line. In the bottom right panel, the true number of change points, $K = 2$, is marked by a red dashed line.
  • Figure 4: Recovery of the blocks signal in simulation setting (i:tdist). In each panel, the overall performance over 200 repetitions is summarized as a boxplot, and individual repetitions are shown as jittered, low-intensity dots. In the bottom left panel, the theoretical upper bound $\alpha = 0.3$ on the FDR of MUSCLE is marked by a red dashed line. In the bottom right panel, the true numbers of change points in median and in distribution, $K = 11$ and $K = 14$, are marked by a red and a blue dashed line, respectively.
  • Figure 5: Recovery of the blocks signal in simulation setting (i:hetemix). In each panel, the overall performance over 200 repetitions is summarized as a boxplot, and individual repetitions are shown as jittered, low-intensity dots. In the bottom left panel, the theoretical upper bound $\alpha = 0.3$ on the FDR of MUSCLE is marked by a red dashed line. In the bottom right panel, the true numbers of change points in median and in distribution, $K = 11$ and $K = 14$, are marked by a red and a blue dashed line, respectively.
  • ...and 6 more figures
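
The figure captions compare the empirical FDR of MUSCLE with its theoretical bound $\alpha = 0.3$. Because the FDR is the expected proportion of falsely detected change points, it can be estimated by averaging the false discovery proportion (FDP) over repetitions, using the usual convention FDP = 0 when nothing is detected. The sketch below assumes a hypothetical tolerance rule (`tol`) for deciding when a detection counts as true; the paper's exact definition of a false discovery is not reproduced in this summary.

```python
def false_discovery_proportion(detected, true_cps, tol):
    """FDP = (#false detections) / max(#detections, 1).
    Hypothetical rule: a detection is true if some true change point lies within tol."""
    n_false = sum(1 for d in detected if all(abs(d - t) > tol for t in true_cps))
    return n_false / max(len(detected), 1)


def empirical_fdr(runs, true_cps, tol):
    """Average FDP over repetitions; runs is a list of detected change-point lists."""
    return sum(false_discovery_proportion(d, true_cps, tol) for d in runs) / len(runs)


# One run detects 98, 150, 205 for true change points 100, 200; another detects nothing.
print(empirical_fdr([[98, 150, 205], []], [100, 200], tol=5))  # (1/3 + 0) / 2
```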

Theorems & Definitions (29)

  • Remark 1: Multiscale statistics
  • Definition 1: jula2022multiscale
  • Example 1: Distribution with continuous density
  • Theorem 1: Underestimation bound
  • Theorem 2: Overestimation bound
  • Theorem 3: Model selection consistency
  • Remark 2
  • Theorem 4: Localization rate in Hausdorff distance
  • Remark 3
  • Theorem 5: FDR and OER control
  • ...and 19 more
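
Theorem 4 bounds the localization error in Hausdorff distance. For two finite sets of change-point locations $A$ and $B$, $d_H(A, B) = \max\{\max_{a \in A} \min_{b \in B} |a - b|,\ \max_{b \in B} \min_{a \in A} |a - b|\}$. The following minimal sketch assumes both sets are non-empty; the paper's convention for empty sets and any rescaling (for example, by the sample size) are not covered here.

```python
def hausdorff(a, b):
    """Hausdorff distance between two non-empty finite sets of locations."""
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    d_ab = max(min(abs(x - y) for y in b) for x in a)  # worst-covered point of a
    d_ba = max(min(abs(x - y) for x in a) for y in b)  # worst-covered point of b
    return max(d_ab, d_ba)


print(hausdorff([100, 200], [98, 150, 205]))  # prints 50: the spurious 150 dominates
```

The example shows why the Hausdorff distance penalizes both missed and spurious change points: although every true change point has a detection within 5, the extra detection at 150 lies 50 away from any true change point.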