Minmax Trend Filtering: Generalizations of Total Variation Denoising via a Local Minmax/Maxmin Formula

Sabyasachi Chatterjee

TL;DR

Minmax Trend Filtering (MTF) generalizes univariate Total Variation Denoising (TVD) by expressing the fitted value at each point as a pointwise minmax/maxmin of penalized local averages, which gives a local bias-variance interpretation. The framework extends to higher-degree local polynomials (MTF of degree $r$) and to kernel-smoothing variants, with a unified pointwise error bound that holds across all locations and scales. The paper establishes local rates under Hölder smoothness, global minimax optimality over BV and piecewise-polynomial classes, and fast rates for piecewise-polynomial signals, along with boundary-consistent behavior. Computationally, a practical variant (DSMTF) runs in $O(n^2)$ time, and experiments show better performance than Trend Filtering on signals with heterogeneous local smoothness, highlighting the approach's local adaptivity and robustness to oversmoothing. The result is a versatile, theoretically transparent approach to locally adaptive nonparametric regression, with potential extensions to multivariate settings and alternative loss functions.

Abstract

Total Variation Denoising (TVD) is a fundamental denoising and smoothing method. In this article, we identify a new local minmax/maxmin formula producing two estimators which sandwich the univariate TVD estimator at every point. Operationally, this formula gives a local definition of TVD as a minmax/maxmin of a simple function of local averages. Moreover, we find that this minmax/maxmin formula is generalizable and can be used to define other TVD-like estimators. In this article, we propose and study higher-order polynomial versions of TVD, which are defined pointwise as lying between minmax and maxmin optimizations of penalized local polynomial regressions over intervals of different scales. These appear to be new nonparametric regression methods, different from the usual Trend Filtering and from any other existing method in the nonparametric regression toolbox. We call these estimators Minmax Trend Filtering (MTF). We show how the proposed local definition of the TVD/MTF estimator makes it tractable to bound pointwise estimation errors in terms of a local bias-variance-like trade-off. This type of local analysis of TVD/MTF is new and arguably simpler than existing analyses of TVD/Trend Filtering. In particular, apart from minimax rate optimality over bounded variation and piecewise polynomial classes, our pointwise estimation error bounds also enable us to derive local rates of convergence for (locally) Hölder smooth signals. These local rates offer a new pointwise explanation of the local adaptivity of TVD/MTF, in place of global (MSE-based) justifications.
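As a point of reference for the estimator that the minmax/maxmin formula sandwiches, the TVD fit itself can be computed numerically. The sketch below is not taken from the paper: it assumes the common normalization of the TVD objective stated in the docstring, and it uses a generic projected-gradient iteration on the dual problem. The function name `tvd_dual_pg`, the step size, and the iteration count are illustrative choices.

```python
import numpy as np

def tvd_dual_pg(y, lam, n_iter=5000):
    """Approximate the univariate TVD / fused lasso fit

        argmin_theta 0.5 * sum((y - theta)**2) + lam * sum(|theta[i+1] - theta[i]|)

    by projected gradient on the dual (assumed normalization, generic solver).
    """
    y = np.asarray(y, dtype=float)
    u = np.zeros(len(y) - 1)  # one dual variable per adjacent difference

    def dt(u):
        # Transpose of the first-difference operator applied to u.
        return -np.diff(np.concatenate(([0.0], u, [0.0])))

    for _ in range(n_iter):
        theta = y - dt(u)                      # primal iterate implied by u
        # Gradient step (step 1/4, since ||D D^T|| < 4), then project onto [-lam, lam].
        u = np.clip(u + 0.25 * np.diff(theta), -lam, lam)
    return y - dt(u)
```

With `lam = 0` the fit returns the data unchanged, and for very large `lam` it approaches the constant fit at the sample mean, which are the two extremes of the smoothing path.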

Paper Structure (35 sections, 23 theorems, 195 equations, 7 figures)

Introduction
Nonparametric Regression and Local Adaptivity
Existing Notions of Local Adaptivity
Motivating the Study of Pointwise Estimation Errors
Main Contributions of this Article
Notations
Outline
Total Variation Denoising/Fused Lasso
A Pointwise Formula for the TVD Estimator
Comments on the Proof
The Minmax/Maxmin Principle and its Well Posedness
Minmax Trend Filtering of General Degree
Definition of Minmax Trend Filtering
Pointwise Estimation Error Bound for Minmax Trend Filtering
Local Rates
...and 20 more sections

Key Result

Theorem 2.1

[A Pointwise Formula for TVD/Fused Lasso] Fix any $i \in [n]$. The TVD estimator $\hat{\theta}^{(\lambda)}$ defined in eq:tvd satisfies a two-sided pointwise bound at $i$: the fitted value $\hat{\theta}^{(\lambda)}_i$ is sandwiched between a maxmin and a minmax of a simple function of local averages. Moreover, these pointwise bounds can be improved at the boundary points, that is, at the first and last point of Fused Lasso.
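For orientation, the univariate TVD/Fused Lasso estimator is commonly written as below. This is the standard textbook form and is an assumption here; the paper's eq:tvd may scale the loss or $\lambda$ differently.

```latex
\hat{\theta}^{(\lambda)}
  = \operatorname*{argmin}_{\theta \in \mathbb{R}^n}
    \; \frac{1}{2} \sum_{i=1}^{n} (y_i - \theta_i)^2
    + \lambda \sum_{i=1}^{n-1} \lvert \theta_{i+1} - \theta_i \rvert
```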

Figures (7)

Figure 1: In the right panel, we show the risk for the two-halves function as a function of $\lambda$ for TVD (in blue) and KS (in red). The dashed lines are standard 95 percent confidence intervals for the estimated RMSE curve. In the left panel, we show one realization of the two fits at $\lambda = 4$ (near optimal in this instance).
Figure 2: We compare risk curves of TVD and Kernel Smoothing as a function of the tuning parameter $\lambda$ when the underlying signal is the Blocks (top left), Bumps (top right), Heavisine (bottom left) and Doppler (bottom right) function, respectively. In all of these risk curves, we see what is predicted by our local bound in Section sec:local: the risk curve of TVD worsens far more gracefully with oversmoothing than that of Kernel Smoothing.
Figure 3: The Blocks function. We have used DSMTF with $r = 1$ and $1$-st order Trend Filtering.
Figure 4: The Bumps function. We have used DSMTF with $r = 1$ and $1$-st order Trend Filtering.
Figure 5: The HeaviSine function. We have used DSMTF with $r = 2$ and $2$-nd order Trend Filtering.
...and 2 more figures

Theorems & Definitions (59)

Theorem 2.1
Proposition 3.1: Well Posedness
Remark 3.1
Proof of Proposition 3.1
Definition 4.1
Theorem 5.1
Definition 6.1: Hölder space for Functions
Definition 6.2: Hölder space for Sequences
Theorem 6.3: Local Adaptivity Result
Lemma 6.4
...and 49 more
