Variance estimation in graphs with the fused lasso

Oscar Hernan Madrid Padilla

TL;DR

The paper addresses variance estimation in graph-structured regression models with data $y_i = \theta_i^* + (v_i^*)^{1/2} \varepsilon_i$ on a connected graph $G$, and analyzes mean-variance estimation via graph-based fusion penalties. It introduces a linear-time DFS-based estimator for the homoscedastic variance, derives minimax-optimal rates on 1D chains and 2D grids under canonical total-variation scaling, and extends fused-lasso risk bounds to broad error distributions beyond sub-Gaussian. It then develops a heteroscedastic variance estimator with rates matching those of the mean estimator, provides general lower bounds showing minimax optimality on grid and $K$-NN graphs, and establishes consistency on any connected graph. The authors validate the approach through simulations on grids and $K$-NN graphs and a real ion-channel dataset, demonstrating robust performance under heteroscedasticity and non-sub-Gaussian noise. Overall, the work extends fused lasso theory to broader error distributions, enables variance estimation on general graphs, and shows practical effectiveness for graph-structured data analysis.
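For orientation, the data model above and the standard graph fused lasso objective for the mean can be written out as below. The edge set $E$ of $G$, the tuning parameter $\lambda > 0$, and the name $\hat{\theta}$ are notation assumed here for illustration; the paper's own formulation of the variance step may differ in its details.

```latex
y_i = \theta_i^* + (v_i^*)^{1/2}\,\varepsilon_i, \qquad i = 1, \dots, n,
\qquad\qquad
\hat{\theta} \;=\; \arg\min_{\theta \in \mathbb{R}^n}
  \; \frac{1}{2} \sum_{i=1}^{n} (y_i - \theta_i)^2
  \;+\; \lambda \sum_{(i,j) \in E} |\theta_i - \theta_j| .
```

The penalty is the total variation of $\theta$ along the edges of $G$, so a larger $\lambda$ pushes neighboring nodes toward a common fitted value.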

Abstract

We study the problem of variance estimation in general graph-structured problems. First, we develop a linear time estimator for the homoscedastic case that can consistently estimate the variance in general graphs. We show that our estimator attains minimax rates for the chain and 2D grid graphs when the mean signal has total variation with canonical scaling. Furthermore, we provide general upper bounds on the mean squared error performance of the fused lasso estimator in general graphs under a moment condition and a bound on the tail behavior of the errors. These upper bounds allow us to generalize for broader classes of distributions, such as sub-exponential, many existing results on the fused lasso that are only known to hold with the assumption that errors are sub-Gaussian random variables. Exploiting our upper bounds, we then study a simple total variation regularization estimator for estimating the signal of variances in the heteroscedastic case. We also provide lower bounds showing that our heteroscedastic variance estimator attains minimax rates for estimating signals of bounded variation in grid graphs, and $K$-nearest neighbor graphs, and the estimator is consistent for estimating the variances in any connected graph.
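The linear-time idea for the homoscedastic case can be sketched as follows: order the nodes by depth-first search, then average squared differences of consecutive observations in that order. This is a minimal sketch, not the paper's exact definition. The difference-based form (half the mean squared successive difference) is an assumption, and the function names `dfs_order`, `grid_adjacency`, and `dfs_variance_estimate` are hypothetical.

```python
import numpy as np


def dfs_order(adj, root=0):
    """Return the nodes of a connected graph in DFS preorder, in O(n + m) time."""
    visited = [False] * len(adj)
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        if visited[u]:
            continue
        visited[u] = True
        order.append(u)
        # Push in reverse so the first listed neighbor is explored first.
        for w in reversed(adj[u]):
            if not visited[w]:
                stack.append(w)
    return order


def grid_adjacency(rows, cols):
    """Adjacency lists of a rows x cols 2D grid graph (4-neighbor)."""
    adj = [[] for _ in range(rows * cols)]
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:
                adj[u].append(u + 1)
                adj[u + 1].append(u)
            if r + 1 < rows:
                adj[u].append(u + cols)
                adj[u + cols].append(u)
    return adj


def dfs_variance_estimate(y, adj, root=0):
    """Assumed difference-based estimate: sum of squared successive
    differences along the DFS ordering, divided by 2 * (n - 1)."""
    z = np.asarray(y, dtype=float)[dfs_order(adj, root)]
    d = np.diff(z)
    return float(np.sum(d ** 2) / (2 * len(d)))


# Illustrative data: piecewise-constant mean on a 50 x 50 grid, unit variance.
rows = cols = 50
adj = grid_adjacency(rows, cols)
theta = np.zeros((rows, cols))
theta[:, cols // 2:] = 2.0  # one jump across the middle column
rng = np.random.default_rng(0)
y = theta.ravel() + rng.normal(scale=1.0, size=rows * cols)
print("estimated variance:", dfs_variance_estimate(y, adj))
```

The sketch relies on the fact that, for independent errors with common variance $v$ and a mean that changes little between consecutive nodes in the ordering, each squared difference has expectation close to $2v$.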

Paper Structure

This paper contains 30 sections, 8 theorems, 132 equations, 5 figures, 4 tables.

Key Result

Theorem 1

Suppose that Assumption 2 holds and $\|\epsilon\|_{\infty} = O_{\mathrm{pr}}(U_n)$ for some positive sequence $U_n$. Then

Figures (5)

  • Figure 1: An example of a graph $G$. Running DFS starting with the node $1$ produces the ordering $1,3,6,5,8,9,4,7,11,2, 10$.
  • Figure 2: The left panel compares the true and estimated means for Example 1 in the text. The right panel shows the corresponding comparison for the variances.
  • Figure 3: Each row corresponds to one scenario: the top row to Scenario 4, the middle row to Scenario 5, and the bottom row to Scenario 6. The left column depicts the signals $\theta^*$, the middle column the signals $v^*$, and the right column the estimates $\hat{v}$ obtained with our method in (\ref{eqn:def})--(\ref{eqn:estimator2}).
  • Figure 4: For $n = 20000$ and $d=2$, the top left panel shows a scatter plot of $\{(x_{i,1},x_{i,2}, v_i^*)\}_{i=1}^n$ for one instance of Scenarios 7 and 8. The top right panel displays the corresponding scatter plot of $\{(x_{i,1},x_{i,2}, \hat{v}_i)\}_{i=1}^n$ for Scenario 7. The bottom left panel is the scatter plot of $\{(x_{i,1},x_{i,2}, f_0(x_i))\}_{i=1}^n$ for Scenario 8, and the bottom right panel shows the scatter plot of $\{(x_{i,1},x_{i,2}, \hat{v}_i)\}_{i=1}^n$ for Scenario 8. Here, $\hat{v}$ is our Het. estimator defined in (\ref{eqn:def})--(\ref{eqn:estimator2}) with the $K$-NN graph.
  • Figure 5: Ion channels data and estimated variances

Theorems & Definitions (24)

  • Theorem 1
  • Remark 1
  • Remark 2
  • Example 1
  • Theorem 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Theorem 3
  • Remark 6
  • ...and 14 more