On the Impact of Sample Size in Reconstructing Noisy Graph Signals: A Theoretical Characterisation
Baskaran Sripathmanathan, Xiaowen Dong, Michael Bronstein
TL;DR
This work analyzes how the number of observed vertices (sample size) impacts the reconstruction error when recovering a graph signal from noisy observations. Using a bias-variance decomposition, it shows that under LS and GLR, high noise can make smaller sample sets yield lower MSE, and derives thresholds (e.g., $\tau(\mathcal{S},\mathcal{T})$, $\tau_{GLR}$) that govern when reducing samples helps. The paper provides both general theorems and method-specific results for full-band and $k$-bandlimited noise, including asymptotic insights as $N \to \infty$ and a data-driven notion of an optimal sample size $m_{opt}$. Extensive experiments on ER, BA, and SBM graphs validate the non-monotonic MSE behavior and illustrate practical guidance for sample-size decisions in noisy graph-signal reconstruction. Overall, the findings highlight the crucial role of noise levels in sampling design and challenge the assumption that more samples always improve reconstruction in graph domains.
Abstract
Reconstructing a signal on a graph from noisy observations of a subset of the vertices is a fundamental problem in the field of graph signal processing. This paper investigates how sample size affects reconstruction error in the presence of noise via an in-depth theoretical analysis of the two most common reconstruction methods in the literature, least-squares reconstruction (LS) and graph-Laplacian regularised reconstruction (GLR). Our theorems show that at sufficiently low signal-to-noise ratios (SNRs), under these reconstruction methods decreasing the sample size can simultaneously decrease the average reconstruction error. We further show that at sufficiently low SNRs, LS reconstruction exhibits a $\Lambda$-shaped error curve, while for GLR reconstruction a sample size of $\mathcal{O}(\sqrt{N})$, where $N$ is the total number of vertices, results in lower reconstruction error than near-full observation. We present thresholds on the SNRs, $\tau$ and $\tau_{GLR}$, below which the error is non-monotonic, and illustrate these theoretical results with experiments across multiple random graph models, sampling schemes and SNRs. These results demonstrate that any decision on sample size has to be made in light of the noise levels in the data.
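
To make the claimed non-monotonic behaviour concrete, the following is a minimal simulation sketch (not the authors' code): it builds an Erdős–Rényi graph, generates a $k$-bandlimited signal, adds noise to a uniformly sampled subset of vertices, and compares the empirical MSE of LS and GLR reconstruction across sample sizes. The graph size, bandwidth $k$, noise level `sigma`, regularisation weight `mu`, and the uniform sampling scheme are illustrative assumptions, not the paper's exact experimental setup.

```python
# Illustrative sketch (assumed parameters): at high noise, MSE of LS and GLR
# reconstruction need not decrease as the number of sampled vertices grows.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

# Erdos-Renyi graph and its combinatorial Laplacian
N = 200
G = nx.erdos_renyi_graph(N, p=0.05, seed=0)
L = nx.laplacian_matrix(G).toarray().astype(float)
eigvals, U = np.linalg.eigh(L)

# k-bandlimited ground-truth signal: combination of the k lowest-frequency eigenvectors
k = 10
x = U[:, :k] @ rng.standard_normal(k)
x /= np.linalg.norm(x)

sigma = 0.5   # high noise (low SNR) to expose the non-monotonic regime (assumed value)
mu = 1.0      # GLR regularisation weight (assumed value)
trials = 200

def ls_reconstruct(S, y):
    """Least-squares reconstruction under a k-bandlimited signal model."""
    A = U[np.ix_(S, range(k))]                  # sampled rows of the first k eigenvectors
    c_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    return U[:, :k] @ c_hat

def glr_reconstruct(S, y):
    """Graph-Laplacian regularised reconstruction: min ||x_S - y||^2 + mu * x^T L x."""
    D = np.zeros((N, N))                        # diagonal vertex-selection matrix
    D[S, S] = 1.0
    b = np.zeros(N)
    b[S] = y
    return np.linalg.solve(D + mu * L, b)

for m in [k, 2 * k, 5 * k, N // 2, N]:
    mse_ls = mse_glr = 0.0
    for _ in range(trials):
        S = rng.choice(N, size=m, replace=False)     # uniform random sampling set
        y = x[S] + sigma * rng.standard_normal(m)    # noisy observations on S
        mse_ls += np.mean((ls_reconstruct(S, y) - x) ** 2)
        mse_glr += np.mean((glr_reconstruct(S, y) - x) ** 2)
    print(f"m={m:4d}  LS MSE={mse_ls / trials:.4f}  GLR MSE={mse_glr / trials:.4f}")
```

Under these assumed settings, sweeping the printed MSE values over sample sizes is enough to see whether the error curve is monotone in $m$; the paper's thresholds $\tau$ and $\tau_{GLR}$ characterise the SNRs below which it is not.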
