Momentum LMS Theory beyond Stationarity: Stability, Tracking, and Regret

Yifei Jin; Xin Zheng; Lei Guo

TL;DR

This paper tackles adaptive tracking under nonstationary streaming data by analyzing Momentum LMS (MLMS), which augments LMS with a Polyak momentum term to improve real-time adaptation. It develops a unified stability framework via state augmentation, deriving $L_p$-stability bounds and tracking guarantees under both bounded and zero-mean random disturbances, while also providing prediction guarantees without excitation conditions. The theoretical results hinge on a generalized excitation assumption that extends beyond i.i.d. data, enabling analysis of stochastic, feedback-driven systems. Empirical evaluations on synthetic jumping-parameter data and a real-world speech enhancement task corroborate the theory, showing MLMS achieves faster adaptation and superior tracking and denoising performance in nonstationary environments. Overall, the work establishes MLMS as a principled, scalable tool for online learning in time-varying settings with broad applicability to streaming and nonlinear dynamics.

Abstract

In large-scale data processing scenarios, data often arrive in sequential streams generated by complex systems that exhibit drifting distributions and time-varying system parameters. This nonstationarity complicates theoretical analysis because it violates the classical assumption of i.i.d. (independent and identically distributed) samples, and it calls for algorithms that can update in real time without expensive retraining. An effective approach should process each sample in a single pass while keeping computational and memory complexity independent of the length of the data stream. Motivated by these challenges, this paper investigates the Momentum Least Mean Squares (MLMS) algorithm as an adaptive identification tool, leveraging its computational simplicity and online processing capability. Theoretically, we derive tracking performance and regret bounds for MLMS in time-varying stochastic linear systems under various practical conditions. Unlike classical LMS, whose stability can be characterized by first-order random vector difference equations, MLMS introduces an additional dynamical state due to momentum. This leads to second-order time-varying random vector difference equations whose stability analysis hinges on more complicated products of random matrices, making the problem substantially harder to resolve. Experiments on synthetic and real-world data streams demonstrate that MLMS achieves rapid adaptation and robust tracking, particularly in nonstationary settings, in agreement with our theoretical results and highlighting its promise for modern streaming and online learning applications.
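To make the single-pass, constant-memory update described in the abstract concrete, the following is a minimal sketch of a momentum LMS step. It assumes the standard heavy-ball (Polyak) form, theta_{k+1} = theta_k + mu * phi_k * (y_k - phi_k^T theta_k) + beta * (theta_k - theta_{k-1}); the paper's exact recursion, step-size conditions, and notation may differ. The names `mlms_step` and `run_demo`, the values `mu=0.05` and `beta=0.5`, and the jumping-parameter toy stream are illustrative assumptions, not taken from the paper.

```python
import random


def mlms_step(theta, theta_prev, phi, y, mu=0.05, beta=0.5):
    """One assumed heavy-ball LMS update; returns (new estimate, previous estimate).

    Only the two most recent estimates are kept, so memory does not grow
    with the length of the stream.
    """
    # A priori prediction error using the current estimate.
    err = y - sum(p * t for p, t in zip(phi, theta))
    # LMS gradient correction plus the momentum term beta * (theta - theta_prev).
    theta_next = [
        t + mu * err * p + beta * (t - tp)
        for t, tp, p in zip(theta, theta_prev, phi)
    ]
    return theta_next, theta


def run_demo(seed=0, n=4000, d=4, noise_std=0.01):
    """Track a parameter vector that jumps from +1 to -1 halfway through (toy data)."""
    rng = random.Random(seed)
    theta = [0.0] * d
    theta_prev = [0.0] * d
    errors = []  # kept only for inspection; the update itself does not need it
    for k in range(n):
        level = 1.0 if k < n // 2 else -1.0           # every true coefficient equals `level`
        phi = [rng.gauss(0.0, 1.0) for _ in range(d)]  # hypothetical Gaussian regressor
        y = level * sum(phi) + rng.gauss(0.0, noise_std)
        theta, theta_prev = mlms_step(theta, theta_prev, phi, y)
        errors.append(max(abs(t - level) for t in theta))
    return errors


if __name__ == "__main__":
    errs = run_demo()
    print("max coefficient error just before the jump:", errs[len(errs) // 2 - 1])
    print("max coefficient error at the end of the stream:", errs[-1])
```

Carrying `theta_prev` alongside `theta` is what turns the first-order LMS recursion into a second-order one; stacking the two vectors into a single state is one way to read the state-augmentation idea mentioned in the TL;DR.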

Paper Structure (12 sections, 30 equations, 3 figures, 1 table, 1 algorithm)

Introduction
Problem formulation
Notations and assumptions
The MLMS Algorithm
Main Results
The Case of Bounded Noises and Parameter Variations
The Case of Zero Mean Random Noises and Parameter Variations
Prediction Analysis
Experiments
Synthetic Data with Jumping Parameters
Application to Speech Enhancement
Conclusion

Figures (3)

Figure 1: Trajectories of SGD and SGD-Momentum on the Rosenbrock function. The green star marks the global optimum $(1,1)$. Starting from $(-1.5,\,2.5)$ with the same step-size, SGD (black) zigzags across the curved, ill-conditioned valley, while SGD with momentum (red) builds a velocity that damps high-curvature oscillations and accelerates motion along the valley floor, producing a smoother path, fewer iterations, and a lower final objective value.
Figure 2: Tracking MSE (in dB) over 10 independent trials for the time-varying regression task.
Figure 3: Adaptive Noise Cancellation
