Outlier-Robust Linear System Identification Under Heavy-tailed Noise

Vinay Kanakeri, Aritra Mitra

TL;DR

The paper addresses finite-sample PAC bounds for identifying the state-transition matrix $A$ in an LTI system $x_{t+1}=Ax_t+w_t$ under heavy-tailed noise with only a finite fourth moment. It introduces Robust-SysID, a robust algorithm that buckets trajectories, computes per-bucket OLS estimates, and aggregates them via a Frobenius geometric median to achieve high-probability accuracy, preserving logarithmic dependence on the failure probability $\delta$. The scalar and vector results show near-sub-Gaussian error rates, with the scalar bound matching Gaussian rates and the vector bound incurring an $O(d)$ multiplicative factor and a dependence on $C_A$ and $C_w$; the framework also handles adversarial corruptions with a controlled increase in sample complexity. Overall, the work advances robust statistical learning for data-driven control under non-ideal noise, providing foundational PAC guarantees and practical robustness tools for system identification.
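The bucket, estimate, and aggregate pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`simulate`, `bucket_ols`, `geometric_median`, `robust_sysid`), the Student-t noise, the zero initial state, the use of every time step of each trajectory in the per-bucket OLS, and the Weiszfeld iteration for the geometric median are all choices made here for illustration. The paper may specify these details differently, including how the number of buckets is tied to $\delta$.

```python
import numpy as np

def simulate(A, n_traj, T, rng, df=5.0):
    """Roll out n_traj independent trajectories of x_{t+1} = A x_t + w_t, x_0 = 0.
    Noise is Student-t with df > 4 (finite fourth moment); an illustrative choice."""
    d = A.shape[0]
    X = np.zeros((n_traj, T + 1, d))
    for t in range(T):
        w = rng.standard_t(df, size=(n_traj, d))
        X[:, t + 1] = X[:, t] @ A.T + w
    return X

def bucket_ols(X):
    """Ordinary least-squares estimate of A from one bucket, X of shape (n, T+1, d)."""
    d = X.shape[2]
    past = X[:, :-1].reshape(-1, d)    # rows are x_t
    future = X[:, 1:].reshape(-1, d)   # rows are x_{t+1}
    # Solve past @ A.T ~= future in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(past, future, rcond=None)
    return A_T.T

def geometric_median(mats, iters=200, tol=1e-9):
    """Weiszfeld iterations for argmin_M sum_k ||M - mats[k]||_F."""
    M = mats.mean(axis=0)
    for _ in range(iters):
        dist = np.linalg.norm(mats - M, axis=(1, 2))   # Frobenius distances
        w = 1.0 / np.maximum(dist, 1e-12)              # guard against division by zero
        M_new = np.tensordot(w, mats, axes=1) / w.sum()
        if np.linalg.norm(M_new - M) < tol:
            return M_new
        M = M_new
    return M

def robust_sysid(X, n_buckets):
    """Split trajectories into buckets, run OLS per bucket, aggregate by geometric median."""
    buckets = np.array_split(X, n_buckets, axis=0)
    estimates = np.stack([bucket_ols(b) for b in buckets])
    return geometric_median(estimates)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.array([[0.5, 0.1], [0.0, 0.3]])
    X = simulate(A, n_traj=200, T=20, rng=rng)
    A_hat = robust_sysid(X, n_buckets=10)
    print("Frobenius error:", np.linalg.norm(A_hat - A))
```

The geometric median is what makes the aggregate insensitive to a minority of bad buckets: a few wildly wrong per-bucket estimates, whether caused by a heavy-tailed noise draw or by corrupted trajectories, move the median far less than they would move a plain average.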

Abstract

We consider the problem of estimating the state transition matrix of a linear time-invariant (LTI) system, given access to multiple independent trajectories sampled from the system. Several recent papers have conducted a non-asymptotic analysis of this problem, relying crucially on the assumption that the process noise is either Gaussian or sub-Gaussian, i.e., "light-tailed". In sharp contrast, we work under a significantly weaker noise model, assuming nothing more than the existence of the fourth moment of the noise distribution. For this setting, we provide the first set of results demonstrating that one can obtain sample-complexity bounds for linear system identification that are nearly of the same order as under sub-Gaussian noise. To achieve such results, we develop a novel robust system identification algorithm that relies on constructing multiple weakly-concentrated estimators, and then boosting their performance using suitable tools from high-dimensional robust statistics. Interestingly, our analysis reveals how the kurtosis of the noise distribution, a measure of heavy-tailedness, affects the number of trajectories needed to achieve desired estimation error bounds. Finally, we show that our algorithm and analysis technique can be easily extended to account for scenarios where an adversary can arbitrarily corrupt a small fraction of the collected trajectory data. Our work takes the first steps towards building a robust statistical learning theory for control under non-ideal assumptions on the data-generating process.
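For readers unfamiliar with kurtosis, the textbook definition for a zero-mean scalar random variable $w$ is given below. This is the standard scalar quantity only; the normalization the paper uses for vector-valued noise is not reproduced here and may differ.

```latex
\kappa \;=\; \frac{\mathbb{E}[w^4]}{\left(\mathbb{E}[w^2]\right)^2},
\qquad
\kappa_{\text{Gaussian}} = 3,
\qquad
\kappa_{\text{Student-}t,\ \nu > 4} = 3 + \frac{6}{\nu - 4}.
```

A finite fourth moment is exactly the condition under which $\kappa$ is finite, and larger $\kappa$ corresponds to heavier tails.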

Paper Structure

This paper contains 12 sections, 19 theorems, 74 equations.

Key Result

Theorem 1

Consider the scalar version, $x_{t+1} = a x_t + w_t$, of the system $x_{t+1} = Ax_t + w_t$, under the paper's noise assumptions (in particular, a finite fourth moment of the noise). With probability at least $1-\delta$, the following bound holds for the output $\hat{a}$ of Robust-SysID:

Theorems & Definitions (29)

  • Theorem 1
  • Lemma 1
  • Lemma 2
  • proof
  • Theorem 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Theorem 3
  • Lemma 6
  • ...and 19 more