Exact Recovery for System Identification with More Corrupt Data than Clean Data

Baturalp Yalcin, Haixiang Zhang, Javad Lavaei, Murat Arcak

TL;DR

This work develops robust, non-asymptotic guarantees for exact recovery of discrete-time linear systems under adversarial disturbances, using two convex, non-smooth estimators that penalize disturbance sequences. It addresses both autonomous and input-driven dynamics, deriving sample-complexity bounds under deterministic $\Delta$-spaced attacks and under stochastic attacks with attack probability $p$, including cases where the data are correlated and more samples are corrupted than clean. The analysis relies on KKT conditions, Farkas’ lemma, and covering arguments to handle correlated data and non-i.i.d. attack structures, and it yields almost-sure convergence in the asymptotic regime. The results are validated on a biomedical insulin-glucose model, which illustrates exact recovery even when a majority of observations are compromised and shows how incorporating an input sequence can improve identifiability under attack.

Abstract

This paper investigates the system identification problem for linear discrete-time systems under adversaries and analyzes two lasso-type estimators. We examine both asymptotic and non-asymptotic properties of these estimators in two separate scenarios, corresponding to deterministic and stochastic models for the attack times. Since the samples collected from the system are correlated, the existing results on lasso are not applicable. We prove that when the system is stable and attacks are injected periodically, the sample complexity for exact recovery of the system dynamics is linear in terms of the dimension of the states. When adversarial attacks occur at each time instance with probability p, the required sample complexity for exact recovery scales polynomially in the dimension of the states and the probability p. This result implies almost sure convergence to the true system dynamics under the asymptotic regime. As a by-product, our estimators still learn the system correctly even when more than half of the data is compromised. We highlight that the attack vectors are allowed to be correlated with each other in this work, whereas we make some assumptions about the times at which the attacks happen. This paper provides the first mathematical guarantee in the literature on learning from correlated data for dynamical systems in the case when there is less clean data than corrupt data.
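To make the idea of a lasso-type estimator concrete, here is a minimal sketch for a scalar system $x_{t+1} = a x_t + d_t$. It assumes the estimator takes the form $\min_a \sum_t |x_{t+1} - a x_t|$, which is an assumption based on the description above rather than a formula quoted from the paper; in one dimension the $\ell_1$ and $\ell_2$ norms coincide, and the minimizer is a weighted median of the ratios $x_{t+1}/x_t$ with weights $|x_t|$. The function names, the value $a = 0.5$, and the disturbance sequence are all hypothetical, chosen so that three of the five transitions are attacked.

```python
def simulate(a, x0, disturbances):
    """Roll out the scalar system x_{t+1} = a * x_t + d_t."""
    xs = [x0]
    for d in disturbances:
        xs.append(a * xs[-1] + d)
    return xs


def l1_estimate(xs):
    """Minimize sum_t |x_{t+1} - a * x_t| over a.

    Each term equals |x_t| * |x_{t+1} / x_t - a|, so a minimizer is a
    weighted median of the ratios with weights |x_t|. Terms with
    x_t == 0 do not depend on a and are skipped.
    """
    pairs = sorted(
        (xs[t + 1] / xs[t], abs(xs[t]))
        for t in range(len(xs) - 1)
        if xs[t] != 0
    )
    half = sum(w for _, w in pairs) / 2
    acc = 0.0
    for ratio, weight in pairs:
        acc += weight
        if acc >= half:
            return ratio
    raise ValueError("need at least one transition with x_t != 0")


def least_squares_estimate(xs):
    """Ordinary least squares for the same scalar model, for comparison."""
    num = sum(xs[t + 1] * xs[t] for t in range(len(xs) - 1))
    den = sum(xs[t] ** 2 for t in range(len(xs) - 1))
    return num / den


# Hypothetical data: nonzero entries are attacks (3 of 5 transitions).
A_TRUE = 0.5
DISTURBANCES = [2.0, 0.0, -3.0, 0.0, 1.0]

xs = simulate(A_TRUE, 1.0, DISTURBANCES)
a_l1 = l1_estimate(xs)
a_ls = least_squares_estimate(xs)
print("l1 estimate:", a_l1)
print("least-squares estimate:", a_ls)
```

In this toy instance the non-smooth estimator returns the true parameter even though most transitions are attacked, while least squares is biased. Recovery here depends on the attacked ratios falling on both sides of the true value, so the sketch illustrates the mechanism only and does not reproduce the paper's conditions or bounds.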

Paper Structure

This paper contains 18 sections, 15 theorems, 205 equations, 3 figures.

Key Result

Lemma 1

(Hoeffding's Bound; Wainwright, 2019) Suppose that the variable $X$ has mean $\mu$ and sub-Gaussian parameter $\sigma$. Then, for all $t > 0$, we have
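
For reference, the standard two-sided tail bound for a sub-Gaussian random variable reads as follows. Whether the lemma is stated in exactly this form in the paper is an assumption.

$$
\mathbb{P}\left[\, |X - \mu| \geq t \,\right] \leq 2 \exp\left( -\frac{t^2}{2\sigma^2} \right).
$$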

Figures (3)

  • Figure 1: Upper-Bound Value $C_{n,k}$ for Different Values of $n$ and $k$.
  • Figure 2: Estimation errors for least squares and the two estimators (eq:lasso-2) and (eq:lasso-1) with attack probabilities $p = 0.2, 0.4, 0.6$ (left to right).
  • Figure 3: Estimation errors for least squares and the two estimators (eq:lasso-2) and (eq:lasso-1) with attack probability $p = 0.6$, for non-sparse $d$ (top) and sparse $d$ (bottom).

Theorems & Definitions (21)

  • Definition 1: Sub-Gaussian Random Variable (Wainwright, 2019)
  • Lemma 1
  • Definition 2: Subdifferential of $\ell_2$ Norm
  • Definition 3: Subdifferential of $\ell_1$ Norm
  • Theorem 1
  • Definition 4: $\Delta$-spaced Attack Structure
  • Proposition 1
  • Lemma 2
  • Proposition 2
  • Proposition 3
  • ...and 11 more
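
For reference, the two subdifferentials named in Definitions 2 and 3 have the following standard convex-analysis forms for $x \in \mathbb{R}^n$. The paper's own notation may differ, so treat this as a sketch.

$$
\partial \|x\|_2 =
\begin{cases}
\{\, x / \|x\|_2 \,\}, & x \neq 0, \\
\{\, g : \|g\|_2 \leq 1 \,\}, & x = 0,
\end{cases}
\qquad
\partial \|x\|_1 = \{\, g : g_i = \operatorname{sign}(x_i) \text{ if } x_i \neq 0,\ |g_i| \leq 1 \text{ if } x_i = 0 \,\}.
$$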