Exact Recovery for System Identification with More Corrupt Data than Clean Data

Baturalp Yalcin; Haixiang Zhang; Javad Lavaei; Murat Arcak


TL;DR

This work develops robust, non-asymptotic guarantees for exact recovery of discrete-time linear systems under adversarial disturbances, using two convex, non-smooth estimators that penalize the disturbance sequence. It addresses both autonomous and input-driven dynamics and derives sample-complexity bounds under deterministic $\Delta$-spaced attacks and under stochastic attacks with attack probability $p$. These bounds cover cases where the data are correlated and more samples are corrupted than clean. The analysis relies on KKT conditions, Farkas’ lemma, and covering arguments to handle correlated data and non-i.i.d. attack structures, and it yields almost-sure convergence in the asymptotic regime. A demonstration on a biomedical insulin-glucose model illustrates exact recovery even when a majority of observations are compromised. It also shows how incorporating an input sequence can improve identifiability under attack.
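
The KKT-based reasoning mentioned above can be illustrated in the simplest possible setting. The sketch rests on an assumption that this page does not state: that both estimators minimize $\sum_t \|x_{t+1} - A x_t\|$ over $A$, using the $\ell_2$ or the $\ell_1$ norm.

For a scalar state ($n = 1$) the two coincide with $f(a) = \sum_t |x_{t+1} - a x_t|$. Write the residuals as $r_t = x_{t+1} - a x_t$. A candidate $a$ minimizes $f$ exactly when $0 \in \partial f(a)$, that is, when
$\left|\sum_{r_t \neq 0} x_t\,\mathrm{sign}(r_t)\right| \le \sum_{r_t = 0} |x_t|$.
It is the unique minimizer when this inequality is strict.

The function name `kkt_check` is hypothetical. This scalar condition is only an illustration, not the paper's multivariate certificate.

```python
def sign(v: float) -> int:
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def kkt_check(x, a, tol=1e-12):
    """Scalar optimality check for f(a) = sum_t |x[t+1] - a * x[t]| (assumed objective).

    Returns (is_minimizer, is_unique_minimizer) for the candidate a.
    Steps whose residual is (numerically) zero are treated as clean.
    """
    imbalance = 0.0  # sum of x[t] * sign(r_t) over corrupted steps (r_t != 0)
    slack = 0.0      # sum of |x[t]| over clean steps (r_t == 0)
    for t in range(len(x) - 1):
        r = x[t + 1] - a * x[t]
        if abs(r) <= tol:
            slack += abs(x[t])
        else:
            imbalance += x[t] * sign(r)
    return abs(imbalance) <= slack, abs(imbalance) < slack


# Trajectory of x[t+1] = 0.5 x[t] + d[t] with attacks d[1] = 1 and d[2] = -2:
# two of the three transitions are corrupted.
x = [1.0, 0.5, 1.25, -1.375]
print(kkt_check(x, 0.5))  # (True, True): a = 0.5 is still the unique minimizer
```

The check can also fail. If the corrupted steps push the residuals in the same direction, as in `x = [1.0, 0.5, 2.25, 4.125]`, the imbalance (2.75) exceeds the clean slack (1.0) and the check returns `(False, False)`. In this toy setting, what matters is the balance between attack directions, not only the fraction of clean samples.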

Abstract

This paper investigates the system identification problem for linear discrete-time systems under adversaries and analyzes two lasso-type estimators. We examine both asymptotic and non-asymptotic properties of these estimators in two separate scenarios, corresponding to deterministic and stochastic models for the attack times. Since the samples collected from the system are correlated, the existing results on lasso are not applicable. We prove that when the system is stable and attacks are injected periodically, the sample complexity for exact recovery of the system dynamics is linear in the dimension of the states. When adversarial attacks occur at each time instance with probability $p$, the required sample complexity for exact recovery scales polynomially in the dimension of the states and the probability $p$. This result implies almost sure convergence to the true system dynamics under the asymptotic regime. As a by-product, our estimators still learn the system correctly even when more than half of the data is compromised. We highlight that the attack vectors are allowed to be correlated with each other in this work, whereas we make some assumptions about the times at which the attacks happen. This paper provides the first mathematical guarantee in the literature on learning from correlated data for dynamical systems in the case when there is less clean data than corrupt data.
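
The claim that the estimators tolerate more than half corrupted data can be probed numerically in a scalar toy model. This sketch is not the authors' experiment, and it rests on several assumptions:

- The system is scalar: $x_{t+1} = a x_t + d_t$.
- Each time step is attacked independently with probability $p$.
- Attack values are hypothetical Gaussian draws.
- The estimator objective is assumed to be $\sum_t |x_{t+1} - a x_t|$.

Because $|x_{t+1} - a x_t| = |x_t|\,|x_{t+1}/x_t - a|$, this objective is minimized by the $|x_t|$-weighted median of the ratios $x_{t+1}/x_t$. The sketch compares that estimate with ordinary least squares. All function names are illustrative.

```python
import random


def weighted_median(values, weights):
    """Return a minimizer m of sum_i weights[i] * |values[i] - m|."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:  # first value whose cumulative weight reaches half
            return v
    return pairs[-1][0]


def l1_objective(x, a):
    """f(a) = sum_t |x[t+1] - a * x[t]| (assumed scalar form of the estimators)."""
    return sum(abs(x[t + 1] - a * x[t]) for t in range(len(x) - 1))


def l1_estimate(x):
    """argmin_a f(a), via the |x[t]|-weighted median of the ratios x[t+1] / x[t]."""
    ratios, weights = [], []
    for t in range(len(x) - 1):
        if x[t] != 0.0:  # terms with x[t] == 0 do not depend on a
            ratios.append(x[t + 1] / x[t])
            weights.append(abs(x[t]))
    return weighted_median(ratios, weights)


def ls_estimate(x):
    """Ordinary least squares: argmin_a sum_t (x[t+1] - a * x[t])^2."""
    num = sum(x[t + 1] * x[t] for t in range(len(x) - 1))
    den = sum(x[t] * x[t] for t in range(len(x) - 1))
    return num / den


def simulate(a, x0, T, p, scale=1.0, seed=0):
    """Simulate x[t+1] = a x[t] + d[t], where each d[t] is nonzero with probability p.

    Returns the trajectory and the observed fraction of attacked steps.
    """
    rng = random.Random(seed)
    x, attacked = [x0], 0
    for _ in range(T):
        d = 0.0
        if rng.random() < p:
            d = rng.gauss(0.0, scale)  # hypothetical attack value
            attacked += 1
        x.append(a * x[-1] + d)
    return x, attacked / T


x, frac = simulate(a=0.5, x0=1.0, T=500, p=0.6, seed=1)
print(f"attacked fraction: {frac:.2f}")
print(f"l1 estimate: {l1_estimate(x)}, least squares: {ls_estimate(x):.4f}")
```

Least squares weighs every residual quadratically, so a few large attacks can bias it. The weighted median only compares how much $|x_t|$-weight lies above versus below a candidate value, which is why roughly balanced attacks can outnumber clean steps without moving it. The choice $a = 0.5$ keeps the clean ratios exactly representable. For other values of $a$, equality comparisons should use a tolerance.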


Paper Structure (18 sections, 15 theorems, 205 equations, 3 figures)


Introduction
Notation and Preliminaries
Problem Formulation
Autonomous Systems
Systems with Input Sequence
Numerical Experiment
Discussion and Conclusion
Proofs for Results in Main Part
Proof of Proposition \ref{prop: delta}
Proof of Proposition \ref{prop: general-deter}
Proof of Proposition \ref{prop: equalitiy}
Proof of Lemma \ref{lem: farkas}
Proof of Theorem \ref{thm: exact-l2-general-no-input}
Step 1-1
Step 1-2
...and 3 more sections

Key Result

Lemma 1

(Hoeffding's Bound wainwright_2019) Suppose that the variable $X$ has mean $\mu$ and sub-Gaussian parameter $\sigma$. Then, for all $t > 0$, we have
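
The displayed inequality of this lemma is cut off. For reference, the standard tail bound for a random variable that is sub-Gaussian with parameter $\sigma$ (Wainwright, 2019, Ch. 2) is $\mathbb{P}(X - \mu \ge t) \le \exp\!\left(-t^2/(2\sigma^2)\right)$ for all $t > 0$. Its two-sided counterpart is $\mathbb{P}(|X - \mu| \ge t) \le 2\exp\!\left(-t^2/(2\sigma^2)\right)$, and the lemma may state either form.

The sketch below checks the one-sided bound against the exact upper tail of a Gaussian, which is sub-Gaussian with parameter equal to its standard deviation. The helper names are illustrative.

```python
import math


def gaussian_upper_tail(t, sigma):
    """Exact P(X - mu >= t) for X ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc(t / (sigma * math.sqrt(2.0)))


def subgaussian_tail_bound(t, sigma):
    """One-sided sub-Gaussian bound exp(-t^2 / (2 sigma^2))."""
    return math.exp(-t * t / (2.0 * sigma * sigma))


for sigma in (0.5, 1.0, 2.0):
    for t in (0.5, 1.0, 2.0, 4.0):
        exact = gaussian_upper_tail(t, sigma)
        bound = subgaussian_tail_bound(t, sigma)
        assert exact <= bound  # the bound must hold for every t > 0
        print(f"sigma={sigma}, t={t}: exact tail {exact:.2e} <= bound {bound:.2e}")
```

The bound is conservative in its constant, but it captures the $\exp(-t^2/(2\sigma^2))$ decay rate of the Gaussian tail.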

Figures (3)

Figure 1: Upper-Bound Value $C_{n,k}$ for Different Values of $n$ and $k$.
Figure 2: Estimation errors for least squares, \ref{eq:lasso-2}, and \ref{eq:lasso-1} with attack probability $p = 0.2, 0.4, 0.6$ (left to right).
Figure 3: Estimation errors for least squares, \ref{eq:lasso-2}, and \ref{eq:lasso-1} with attack probability $p = 0.6$: non-sparse $d$ (top) and sparse $d$ (bottom).

Theorems & Definitions (21)

Definition 1: Sub-Gaussian Random Variable wainwright_2019
Lemma 1
Definition 2: Subdifferential of $\ell_2$ Norm
Definition 3: Subdifferential of $\ell_1$ Norm
Theorem 1
Definition 4: $\Delta$-spaced Attack Structure
Proposition 1
Lemma 2
Proposition 2
Proposition 3
...and 11 more
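
Definitions 2 and 3 appear here by name only. The textbook forms are as follows:

- $\partial\|x\|_2 = \{x/\|x\|_2\}$ for $x \neq 0$, and the closed unit $\ell_2$ ball for $x = 0$.
- $\partial\|x\|_1 = \{g : g_i = \mathrm{sign}(x_i) \text{ if } x_i \neq 0,\ g_i \in [-1, 1] \text{ if } x_i = 0\}$.

The paper may phrase these differently. These sets are what enter KKT conditions for non-smooth norm-based estimators. The membership tests below are an illustrative sketch with hypothetical names and a numerical tolerance.

```python
import math


def in_l2_subdifferential(x, g, tol=1e-9):
    """Is g in the subdifferential of ||.||_2 at x?"""
    assert len(x) == len(g)
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x <= tol:
        # At the origin: any g in the closed unit l2 ball.
        return math.sqrt(sum(v * v for v in g)) <= 1.0 + tol
    # Away from the origin the norm is differentiable: g must equal x / ||x||_2.
    return all(abs(gi - xi / norm_x) <= tol for xi, gi in zip(x, g))


def in_l1_subdifferential(x, g, tol=1e-9):
    """Is g in the subdifferential of ||.||_1 at x?"""
    assert len(x) == len(g)
    for xi, gi in zip(x, g):
        if abs(xi) > tol:
            # Nonzero coordinate: g_i must equal sign(x_i).
            if abs(gi - math.copysign(1.0, xi)) > tol:
                return False
        elif abs(gi) > 1.0 + tol:
            # Zero coordinate: g_i may be anything in [-1, 1].
            return False
    return True


print(in_l2_subdifferential([3.0, 4.0], [0.6, 0.8]))              # True
print(in_l2_subdifferential([0.0, 0.0], [1.0, 1.0]))              # False: ||g||_2 > 1
print(in_l1_subdifferential([2.0, -1.0, 0.0], [1.0, -1.0, 0.3]))  # True
```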
