Uncertainty Quantification of Data-Driven Output Predictors in the Output Error Setting

Farzan Kaviani; Ivan Markovsky; Hamid R. Ossareh

TL;DR

This work addresses uncertainty quantification for data-driven output predictors in the output error (OE) setting, deriving two upper bounds on the prediction error when the offline data are contaminated by bounded noise. The first bound covers predictions computed from the raw data; the second covers predictions computed after de-noising by truncated singular value decomposition (TSVD). Both bounds are computable from the noisy data and a known noise bound, without ground-truth outputs. Numerical results show that the bounds decrease approximately linearly with the noise level and that TSVD de-noising does not universally improve OE prediction accuracy, although its bound applies more generally. The findings support the robustification of data-driven control methods such as Data-enabled Predictive Control (DeePC) under data inaccuracy and inform the choice of Hankel matrix partitioning. Future work will explore GLRA approaches and extensions to errors-in-variables scenarios.

Abstract

We revisit the problem of predicting the output of an LTI system directly from offline input-output data (without the use of a parametric model) in the behavioral setting. Existing works calculate the output predictions by projecting the recent samples of the input and output signals onto the column span of a Hankel matrix consisting of the offline input-output data. However, if the offline data are corrupted by noise, the output prediction is no longer exact. While some prior works propose mitigating noisy data through low-rank matrix approximation heuristics, such as truncated singular value decomposition, the ensuing prediction accuracy remains unquantified. This paper fills this gap by introducing two upper bounds on the prediction error under the condition that the noise is sufficiently small relative to the offline data's magnitude. The first bound pertains to prediction using the raw offline data directly, while the second applies to prediction using the low-rank approximation heuristic. Notably, the bounds do not require the ground truth about the system output, relying solely on noisy measurements with a known noise level and system order. Extensive numerical simulations show that both bounds decrease monotonically (and linearly) as a function of the noise level. Furthermore, our results demonstrate that applying the de-noising heuristic in the output error setup does not generally lead to better prediction accuracy than using raw data directly, nor to a smaller upper bound on the prediction error. However, it allows for a more general upper bound, as the first upper bound requires a specific condition on the partitioning of the Hankel matrix.
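The predictor described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the second-order SISO system, the horizons `T_ini` and `T_f`, the noise amplitude `0.01`, the tolerance `rcond=1e-10`, and the helper names `hankel`, `simulate`, and `predict` are all assumptions made for illustration. The sketch also assumes noise-free online data and applies the TSVD to the full stacked input-output Hankel matrix with rank `m * L + n`, a common choice that the summary above does not confirm.

```python
import numpy as np

def hankel(w, L):
    """Block-Hankel matrix with L block rows built from a signal w of shape (T, q)."""
    T = w.shape[0]
    return np.column_stack([w[j:j + L].reshape(-1) for j in range(T - L + 1)])

def simulate(A, B, C, D, u, x0):
    """Simulate x+ = A x + B u, y = C x + D u for an input u of shape (T, m)."""
    x, y = x0.copy(), np.zeros((len(u), C.shape[0]))
    for t in range(len(u)):
        y[t] = C @ x + D @ u[t]
        x = A @ x + B @ u[t]
    return y

def predict(u_d, y_d, u_ini, y_ini, u_f, rank=None):
    """Predict future outputs from offline data (u_d, y_d) and recent samples."""
    T_ini, T_f = len(u_ini), len(u_f)
    L = T_ini + T_f
    m, p = u_d.shape[1], y_d.shape[1]
    H = np.vstack([hankel(u_d, L), hankel(y_d, L)])
    if rank is not None:
        # Optional TSVD de-noising: keep only the `rank` largest singular values.
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        H = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    Hu, Hy = H[:m * L], H[m * L:]
    Up, Uf = Hu[:m * T_ini], Hu[m * T_ini:]
    Yp, Yf = Hy[:p * T_ini], Hy[p * T_ini:]
    known = np.vstack([Up, Yp, Uf])
    rhs = np.concatenate([u_ini.ravel(), y_ini.ravel(), u_f.ravel()])
    # Minimum-norm least-squares solution; rcond is an assumed numerical tolerance.
    g, *_ = np.linalg.lstsq(known, rhs, rcond=1e-10)
    return (Yf @ g).reshape(T_f, p)

# Hypothetical example system (order n = 2, one input, one output).
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
n, m, T, T_ini, T_f = 2, 1, 100, 4, 6

# Offline data: exact outputs plus bounded (uniform) output noise.
u_d = rng.standard_normal((T, m))
y_d = simulate(A, B, C, D, u_d, np.zeros(n))
noise = 0.01 * rng.uniform(-1.0, 1.0, size=y_d.shape)

# Online trajectory (assumed noise-free), split into recent and future samples.
u_on = rng.standard_normal((T_ini + T_f, m))
y_on = simulate(A, B, C, D, u_on, rng.standard_normal(n))
u_ini, y_ini = u_on[:T_ini], y_on[:T_ini]
u_f, y_true = u_on[T_ini:], y_on[T_ini:]

err_clean = np.linalg.norm(predict(u_d, y_d, u_ini, y_ini, u_f) - y_true)
err_raw = np.linalg.norm(predict(u_d, y_d + noise, u_ini, y_ini, u_f) - y_true)
err_tsvd = np.linalg.norm(
    predict(u_d, y_d + noise, u_ini, y_ini, u_f, rank=m * (T_ini + T_f) + n) - y_true
)
print(f"noise-free: {err_clean:.2e}, raw noisy: {err_raw:.2e}, TSVD: {err_tsvd:.2e}")
```

With exact offline data the prediction error is at the level of floating-point rounding, which is the "exact prediction" case the abstract refers to. With noisy outputs, the script prints the raw and TSVD errors side by side; which one is smaller depends on the data, and a single run of this sketch should not be read as evidence either way.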

Paper Structure (13 sections, 4 theorems, 62 equations, 4 figures)


Introduction
Problem Formulation
Main Results
Preliminaries
Upper bound on $\|\mathbf{\tilde{y}_{pred}}-\mathbf{y_{pred}}\|_2$
Upper bound on $\|\mathbf{\hat{y}_{pred}}-\mathbf{y_{pred}}\|_2$
Analysis of the bounds
Conclusions and Future Work
Appendix
Proof of Lemma 1
Proof of Lemma 2
Proof of Theorem 1
Proof of Theorem 2

Key Result

Lemma 1

The perturbation matrices satisfy:

Figures (4)

Figure 1: Comparison of normalized prediction errors using raw offline data and low-rank approximation of offline data
Figure 2: Comparison of the relative gaps obtained from Theorem 1 and Theorem 2.
Figure 3: Average upper bound values at each noise level plotted against the noise level $N$.
Figure 4: Box plots illustrating the median, 25th, and 75th percentiles (box edges) of relative gap values for both theorems under the influence of the $\delta_{\text{SN}}>0.6$ condition. Whiskers extend to the most extreme non-outlier data points, and outliers are plotted individually with '+' symbols. The top and bottom subplots depict the relative gap values for the first and second upper bounds, respectively.

Theorems & Definitions (15)

Remark 1
Remark 2
Lemma 1
proof
Lemma 2
proof
Theorem 1
proof
Remark 3
Theorem 2
...and 5 more
