Verifying the Generalization of Deep Learning to Out-of-Distribution Domains

Guy Amir, Osher Maayan, Tom Zelazny, Guy Katz, Michael Schapira

TL;DR

This research introduces a novel approach for harnessing DNN verification technology to identify DNN-driven decision rules that exhibit robust generalization to previously unencountered input domains, offering the prospect of mitigating the challenges linked to deploying DNN-driven systems in real-world scenarios.

Abstract

Deep neural networks (DNNs) play a crucial role in the field of machine learning, demonstrating state-of-the-art performance across various application domains. However, despite their success, DNN-based models may occasionally exhibit challenges with generalization, i.e., may fail to handle inputs that were not encountered during training. This limitation is a significant challenge when it comes to deploying deep learning for safety-critical tasks, as well as in real-world settings characterized by substantial variability. We introduce a novel approach for harnessing DNN verification technology to identify DNN-driven decision rules that exhibit robust generalization to previously unencountered input domains. Our method assesses generalization within an input domain by measuring the level of agreement between independently trained deep neural networks for inputs in this domain. We also efficiently realize our approach by using off-the-shelf DNN verification engines, and extensively evaluate it on both supervised and unsupervised DNN benchmarks, including a deep reinforcement learning (DRL) system for Internet congestion control -- demonstrating the applicability of our approach for real-world settings. Moreover, our research introduces a fresh objective for formal verification, offering the prospect of mitigating the challenges linked to deploying DNN-driven systems in real-world scenarios.
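
The agreement measure described in the abstract can be illustrated with a minimal sketch. The paper obtains its results from DNN verification engines; the code below instead estimates disagreement by random sampling, which yields only a lower bound on the worst case and no formal guarantee. The toy models, the `max_disagreement` helper, and the input boxes are all hypothetical and chosen purely for illustration.

```python
import random

def max_disagreement(model_a, model_b, low, high, n_samples=10_000, seed=0):
    """Estimate the largest output gap between two models on a box-shaped domain.

    Sampling only finds a lower bound on the true maximum; a verification
    engine would be needed to bound the gap from above.
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_samples):
        # Draw one input uniformly from the box [low, high].
        x = [rng.uniform(lo, hi) for lo, hi in zip(low, high)]
        worst = max(worst, abs(model_a(x) - model_b(x)))
    return worst

def relu(v):
    return max(0.0, v)

# Two hypothetical "independently trained" models: tiny ReLU networks
# with hand-picked weights that are close but not identical.
def model_1(x):
    return relu(0.9 * x[0] - 0.2 * x[1]) + 0.1

def model_2(x):
    return relu(1.0 * x[0] - 0.25 * x[1]) + 0.1

# Assumed "in-distribution" box versus a wider "out-of-distribution" box.
in_dist = max_disagreement(model_1, model_2, low=[0.0, 0.0], high=[1.0, 1.0])
ood = max_disagreement(model_1, model_2, low=[0.0, 0.0], high=[10.0, 10.0])
```

In this toy setup the two models nearly agree on the narrow box and drift apart on the wider one, which is the kind of signal the agreement-based approach relies on.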

Paper Structure (66 sections, 7 equations, 47 figures, 8 tables, 5 algorithms)

Figures (47)

  • Figure 1: A toy DNN.
  • Figure 2: To calculate the PDT scores, we generated a new DNN that is the concatenation of each pair of DNNs (sharing the same input).
  • Figure 3: Cartpole: in-distribution setting (blue) and OOD setting (red).
  • Figure 4: Cartpole: models' average rewards in different distributions.
  • Figure 5: Cartpole: results of the model-selection algorithm, per iteration. The bars represent the ratio of good and bad models in the surviving set (left y-axis), while the curve indicates the number of surviving models (right y-axis). Our technique selected models {6,7,9}.
  • ...and 42 more figures
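
The captions for Figures 2 and 5 suggest a pipeline in which pairwise scores between models drive an iterative selection that shrinks a surviving set. The sketch below is one hypothetical way such a loop could look. It is not the paper's algorithm: the scoring rule (mean pairwise score), the removal rule (drop the single worst model per iteration), the stopping threshold, and the example score table are all assumptions made for illustration.

```python
def select_models(pair_scores, threshold):
    """Iteratively drop the model that disagrees most with the others.

    pair_scores: dict mapping frozenset({i, j}) -> disagreement score
                 (assumed to be computed elsewhere, e.g. by a verifier).
    threshold:   stop once the worst surviving model's mean score is
                 at or below this value (an assumed stopping rule).
    Returns the surviving set and the surviving set after each iteration.
    """
    surviving = set()
    for pair in pair_scores:
        surviving |= pair
    history = [sorted(surviving)]
    while len(surviving) > 1:
        # Mean disagreement of each model with every other survivor.
        mean_score = {
            m: sum(pair_scores[frozenset({m, o})] for o in surviving if o != m)
            / (len(surviving) - 1)
            for m in surviving
        }
        worst = max(sorted(mean_score), key=mean_score.get)
        if mean_score[worst] <= threshold:
            break
        surviving.remove(worst)
        history.append(sorted(surviving))
    return surviving, history

# Hypothetical scores for four models; model 3 disagrees with everyone.
scores = {
    frozenset({0, 1}): 0.1, frozenset({0, 2}): 0.2, frozenset({1, 2}): 0.1,
    frozenset({0, 3}): 5.0, frozenset({1, 3}): 4.0, frozenset({2, 3}): 6.0,
}
survivors, history = select_models(scores, threshold=0.5)
```

The `history` list records the surviving set after each iteration, which corresponds loosely to the per-iteration view that Figure 5 plots.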