On Security Weaknesses and Vulnerabilities in Deep Learning Systems
Zhongzheng Lai, Huaming Chen, Ruoxi Sun, Yu Zhang, Minhui Xue, Dong Yuan
TL;DR
This study addresses security weaknesses and vulnerabilities in DL-enabled software by conducting the first systematic analysis across five major open-source DL frameworks. It introduces a two-stream data analysis framework that fuses official vulnerability data (CVE/NVD) with developer activity on GitHub to identify and classify DL-related vulnerabilities, yielding a dataset of 3,049 instances and a DL-specific vulnerability taxonomy. The authors provide extensive methodological details, manual labeling with high inter-rater agreement, and a replication package to enable reproducibility, offering concrete insights into root causes, detection challenges, and patching difficulties. The work highlights that DL systems exhibit distinctive vulnerability patterns, particularly around resource management and DL-specific tensor operations, and proposes actionable guidance and taxonomy extensions to advance secure DL system development and maintenance. Overall, the paper contributes the first large-scale, cross-framework vulnerability study for DL, with practical implications for practitioners and a foundation for future security research in AI-enabled software.
Abstract
Security guarantees for AI-enabled software systems (particularly those using deep learning techniques as a functional core) are pivotal to defending against adversarial attacks that exploit software vulnerabilities. However, little attention has been paid to systematically investigating vulnerabilities in such systems. A common practice observed in the open-source software community is that deep learning engineers frequently integrate off-the-shelf or open-source learning frameworks into their ecosystems. In this work, we focus specifically on deep learning (DL) frameworks and perform the first systematic study of vulnerabilities in DL systems through a comprehensive analysis of vulnerabilities identified from Common Vulnerabilities and Exposures (CVE) and from open-source DL tools, including TensorFlow, Caffe, OpenCV, Keras, and PyTorch. We propose a two-stream data analysis framework to explore vulnerability patterns across multiple databases. We investigate the development ecosystems of DL frameworks and libraries, which appear to be decentralized and fragmented. By revisiting the Common Weakness Enumeration (CWE) List, which catalogs weaknesses and related practices from traditional software, we observe that vulnerabilities are more challenging to detect and fix throughout the DL system lifecycle. Moreover, we conduct a large-scale empirical study of 3,049 DL vulnerabilities to better understand vulnerability patterns and the challenges in fixing them. We have released the full replication package at https://github.com/codelzz/Vulnerabilities4DLSystem. We anticipate that our study can advance the development of secure DL systems.
