Detecting Vulnerabilities from Issue Reports for Internet-of-Things

Sogol Masoumzadeh

TL;DR

The paper tackles the challenge of detecting vulnerability-indicating issues in IoT from issue reports, a task complicated by IoT's heterogeneity and by the fact that analysis is slower for IoT than for non-IoT systems. It presents an empirical study combining ML, NLP, and GPT-4o across 21 Eclipse IoT projects, plus a fine-tuning approach for a BERT Masked Language Model (MLM) on 11,000 GitHub issues. An SVM using BERT-based NLP features achieves an AUC of 0.65 and GPT-4o reaches 0.60, whereas the fine-tuned MLM attains only 0.26 accuracy, underscoring the importance of exposing full data context during training. The work demonstrates the feasibility of IoT vulnerability detection from issue reports and lays groundwork for more robust IoT-focused vulnerability classification, similar to what exists for non-IoT systems.

Abstract

Timely identification of issue reports reflecting software vulnerabilities is crucial, particularly for Internet-of-Things (IoT), where analysis is slower than for non-IoT systems. While Machine Learning (ML) and Large Language Models (LLMs) detect vulnerability-indicating issues in non-IoT systems, their IoT use remains unexplored. We are the first to tackle this problem by proposing two approaches: (1) combining ML and LLMs with Natural Language Processing (NLP) techniques to detect vulnerability-indicating issues of 21 Eclipse IoT projects and (2) fine-tuning a pre-trained BERT Masked Language Model (MLM) on 11,000 GitHub issues for classifying vulnerability-indicating issues. Our best performance belongs to a Support Vector Machine (SVM) trained on BERT NLP features, achieving an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.65. The fine-tuned BERT achieves 0.26 accuracy, emphasizing the importance of exposing all data during training. Our contributions set the stage for accurately detecting IoT vulnerabilities from issue reports, similar to non-IoT systems.
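
To make the reported AUC figures easier to interpret, here is a minimal sketch of how AUC can be computed from a classifier's scores. It uses the pairwise-ranking definition: the probability that a randomly chosen vulnerability-indicating issue receives a higher score than a randomly chosen non-vulnerability issue. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the paper's code or data.

```python
def roc_auc(labels, scores):
    """Return AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a vulnerability-indicating issue, 0 otherwise.
    scores: classifier scores, higher meaning "more likely a vulnerability".
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six issue reports with made-up classifier scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Under this definition, an AUC of 0.5 corresponds to ranking positives above negatives no better than chance, and 1.0 to a perfect ranking, so a value of 0.65 means that about 65% of such pairs are ordered correctly.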

Paper Structure

This paper contains 5 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: The overview of the study
  • Figure 2: The saturation of the classification performance