CVE-LLM: Automatic vulnerability evaluation in medical device industry using large language models

Rikhiya Ghosh, Oladimeji Farri, Hans-Martin von Stockhausen, Martin Schmitt, George Marica Vasile

TL;DR

CVE-LLM shows that large language models trained on historical vulnerability evaluations can automate vulnerability assessment for medical devices, taking into account each asset's context and its third-party components. By combining domain-adaptive pretraining, instruction tuning, and a human-in-the-loop framework, the approach produces structured outputs (VEXCategory, VEXJustification, Vector, and comments) with high fidelity and is substantially faster than human analysts. CVE-LLM outperforms open-source LLM baselines, and ablation studies link its performance to dataset diversity, domain adaptation, and inference settings. The work also reports practical deployment benefits, including real-time inference gains. Together, these results point to a path toward scalable, rapid vulnerability management for medical devices, which must satisfy FDA guidance and remain in service for many years, ultimately supporting safer healthcare delivery.
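To make the structured output concrete, here is a minimal sketch of how a downstream tool might validate one model response and compare its Vector field metric by metric against an analyst's answer, in the spirit of the per-metric breakdown in Figure 5. Only the field names VEXCategory, VEXJustification, and Vector come from the paper. Everything else is an assumption for illustration: the JSON response format, the hypothetical "Comments" key, the functions `parse_output`, `parse_vector`, and `per_metric_agreement`, the VEX status and justification vocabularies (borrowed from the CISA VEX minimum requirements, not from CVE-LLM), and the reading of Vector as a CVSS v3.1-style environmental vector.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional

# ASSUMPTION: label sets borrowed from the CISA VEX vocabulary;
# CVE-LLM's actual labels may differ.
VEX_CATEGORIES = {"not_affected", "affected", "fixed", "under_investigation"}
VEX_JUSTIFICATIONS = {
    "component_not_present",
    "vulnerable_code_not_present",
    "vulnerable_code_not_in_execute_path",
    "vulnerable_code_cannot_be_controlled_by_adversary",
    "inline_mitigations_already_exist",
}

# ASSUMPTION: "Vector" is a CVSS v3.1-style environmental vector;
# these are the CVSS v3.1 environmental metric abbreviations.
ENV_METRICS = ("CR", "IR", "AR", "MAV", "MAC", "MPR", "MUI", "MS", "MC", "MI", "MA")


@dataclass
class Assessment:
    vex_category: str
    vex_justification: Optional[str]
    vector: Dict[str, str]
    comments: str


def parse_vector(vector: str) -> Dict[str, str]:
    """Split 'CR:H/MAV:N/...' into {'CR': 'H', 'MAV': 'N', ...}."""
    metrics: Dict[str, str] = {}
    for part in vector.split("/"):
        if not part or part.startswith("CVSS:"):
            continue  # skip empty parts and an optional 'CVSS:3.1' prefix
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"malformed vector component: {part!r}")
        metrics[key] = value
    return metrics


def parse_output(raw: str) -> Assessment:
    """Validate one (hypothetical) JSON-formatted model response."""
    data = json.loads(raw)
    category = data.get("VEXCategory")
    if category not in VEX_CATEGORIES:
        raise ValueError(f"unknown VEXCategory: {category!r}")
    justification = data.get("VEXJustification") or None
    if justification is not None and justification not in VEX_JUSTIFICATIONS:
        raise ValueError(f"unknown VEXJustification: {justification!r}")
    return Assessment(
        vex_category=category,
        vex_justification=justification,
        vector=parse_vector(data.get("Vector", "")),
        comments=data.get("Comments", ""),  # hypothetical key name
    )


def per_metric_agreement(pred: Dict[str, str], gold: Dict[str, str]) -> Dict[str, bool]:
    """For each environmental metric the analyst set, did the model match it?"""
    return {m: pred.get(m) == gold[m] for m in ENV_METRICS if m in gold}


if __name__ == "__main__":
    raw = json.dumps({
        "VEXCategory": "not_affected",
        "VEXJustification": "vulnerable_code_not_in_execute_path",
        "Vector": "CR:H/IR:M/AR:H/MAV:L",
        "Comments": "Affected library function is never called.",
    })
    a = parse_output(raw)
    print(a.vector)  # {'CR': 'H', 'IR': 'M', 'AR': 'H', 'MAV': 'L'}
    gold = {"CR": "H", "IR": "L", "AR": "H", "MAV": "L"}
    print(per_metric_agreement(a.vector, gold))
    # {'CR': True, 'IR': False, 'AR': True, 'MAV': True}
```

Strict validation of this kind is one plausible way to fit a human-in-the-loop workflow: a response that fails to parse, or whose metrics disagree with an existing evaluation, can be routed to an analyst rather than accepted automatically.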

Abstract

The healthcare industry is currently experiencing an unprecedented wave of cybersecurity attacks, impacting millions of individuals. With the discovery of thousands of vulnerabilities each month, there is a pressing need to drive the automation of vulnerability assessment processes for medical devices, facilitating rapid mitigation efforts. Generative AI systems have revolutionized various industries, offering unparalleled opportunities for automation and increased efficiency. This paper presents a solution leveraging Large Language Models (LLMs) to learn from historical evaluations of vulnerabilities for the automatic assessment of vulnerabilities in the medical devices industry. This approach is applied within the portfolio of a single manufacturer, taking into account device characteristics, including existing security posture and controls. The primary contributions of this paper are threefold. Firstly, it provides a detailed examination of the best practices for training a vulnerability Language Model (LM) in an industrial context. Secondly, it presents a comprehensive comparison and insightful analysis of the effectiveness of Language Models in vulnerability assessment. Finally, it proposes a new human-in-the-loop framework to expedite vulnerability evaluation processes.

Paper Structure

This paper contains 36 sections, 2 equations, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: Schematic of the Training paradigm
  • Figure 2: Inference and Model evaluation Schematic
  • Figure 3: Ablation studies: Effect of various factors on model outcome
  • Figure 4: Dataset statistics
  • Figure 5: Environmental Vector confusion matrix: breakdown by metrics