Generative AI in Cybersecurity: A Comprehensive Review of LLM Applications and Vulnerabilities
Mohamed Amine Ferrag, Fatima Alwahedi, Ammar Battah, Bilel Cherif, Abdechakour Mechri, Norbert Tihanyi, Tamas Bisztray, Merouane Debbah
TL;DR
This review surveys the use of Generative AI and LLMs in cybersecurity, covering applications that range from hardware design security to phishing detection and cyber threat intelligence. It presents a taxonomy of models, datasets, vulnerabilities, and defense strategies, and discusses how techniques such as RAG, RLHF, DPO, ORPO, and QLoRA can strengthen real-time defenses, alongside mitigations for attack vectors such as prompt injection and data poisoning. Key contributions include a performance comparison of 42 LLMs, a lifecycle analysis of cybersecurity datasets, and a framework for evaluating and deploying secure LLM systems across cybersecurity contexts. The study stresses the need for robust data governance, secure deployment practices, and continuous adversarial testing if LLM-based threat detection and response are to be reliable at scale. Overall, the work offers a foundation for integrating LLMs into future cybersecurity architectures while weighing their capabilities against their security risks.
Abstract
This paper provides a comprehensive review of the future of cybersecurity through Generative AI and Large Language Models (LLMs). We explore LLM applications across various domains, including hardware design security, intrusion detection, software engineering, design verification, cyber threat intelligence, malware detection, and phishing detection. We present an overview of LLM evolution and its current state, focusing on advancements in models such as GPT-4, GPT-3.5, Mixtral-8x7B, BERT, Falcon2, and LLaMA. Our analysis extends to LLM vulnerabilities, such as prompt injection, insecure output handling, data poisoning, DDoS attacks, and adversarial instructions. We examine mitigation strategies to protect these models, covering potential attack scenarios and prevention techniques. Furthermore, we evaluate the performance of 42 LLMs in cybersecurity knowledge and hardware security, highlighting their strengths and weaknesses. We also assess cybersecurity datasets for LLM training and testing, covering the lifecycle from data creation to usage and identifying gaps for future research. In addition, we review new strategies for leveraging LLMs, including techniques such as Half-Quadratic Quantization (HQQ), Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), Quantized Low-Rank Adapters (QLoRA), and Retrieval-Augmented Generation (RAG). These insights aim to enhance real-time cybersecurity defenses and improve the sophistication of LLM applications in threat detection and response. Our paper provides a foundational understanding and strategic direction for integrating LLMs into future cybersecurity frameworks, emphasizing innovation and robust model deployment to safeguard against evolving cyber threats.
