A Survey of Large Language Models in Cybersecurity
Gabriel de Jesus Coelho da Silva, Carlos Becker Westphall
TL;DR
This survey analyzes how large language models (LLMs) are applied in cybersecurity, spanning autonomous pentesting, vulnerability repair, phishing detection, and CTF challenge solving. It highlights key limitations, including context loss, hallucinations, and the need for reliable grounding, and proposes a Mixture-of-Experts framework that delegates tasks to specialized LLMs to improve reliability and scalability. The work synthesizes prominent studies (PentestGPT, zero-shot vulnerability repair, and AI-assisted CTFs) to map current capabilities and gaps, and it outlines concrete future directions such as domain-specific fine-tuning, retrieval-augmented generation (RAG), Chain-of-Verification (CoVe), and governance. The findings inform researchers and practitioners about practical deployment considerations for safer, more capable AI-assisted cybersecurity tooling.
Abstract
Large Language Models (LLMs) have quickly risen to prominence due to their ability to perform at or close to the state of the art in a variety of fields while handling natural language. An important area of research is the application of such models in the cybersecurity context. This survey aims to identify where in the field of cybersecurity LLMs have already been applied, how they are being used, and what their limitations are. Finally, suggestions are made on how to address these limitations, along with what can be expected from these systems once the limitations are overcome.
