GenKubeSec: LLM-Based Kubernetes Misconfiguration Detection, Localization, Reasoning, and Remediation
Ehud Malul, Yair Meidan, Dudu Mimran, Yuval Elovici, Asaf Shabtai
TL;DR
GenKubeSec is an end-to-end, LLM-based framework for detecting, localizing, explaining, and remediating misconfigurations in Kubernetes configuration files (KCFs), addressing the limitations of static rule-based (RB) tools and API-based LLMs. It introduces a unified misconfiguration index (UMI), a fine-tuned, KCF-dedicated detector (GenKubeDetect), and a localization, reasoning, and remediation module (GenKubeResolve) built on a local Mistral LLM with few-shot prompts. The approach achieves precision on par with industry-standard RB tools and superior recall, detects a broad set of misconfigurations missed by RB rules, and provides explanation and remediation at the exact location of each misconfiguration. It also demonstrates fast adaptation to new misconfigurations with minimal additional data and emphasizes security by avoiding external APIs. The work provides open-source tooling, a large labeled KCF corpus, and a standardized index to foster reproducibility and benchmarking in KCF security research.
Abstract
A key challenge associated with Kubernetes configuration files (KCFs) is that they are often highly complex and error-prone, leading to security vulnerabilities and operational setbacks. Rule-based (RB) tools for KCF misconfiguration detection rely on static rule sets, making them inherently limited and unable to detect newly discovered misconfigurations. RB tools also suffer from misdetection, since mistakes are likely when coding the detection rules. Recent methods for detecting and remediating KCF misconfigurations are limited in their scalability and detection coverage, require a high level of expertise, or do not offer automated remediation along with misconfiguration detection. Novel approaches that employ large language models (LLMs) in their pipeline rely on API-based, general-purpose, and mainly commercial models. Thus, they pose security challenges, have inconsistent classification performance, and can be costly. In this paper, we propose GenKubeSec, a comprehensive and adaptive LLM-based method that, in addition to detecting a wide variety of KCF misconfigurations, also identifies the exact location of the misconfigurations and provides detailed reasoning about them, along with suggested remediation. When empirically compared with three industry-standard RB tools, GenKubeSec achieved equivalent precision (0.990) and superior recall (0.999). When a random sample of KCFs was examined by a Kubernetes security expert, GenKubeSec's explanations regarding misconfiguration localization, reasoning, and remediation were 100% correct, informative, and useful. To facilitate further advancements in this domain, we share the unique dataset we collected, a unified misconfiguration index we developed for label standardization, our experimentation code, and GenKubeSec itself as an open-source tool.
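To make the limitation of static rule sets concrete, the following is a minimal sketch of how a rule-based checker might work. It is not taken from GenKubeSec or from any of the RB tools evaluated in the paper; the rule names, the `scan` helper, and the sample manifest are all hypothetical. The manifest is written as a plain Python dict standing in for a parsed KCF.

```python
# Hypothetical sketch of a static rule-based KCF checker (illustration only).

# A parsed Pod manifest; in practice this would come from a YAML parser.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo"},
    "spec": {
        "hostNetwork": True,  # risky setting, but no rule below covers it
        "containers": [
            {
                "name": "app",
                "image": "nginx:latest",
                "securityContext": {"privileged": True},
            },
        ],
    },
}


def containers(manifest):
    """Return the container list of a Pod manifest, or an empty list."""
    return manifest.get("spec", {}).get("containers", [])


def rule_privileged(manifest):
    """Flag containers that run in privileged mode."""
    return [
        f"spec.containers[{i}].securityContext.privileged"
        for i, c in enumerate(containers(manifest))
        if c.get("securityContext", {}).get("privileged") is True
    ]


def rule_missing_limits(manifest):
    """Flag containers that declare no resource limits."""
    return [
        f"spec.containers[{i}].resources.limits"
        for i, c in enumerate(containers(manifest))
        if not c.get("resources", {}).get("limits")
    ]


# The rule set is fixed: anything not encoded here goes undetected.
RULES = {
    "privileged-container": rule_privileged,
    "missing-resource-limits": rule_missing_limits,
}


def scan(manifest):
    """Run every rule and return {rule_id: [locations]} for rules that fired."""
    findings = {}
    for rule_id, rule in RULES.items():
        hits = rule(manifest)
        if hits:
            findings[rule_id] = hits
    return findings


findings = scan(pod)
for rule_id, locations in findings.items():
    print(rule_id, locations)
```

In this sketch the two coded rules fire, while the `hostNetwork: true` setting passes silently because no rule was written for it. Covering it would require someone to author and ship a new rule, which is the kind of gap the abstract attributes to static rule sets.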
