Using LLMs for Tabletop Exercises within the Security Domain
Sam Hays, Jules White
TL;DR
The paper addresses the high cost and slow cadence of traditional security tabletop exercises and proposes using Large Language Models (LLMs) to streamline scenario generation, moderation, and retrospective analysis. It demonstrates how LLMs can generate and moderate live tabletop scenarios, provide iterative feedback, and support micro-tabletops that focus on specific domains for continuous improvement. Key contributions include explicit preparedness metrics: the preparedness score $P = (S + K + R + C + A + E)/P_{max}$, the preparedness delta $\Delta P = P_1 - P_2$, and the Unified Preparedness and Balance Score $UPBS = \alpha P_{avg} + \beta (1 - |\bar{\Delta P}|)$, plus practical methods for scenario generation and automated recommendations. The results indicate potential reductions in cost and planning time, higher exercise frequency, and more relevant security readiness outcomes through AI-assisted tabletop workflows.
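To make the three formulas above concrete, here is a minimal Python sketch that evaluates them literally. Several things are assumptions for illustration, not taken from the summary: the six components $S, K, R, C, A, E$ are treated as numeric scores on a 0 to 5 scale (so $P_{max} = 30$), $P_1$ and $P_2$ are taken to be any two preparedness scores being compared, $P_{avg}$ and $\bar{\Delta P}$ are taken to be plain arithmetic means, and the weights default to $\alpha = \beta = 0.5$. The function names are hypothetical.

```python
from statistics import mean


def preparedness(scores, p_max):
    """P = (S + K + R + C + A + E) / P_max for one set of six component scores."""
    if len(scores) != 6:
        raise ValueError("expected six component scores: S, K, R, C, A, E")
    if p_max <= 0:
        raise ValueError("p_max must be positive")
    return sum(scores) / p_max


def preparedness_delta(p1, p2):
    """Delta P = P_1 - P_2 for two preparedness scores."""
    return p1 - p2


def upbs(p_values, deltas, alpha=0.5, beta=0.5):
    """UPBS = alpha * P_avg + beta * (1 - |mean(Delta P)|).

    Assumes P_avg and the barred Delta P are arithmetic means;
    alpha and beta are illustrative weights, not values from the paper.
    """
    p_avg = mean(p_values)
    mean_delta = mean(deltas)
    return alpha * p_avg + beta * (1 - abs(mean_delta))


if __name__ == "__main__":
    # Hypothetical component scores on an assumed 0-5 scale (P_max = 6 * 5 = 30).
    p1 = preparedness([4, 3, 5, 4, 3, 5], p_max=30)
    p2 = preparedness([3, 3, 3, 3, 3, 3], p_max=30)
    delta = preparedness_delta(p1, p2)
    score = upbs([p1, p2], [delta])
    print(f"P1={p1:.2f} P2={p2:.2f} delta={delta:.2f} UPBS={score:.2f}")
```

Under these assumptions, a higher average preparedness raises the score, while a larger gap between the compared scores lowers it through the $1 - |\bar{\Delta P}|$ term.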
Abstract
Tabletop exercises are a crucial component of many companies' strategies to test and evaluate their preparedness for security incidents in a realistic way. Traditionally led by external firms specializing in cybersecurity, these exercises can be costly and time-consuming, and may not always align precisely with the client's specific needs. Large Language Models (LLMs) like ChatGPT offer a compelling alternative. They enable faster iteration, provide rich and adaptable simulations, and offer infinite patience in handling feedback and recommendations. This approach can enhance the efficiency and relevance of security preparedness exercises.
