Exploring Potential Prompt Injection Attacks in Federated Military LLMs and Their Mitigation
Youngjoon Lee, Taehyun Park, Yunho Lee, Jinu Gong, Joonhyuk Kang
TL;DR
The paper addresses the security risks that prompt injection poses to federated military LLMs, where data sovereignty and alliance collaboration are essential. It proposes a human-AI collaborative framework that combines technical red/blue team wargaming with policy-driven governance to detect and mitigate four attack modalities and to verify the resulting defenses. The four key contributions are: (i) a characterization of four threats: secret data leakage, free-rider exploitation, system disruption, and misinformation spread; (ii) a dual technical-policy defense framework; (iii) an emphasis on standardized security frameworks and joint threat response; and (iv) a discussion of future directions, including cryptographic protections and verification. The work aims to preserve operational security, trust among allies, and the effectiveness of federated military LLMs in contested environments.
Abstract
Federated Learning (FL) is increasingly being adopted in military collaborations to develop Large Language Models (LLMs) while preserving data sovereignty. However, prompt injection attacks (malicious manipulations of input prompts) pose new threats that may undermine operational security, disrupt decision-making, and erode trust among allies. This perspective paper highlights four vulnerabilities in federated military LLMs: secret data leakage, free-rider exploitation, system disruption, and misinformation spread. To address these risks, we propose a human-AI collaborative framework with both technical and policy countermeasures. On the technical side, our framework uses red/blue team wargaming and quality assurance to detect and mitigate adversarial behavior in shared LLM weights. On the policy side, it promotes joint AI-human policy development and the verification of security protocols.
