Exploring Potential Prompt Injection Attacks in Federated Military LLMs and Their Mitigation

Youngjoon Lee, Taehyun Park, Yunho Lee, Jinu Gong, Joonhyuk Kang

TL;DR

The paper addresses the security risks posed by prompt injection in federated military LLMs, where data sovereignty and alliance collaboration are essential. It proposes a human–AI collaborative framework that combines technical red/blue team wargaming with policy-driven governance to detect, mitigate, and verify defenses against four attack modalities. The four key contributions are: (i) characterization of secret data leakage, free-rider, system disruption, and misinformation threats; (ii) a dual technical-policy defense framework; (iii) emphasis on standardized security frameworks and joint threat response; and (iv) discussion of future directions including cryptographic protections and verification. The work aims to preserve operational security, trust among allies, and the effectiveness of federated military LLMs in contested environments.

Abstract

Federated Learning (FL) is increasingly being adopted in military collaborations to develop Large Language Models (LLMs) while preserving data sovereignty. However, prompt injection attacks (malicious manipulations of input prompts) pose new threats that may undermine operational security, disrupt decision-making, and erode trust among allies. This perspective paper highlights four vulnerabilities in federated military LLMs: secret data leakage, free-rider exploitation, system disruption, and misinformation spread. To address these risks, we propose a human-AI collaborative framework with both technical and policy countermeasures. On the technical side, our framework uses red/blue team wargaming and quality assurance to detect and mitigate adversarial behaviors of shared LLM weights. On the policy side, it promotes joint AI-human policy development and verification of security protocols.

Paper Structure

This paper contains 14 sections, 4 figures.

Figures (4)

  • Figure 1: FL framework for military LLM training across allied nations. The process involves four key stages: (1) initial LLM synchronization, (2) local training with private data, (3) weight exchange, and (4) model aggregation. This iterative process continues until convergence, while mitigating adversarial risks. Blue clouds represent benign nations' servers, while red clouds indicate potentially compromised servers.
  • Figure 2: Illustration of four key FL advantages: privacy preservation, collaborative learning, scalability, and regulatory compliance. The interconnected circular design emphasizes the synergistic relationship among these key factors in FL.
  • Figure 3: Illustration of four potential attack scenarios in military FL environments: (a) Secret data extraction attack, where adversaries systematically probe shared LLMs to extract classified information through targeted prompts and expert verification, (b) Free-rider exploitation attack, leveraging strategic prompts to gain military intelligence while withholding authentic data contribution, (c) System disruption attack, manipulating model behavior through carefully crafted prompts to create tactical blind spots, and (d) Misinformation spread attack, utilizing dual-channel propagation to systematically inject false information into the federation. Each scenario demonstrates sophisticated attack methodologies that exploit vulnerabilities in federated military LLM deployments while maintaining apparently legitimate participation.
  • Figure 4: Proposed human-AI collaborative countermeasure frameworks for protecting federated military LLMs: (a) Technical framework implementing red/blue team wargaming methodology, where specialized LLMs conduct adversarial testing under domain expert supervision, followed by comprehensive quality assurance and error correction processes, (b) Policy framework utilizing iterative policy development through AI-driven design and risk modeling, with multi-stage expert verification and confirmation protocols to ensure robust security measures. Both frameworks emphasize continuous collaboration between human expertise and AI capabilities to maintain operational security while preserving system effectiveness.
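
The four-stage loop described in the Figure 1 caption (synchronization, local training, weight exchange, aggregation) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a size-weighted average (FedAvg-style) as the aggregation rule, which the source does not specify, and it replaces the LLM with a single scalar weight fitted to y = w * x. The names `local_train`, `fed_avg`, and `run_federation` are hypothetical.

```python
def local_train(weight, data, lr=0.1):
    """Stage 2: one pass of gradient descent on private data.

    Toy stand-in for local LLM fine-tuning: fits y = w * x with squared loss.
    """
    w = weight
    for x, y in data:
        grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        w -= lr * grad
    return w


def fed_avg(client_weights, client_sizes):
    """Stage 4: aggregate client weights, weighted by local dataset size.

    Assumption: FedAvg-style averaging; the source does not name the rule.
    """
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


def run_federation(client_datasets, rounds=20, init=0.0):
    """Repeat the four stages for a fixed number of rounds."""
    global_w = init
    sizes = [len(d) for d in client_datasets]
    for _ in range(rounds):
        # Stage 1: every client starts from the current global weight.
        # Stage 2: each client trains locally; raw data never leaves the client.
        local_weights = [local_train(global_w, d) for d in client_datasets]
        # Stage 3: only the trained weights are exchanged.
        # Stage 4: the exchanged weights are merged into a new global weight.
        global_w = fed_avg(local_weights, sizes)
    return global_w


if __name__ == "__main__":
    # Two hypothetical clients whose private data both follow y = 3 * x.
    clients = [[(0.5, 1.5), (1.0, 3.0)], [(1.5, 4.5)]]
    print(run_federation(clients))
```

Because the aggregation step accepts whatever weights each participant submits, a compromised participant (the red clouds in Figure 1) can influence the global model directly, which is why the captions stress verification of shared weights.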