LLMmap: Fingerprinting For Large Language Models
Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese
TL;DR
LLMmap is an active fingerprinting framework that identifies the exact LLM version powering an application by issuing carefully crafted prompts and learning from the responses. It combines a robust query strategy with a lightweight contrastive, open-set inference model, reaching over 95% accuracy across 42 models with as few as eight interactions, and it remains effective across diverse deployment conditions and prompt configurations. The paper also analyzes defenses, showing that masking fingerprint signals is difficult and often degrades functionality, and it discusses extending the approach to detect models not seen during training. Overall, LLMmap gives security evaluators a practical, scalable tool for profiling LLM deployments as part of red-teaming and risk assessment.
Abstract
We introduce LLMmap, a first-generation fingerprinting technique targeted at LLM-integrated applications. LLMmap employs an active fingerprinting approach, sending carefully crafted queries to the application and analyzing the responses to identify the specific LLM version in use. Our query selection is informed by domain expertise on how LLMs generate uniquely identifiable responses to thematically varied prompts. With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM versions--whether open-source or proprietary--from various vendors, operating under various unknown system prompts, stochastic sampling hyperparameters, and even complex generation frameworks such as RAG or Chain-of-Thought. We discuss potential mitigations and demonstrate that, against resourceful adversaries, effective countermeasures may be challenging or even unrealizable.
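To make the idea of active fingerprinting concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical probe set (`PROBES`), two made-up reference models (`model-a`, `model-b`) with canned answers, and a bag-of-words cosine similarity in place of the learned inference model the paper describes. It only illustrates the loop the abstract outlines: send a fixed set of queries, collect the responses, and match them against known models.

```python
import math
from collections import Counter

# Hypothetical probe prompts; the paper's actual query set is not reproduced here.
PROBES = ["Who created you?", "What is your knowledge cutoff?"]


def embed(text):
    """Bag-of-words counts: a crude stand-in for a learned text embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def trace(ask, probes=PROBES):
    """Send every probe to the application and embed each response."""
    return [embed(ask(p)) for p in probes]


def identify(ask, references, probes=PROBES):
    """Return the reference model whose responses are most similar on average."""
    observed = trace(ask, probes)
    scores = {}
    for name, ref_ask in references.items():
        ref = trace(ref_ask, probes)
        scores[name] = sum(cosine(o, r) for o, r in zip(observed, ref)) / len(probes)
    return max(scores, key=scores.get), scores


def canned(answers):
    """Build a fake 'application' that answers probes from a fixed table."""
    return lambda prompt: answers[prompt]


# Made-up reference behaviour for two imaginary models (assumption for illustration).
REFERENCES = {
    "model-a": canned({
        PROBES[0]: "I am an assistant built by Acme Labs.",
        PROBES[1]: "My training data ends in 2023.",
    }),
    "model-b": canned({
        PROBES[0]: "I'm a language model trained by Bolt AI.",
        PROBES[1]: "I cannot share details about my training.",
    }),
}

# The deployed target behaves like model-a, perturbed as an unknown system
# prompt or sampling setting might perturb it.
target = canned({
    PROBES[0]: "Sure! I am an assistant built by Acme Labs.",
    PROBES[1]: "My training data ends in 2023, I believe.",
})

best, scores = identify(target, REFERENCES)
print(best, scores)
```

A nearest-match rule like this is closed-set: it always names one of the known models. Handling unseen models would need, at minimum, a rejection threshold on the best score, and the robustness to system prompts, sampling, RAG, and Chain-of-Thought that the abstract claims comes from the trained inference model rather than from simple word overlap.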
