Invisible Traces: Using Hybrid Fingerprinting to identify underlying LLMs in GenAI Apps
Devansh Bhardwaj, Naman Mishra
TL;DR
This work tackles the challenge of identifying the underlying LLMs in GenAI apps under real-world constraints where single-method fingerprinting fails. It introduces a hybrid framework that combines static probing (LLMMap and manual fingerprinting) with dynamic observation (a ModernBERT classifier) in a two-phase pipeline, whose outputs are fused via a weight $\alpha$ to produce a final prediction $P_{final}$. Empirical evaluations across 1000+ apps and 14 LLMs show that the hybrid approach significantly outperforms the individual methods, achieving around 86% accuracy at $n=10$ and showing strong class-wise separability in embeddings. The results have implications for AI security, governance, and red-teaming: they offer a practical, adaptable method for monitoring and verifying deployed LLMs, while acknowledging ethical considerations and potential misuse.
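The fusion step described above can be illustrated with a minimal sketch. The summary only states that the static and dynamic outputs are fused via a weight $\alpha$ to produce $P_{final}$; the convex combination $P_{final} = \alpha P_{static} + (1 - \alpha) P_{dynamic}$ used here is an assumption, not necessarily the authors' rule. The function names (`fuse_predictions`, `predict_model`) and the model labels are hypothetical.

```python
from typing import Dict


def fuse_predictions(
    p_static: Dict[str, float],
    p_dynamic: Dict[str, float],
    alpha: float,
) -> Dict[str, float]:
    """Fuse two per-model probability distributions with weight alpha.

    ASSUMPTION: a convex combination, alpha * P_static + (1 - alpha) * P_dynamic.
    The source only says the two phases are fused via alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Candidate models missing from one distribution get probability 0 there.
    labels = set(p_static) | set(p_dynamic)
    fused = {
        m: alpha * p_static.get(m, 0.0) + (1.0 - alpha) * p_dynamic.get(m, 0.0)
        for m in labels
    }

    # Renormalise so the result is a proper distribution even if the inputs
    # did not each sum exactly to 1.
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("fused scores sum to zero; nothing to normalise")
    return {m: v / total for m, v in fused.items()}


def predict_model(p_final: Dict[str, float]) -> str:
    """Return the candidate model with the highest fused probability."""
    return max(p_final, key=p_final.get)


# Hypothetical scores for two placeholder models.
p_static = {"model_a": 0.7, "model_b": 0.3}   # e.g. from static probing
p_dynamic = {"model_a": 0.2, "model_b": 0.8}  # e.g. from the dynamic classifier
p_final = fuse_predictions(p_static, p_dynamic, alpha=0.5)
best = predict_model(p_final)
```

Under this assumed rule, $\alpha = 1$ reduces to the static fingerprint alone and $\alpha = 0$ to the dynamic classifier alone, so $\alpha$ controls how much each phase is trusted.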
Abstract
Fingerprinting refers to the process of identifying the underlying Machine Learning (ML) models of AI systems, such as Large Language Models (LLMs), by analyzing their unique characteristics or patterns, much like a human fingerprint. The fingerprinting of LLMs has become essential for ensuring the security and transparency of AI-integrated applications. Existing methods primarily rely on direct interaction with the application to infer model identity, and they often fail in real-world scenarios involving multi-agent systems, frequent model updates, and restricted access to model internals. In this paper, we introduce a novel fingerprinting framework designed to address these challenges by integrating static and dynamic fingerprinting techniques. Our approach identifies architectural features and behavioral traits, enabling accurate and robust fingerprinting of LLMs in dynamic environments. We also highlight new threat scenarios in which traditional fingerprinting methods are ineffective, bridging the gap between theoretical techniques and practical application. To validate our framework, we present an extensive evaluation setup that simulates real-world conditions and demonstrate the effectiveness of our methods in identifying and monitoring LLMs in GenAI applications. Our results highlight the framework's adaptability to diverse and evolving deployment contexts.
