From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models
Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Linlin Wang
TL;DR
This work tackles IP protection for large language models (LLMs) deployed in web applications by proposing an end-to-end fingerprinting pipeline. It introduces three components: MNLF, a rule-based multilingual natural language fingerprint that provides natural-looking triggers; RFEdit, which injects the fingerprint via targeted knowledge editing; and FSFT, which preserves the fingerprint during fine-tuning by constraining updates to a fingerprint subspace. The approach achieves 100% fingerprint success, robust defense against adversarial manipulation, and improved persistence under pruning, quantization, and fine-tuning. When combined with FSFT, fingerprint effectiveness generally improves by more than 10% on math and Alpaca downstream tasks. The work demonstrates a practical path to traceability and IP protection for LLMs in real-world deployment.
Abstract
Fingerprinting is critical for maintaining traceability and protecting the intellectual property (IP) of developers, as LLMs deployed in web applications are susceptible to unauthorized redistribution and misuse via fine-tuning or black-box deployment. However, current backdoor-based fingerprinting methods face a fundamental trade-off: fingerprints embedded as garbled text are easily detected and filtered, whereas those crafted as coherent natural language are prone to being triggered unintentionally. To overcome these limitations, we propose RFEdit, a knowledge-editing framework that embeds a rule-based multilingual natural language fingerprint (MNLF) by modifying a sparse subset of model weights. This approach enables efficient and robust fingerprint injection with minimal impact on unrelated knowledge in LLMs. RFEdit is further safeguarded by Fingerprint Subspace-aware Fine-Tuning (FSFT), which mitigates fingerprint degradation during legitimate fine-tuning by restricting parameter updates to the fingerprint subspace. FSFT preserves fingerprint integrity while enhancing the downstream task performance of LLMs. Together, these advances establish a comprehensive pipeline from fingerprint injection to defense, achieving high detection effectiveness, robustness against adversarial manipulations, harmlessness to model utility, and persistence under fine-tuning. Extensive experiments demonstrate that RFEdit maintains robustness under quantization and pruning. Additionally, fingerprint effectiveness generally improves by more than 10% when RFEdit is combined with FSFT on math and Alpaca downstream tasks.
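
The abstract does not say how the fingerprint subspace is built or exactly how FSFT constrains updates, so the following is only a generic illustration of what "restricting a parameter update to a linear subspace" can mean, not the authors' method. It orthonormalizes a set of spanning vectors and projects an update onto their span. The function names, the toy spanning vectors, and the flat-vector view of the parameters are all assumptions made for this sketch.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors: List[Vector], tol: float = 1e-12) -> List[Vector]:
    """Modified Gram-Schmidt: return an orthonormal basis for span(vectors)."""
    basis: List[Vector] = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors that are (numerically) linearly dependent
            basis.append([wi / norm for wi in w])
    return basis


def project_onto_subspace(update: Vector, basis: List[Vector]) -> Vector:
    """Keep only the component of `update` that lies in span(basis).

    `basis` must be orthonormal (e.g. the output of `orthonormalize`).
    """
    projected = [0.0] * len(update)
    for b in basis:
        c = dot(update, b)
        projected = [p + c * bi for p, bi in zip(projected, b)]
    return projected


# Hypothetical toy example: a 2-D subspace of a 3-D parameter space.
spanning_vectors = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormalize(spanning_vectors)

raw_update = [0.5, -2.0, 3.0]            # e.g. a gradient step
constrained_update = project_onto_subspace(raw_update, basis)
```

In this toy case the subspace is the plane of the first two coordinates, so the third component of the update is removed and the first two are kept. A real model would apply such a projection per weight matrix with a basis derived from the method itself, which this page does not describe.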
