Origin Tracer: A Method for Detecting LoRA Fine-Tuning Origins in LLMs
Hongyu Liang, Yuting Zheng, Yihan Li, Yiran Zhang, Shiyu Liang
TL;DR
This work tackles the problem of verifying whether a fine-tuned model originates from a specified base model, particularly under obfuscation strategies that obscure parameter mappings. It introduces Origin-Tracer, a framework that (i) extracts LoRA rank information from intermediate states via singular-value analysis across transformer layers and (ii) reconstructs obfuscated intermediate representations through an iterative gradient-based procedure, enabling robust provenance verification even when permutations or scaling have been applied. The approach is formalized through a problem statement that allows for loose, rather than only close, relationships between base and derivative models. It is validated on 31 open-source models fine-tuned with LoRA, demonstrating resilience to obfuscation and providing a path toward new benchmarks for model verification. The findings indicate that middle transformer layers are especially informative for rank recovery and that Origin-Tracer can establish practical, obfuscation-tolerant provenance checks, with potential impact on transparency and trust in open-source AI deployments. The work also outlines specific limitations and avenues for broader applicability.
Abstract
As large language models (LLMs) continue to advance, their deployment often involves fine-tuning to enhance performance on specific downstream tasks. However, this customization is sometimes accompanied by misleading claims about a model's origins, raising significant concerns about transparency and trust within the open-source community. Existing model verification techniques typically assess functional, representational, and weight similarities, but these approaches often struggle against obfuscation techniques such as permutations and scaling transformations. To address this limitation, we propose a novel detection method, Origin-Tracer, that rigorously determines whether a model has been fine-tuned from a specified base model. The method can also extract the LoRA rank used during fine-tuning, providing a more robust verification framework. This framework is the first to provide a formalized approach specifically aimed at pinpointing the sources of model fine-tuning. We empirically validate our method on thirty-one diverse open-source models under conditions that simulate real-world obfuscation scenarios, analyze its effectiveness, and discuss its limitations. The results demonstrate the effectiveness of our approach and indicate its potential to establish new benchmarks for model verification.
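To illustrate why singular values can reveal a LoRA rank, and why permutation and scaling defeat a naive comparison, the following is a minimal NumPy sketch on a single linear layer. It is not the Origin-Tracer procedure itself: the dimensions, the random matrices, the tolerance `rel_tol`, and the helper `estimate_rank` are all assumptions made for illustration. A rank-r update B A shifts every layer output by B A x, so the matrix of output differences has at most r non-negligible singular values.

```python
import numpy as np


def estimate_rank(base_out, tuned_out, rel_tol=1e-8):
    """Hypothetical helper: count singular values of the output difference
    that exceed rel_tol times the largest singular value."""
    s = np.linalg.svd(tuned_out - base_out, compute_uv=False)
    if s[0] == 0.0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))


rng = np.random.default_rng(0)
d_in, d_out, r, n = 64, 48, 4, 200  # illustrative sizes, not from the paper

W0 = rng.standard_normal((d_out, d_in))  # stand-in for a base weight matrix
B = rng.standard_normal((d_out, r))      # LoRA factors: update is B @ A
A = rng.standard_normal((r, d_in))
X = rng.standard_normal((d_in, n))       # n probe inputs, one per column

base_out = W0 @ X
tuned_out = (W0 + B @ A) @ X

# Without obfuscation, the difference equals B @ A @ X, which has rank r.
estimated = estimate_rank(base_out, tuned_out)

# Simulated obfuscation: permute and rescale the output dimensions.
perm = rng.permutation(d_out)
scale = rng.uniform(0.5, 2.0, size=d_out)
obfuscated_out = scale[:, None] * tuned_out[perm]

# The naive difference now mixes in (P D - I) W0 X and loses its
# low-rank structure, so the same estimate no longer recovers r.
obfuscated_rank = estimate_rank(base_out, obfuscated_out)

print("estimated rank without obfuscation:", estimated)
print("naive estimate under obfuscation:", obfuscated_rank)
```

In this toy setting the first estimate recovers r, while the second does not, which is the gap a reconstruction step for obfuscated representations is meant to close.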
