Ghost in the Transformer: Detecting Model Reuse with Invariant Spectral Signatures
Suqing Wang, Ziyang Ma, Li Xinyi, Zuchao Li
TL;DR
This work addresses the problem of verifying the provenance of large language models amid widespread reuse and fine-tuning. It introduces GhostSpec, a data-free, white-box fingerprinting method that derives invariant spectral signatures from products of attention weight matrices, and pairs it with POSA, a procedure for aligning layers across architectures with different depths. The authors define two similarity metrics, GhostSpec-mse and GhostSpec-corr, and report extensive experiments in which GhostSpec reliably distinguishes derivative models from unrelated ones, even under aggressive modifications such as pruning, merging, and expansion. The approach offers a practical tool for intellectual property protection and improved transparency in open-source LLM ecosystems, and the code is publicly available for replication.
Abstract
Large Language Models (LLMs) are widely adopted, but their high training cost leads many developers to fine-tune existing open-source models. While most adhere to open-source licenses, some falsely claim original training despite clear derivation from public models, raising pressing concerns about intellectual property protection and the need to verify model provenance. In this paper, we propose GhostSpec, a lightweight yet effective method for verifying LLM lineage without access to training data or modification of model behavior. Our approach constructs compact and robust fingerprints by applying singular value decomposition (SVD) to invariant products of internal attention weight matrices. Unlike watermarking or output-based methods, GhostSpec is fully data-free, non-invasive, and computationally efficient. Extensive experiments show it is robust to fine-tuning, pruning, expansion, and adversarial transformations, reliably tracing lineage with minimal overhead. By offering a practical solution for model verification, our method contributes to intellectual property protection and fosters a transparent, trustworthy LLM ecosystem. Our code is available at https://github.com/DX0369/GhostSpec.
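To make the core idea concrete, here is a minimal sketch of a spectral fingerprint built from a product of attention weights. It is an illustration under stated assumptions, not the released GhostSpec implementation. The abstract does not specify which products are used, so the sketch assumes the query-key product W_Q W_K^T. The weight layout (d_model x d_head), the function names `spectral_fingerprint` and `mse_distance`, and the toy random weights are all hypothetical. The distance shown is a plain mean squared difference and should not be read as the paper's GhostSpec-mse definition.

```python
import numpy as np


def spectral_fingerprint(w_q, w_k, top_k):
    """Return the top_k singular values of W_Q @ W_K^T (descending order).

    Assumption: weights are stored as (d_model, d_head), so attention
    scores are x @ w_q @ w_k.T @ x.T.
    """
    product = w_q @ w_k.T
    singular_values = np.linalg.svd(product, compute_uv=False)
    return singular_values[:top_k]


def mse_distance(fp_a, fp_b):
    """Mean squared difference between two fingerprints (illustrative only)."""
    return float(np.mean((fp_a - fp_b) ** 2))


rng = np.random.default_rng(0)
d_model, d_head = 64, 16

# Toy "base model" attention weights.
w_q = rng.standard_normal((d_model, d_head))
w_k = rng.standard_normal((d_model, d_head))

# Disguise 1: a well-conditioned invertible map A inside the head.
# (W_Q A)(W_K A^{-T})^T = W_Q W_K^T, so the product itself is unchanged.
q_mat, _ = np.linalg.qr(rng.standard_normal((d_head, d_head)))
a = q_mat * rng.uniform(0.5, 2.0, size=d_head)  # orthogonal times column scaling
w_q_scaled = w_q @ a
w_k_scaled = w_k @ np.linalg.inv(a).T

# Disguise 2: an orthogonal rotation P of the hidden dimension.
# The product becomes P (W_Q W_K^T) P^T, which has the same singular values.
p, _ = np.linalg.qr(rng.standard_normal((d_model, d_model)))
w_q_rot = p @ w_q
w_k_rot = p @ w_k

# An unrelated "model" with independently drawn weights.
w_q_other = rng.standard_normal((d_model, d_head))
w_k_other = rng.standard_normal((d_model, d_head))

fp_base = spectral_fingerprint(w_q, w_k, d_head)
fp_scaled = spectral_fingerprint(w_q_scaled, w_k_scaled, d_head)
fp_rot = spectral_fingerprint(w_q_rot, w_k_rot, d_head)
fp_other = spectral_fingerprint(w_q_other, w_k_other, d_head)

print("base vs. rescaled :", mse_distance(fp_base, fp_scaled))
print("base vs. rotated  :", mse_distance(fp_base, fp_rot))
print("base vs. unrelated:", mse_distance(fp_base, fp_other))
```

In this toy setting the two disguised copies differ from the base weights entry by entry, yet their fingerprints match the base up to floating-point error, while the independently drawn weights give a clearly larger distance. Real fine-tuning perturbs the product itself, so the match there would be approximate rather than exact.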
