Keys in the Weights: Transformer Authentication Using Model-Bound Latent Representations
Ayşe S. Okatan, Mustafa İlhan Akbaş, Laxima Niure Kandel, Berker Peköz
TL;DR
The paper addresses secure inter-model interoperability by showing that Transformer autoencoders trained with identical architecture and data but different random seeds exhibit Zero-Shot Decoder Non-Transferability (ZSDN): an encoder memory $H^L$ is decodable only by its paired decoder, and cross-decoding with decoders from other seeds collapses to chance. The authors formalize ZSDN and a decoder-binding advantage metric, provide weight-space and attention-divergence diagnostics, and reinterpret encoder weights as implicit private keys in a cryptographic-style model-binding framework. Self-decoding vastly outperforms cross-decoding (over 90% exact matches and ~98% token accuracy versus near-chance), clone and same-seed variants reproduce the original performance, and distinct seeds yield meaningful parameter and attention divergences. The work outlines deployment considerations (integrity, rekeying, access control), discusses learnability risks and mitigations, and offers Model-Bound Latent Exchange (MoBLE) as a lightweight security layer for AI pipelines in safety-critical domains. Overall, it highlights a practical, accelerator-friendly mechanism for secure model-to-model communication in which the encoder weights act as implicit private keys, so that latent representations are bound to their paired decoder without injected secrets or adversarial training. This has broad implications for provenance, integrity, and interoperability in AI systems.
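The headline numbers above rest on two sequence-level metrics, exact match and token accuracy, together with a decoder-binding advantage whose precise formula is not stated in this summary. The following is a minimal Python sketch, assuming decoded outputs are lists of token IDs with one prediction per target. The function names are hypothetical, and `binding_advantage` (self-decoding score minus the best cross-decoding score) is an illustrative assumption, not necessarily the paper's definition.

```python
from typing import Sequence

Tokens = Sequence[int]


def exact_match(preds: Sequence[Tokens], targets: Sequence[Tokens]) -> float:
    """Fraction of sequences reproduced with every token (and the length) correct."""
    hits = sum(1 for p, t in zip(preds, targets) if list(p) == list(t))
    return hits / len(targets)


def token_accuracy(preds: Sequence[Tokens], targets: Sequence[Tokens]) -> float:
    """Fraction of target positions whose decoded token matches the target token.

    Positions past the end of a short prediction count as errors; extra
    predicted tokens beyond the target length are ignored.
    """
    correct = 0
    total = 0
    for p, t in zip(preds, targets):
        total += len(t)
        correct += sum(1 for i, tok in enumerate(t) if i < len(p) and p[i] == tok)
    return correct / total


def binding_advantage(self_score: float, cross_scores: Sequence[float]) -> float:
    """Hypothetical advantage: self-decoding score minus the best cross-decoding score."""
    return self_score - max(cross_scores)


# Toy example: one encoder's memory decoded by its own decoder vs. a foreign one.
targets = [[1, 2, 3], [4, 5, 6]]
self_preds = [[1, 2, 3], [4, 5, 0]]   # paired decoder (hypothetical outputs)
cross_preds = [[7, 2, 9], [0, 0, 6]]  # decoder from a different seed (hypothetical outputs)

self_acc = token_accuracy(self_preds, targets)    # 5/6
cross_acc = token_accuracy(cross_preds, targets)  # 2/6
print(exact_match(self_preds, targets), exact_match(cross_preds, targets))
print(round(binding_advantage(self_acc, [cross_acc]), 4))
```

Taking the maximum over cross-decoders makes the advantage a worst-case margin: a single foreign decoder that partially transfers shrinks it. Averaging over cross-decoders would be a softer alternative.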
Abstract
We introduce Model-Bound Latent Exchange (MoBLE), a decoder-binding property in Transformer autoencoders formalized as Zero-Shot Decoder Non-Transferability (ZSDN). In identity tasks using iso-architectural models trained on identical data but with different random seeds, self-decoding achieves more than 0.91 exact match and 0.98 token accuracy, while zero-shot cross-decoding collapses to chance with no exact matches. This separation arises without injected secrets or adversarial training and is corroborated by weight-space distances and attention-divergence diagnostics. We interpret ZSDN as model binding, a latent-based authentication and access-control mechanism that holds even when the architecture and training recipe are public: the encoder's hidden-state representation deterministically encodes the plaintext, yet only the correctly keyed decoder reproduces it zero-shot. We formally define ZSDN and a decoder-binding advantage metric, and outline deployment considerations for secure artificial intelligence (AI) pipelines. Finally, we discuss learnability risks (e.g., adapter alignment) and describe mitigations. MoBLE offers a lightweight, accelerator-friendly approach to secure AI deployment in safety-critical domains, including aviation and cyber-physical systems.
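The weight-space and attention-divergence diagnostics mentioned above can be illustrated as follows. This is a minimal NumPy sketch under stated assumptions: parameters are given as dictionaries of arrays with matching keys and shapes, weight-space distance is taken as a relative L2 norm over all flattened parameters, and attention divergence as a mean Jensen-Shannon divergence between row-stochastic attention maps. The function names are hypothetical, and the paper's exact diagnostics may differ.

```python
import numpy as np


def relative_weight_distance(params_a: dict, params_b: dict) -> float:
    """||theta_a - theta_b||_2 / ||theta_a||_2 over all parameter tensors.

    Assumes both dicts share the same keys and tensor shapes.
    """
    keys = sorted(params_a)
    flat_a = np.concatenate([np.ravel(params_a[k]) for k in keys])
    flat_b = np.concatenate([np.ravel(params_b[k]) for k in keys])
    return float(np.linalg.norm(flat_a - flat_b) / np.linalg.norm(flat_a))


def mean_attention_js(attn_a: np.ndarray, attn_b: np.ndarray, eps: float = 1e-12) -> float:
    """Mean Jensen-Shannon divergence (in nats) between two attention maps.

    Inputs have shape (..., query_len, key_len), and each row along the last
    axis is a probability distribution; the result lies (up to the eps
    clipping) in [0, ln 2].
    """
    # Clip to avoid log(0); this perturbs the distributions only negligibly.
    p = np.clip(np.asarray(attn_a, dtype=float), eps, None)
    q = np.clip(np.asarray(attn_b, dtype=float), eps, None)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m), axis=-1)
    kl_qm = np.sum(q * np.log(q / m), axis=-1)
    return float(np.mean(0.5 * (kl_pm + kl_qm)))


# Toy example: identical vs. zeroed weights, identical vs. disjoint attention rows.
w = {"layer0.weight": np.array([[3.0, 4.0]])}
print(relative_weight_distance(w, w))                                    # 0.0
print(relative_weight_distance(w, {"layer0.weight": np.zeros((1, 2))}))  # 1.0
attn_same = np.array([[0.7, 0.3], [0.2, 0.8]])
attn_disjoint_a = np.array([[1.0, 0.0]])
attn_disjoint_b = np.array([[0.0, 1.0]])
print(mean_attention_js(attn_same, attn_same))                           # 0.0
print(round(mean_attention_js(attn_disjoint_a, attn_disjoint_b), 4))     # 0.6931 (ln 2)
```

The relative norm makes the weight distance comparable across model sizes, and the Jensen-Shannon divergence is symmetric and bounded, which keeps per-head comparisons on a common scale. The sketch does not assume any particular threshold for calling two seeds "divergent".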
