Lie Detector: Unified Backdoor Detection via Cross-Examination Framework

Xuan Wang; Siyuan Liang; Dongping Liao; Han Fang; Aishan Liu; Xiaochun Cao; Yu-liang Lu; Ee-Chien Chang; Xitong Gao

TL;DR

This work tackles backdoor threats in outsourced AI training by introducing Lie Detector, a cross‑examination framework that compares two independently trained models to reveal inconsistencies indicative of backdoors. It combines output‑distribution optimization with representational analysis based on Centered Kernel Alignment (CKA) to recover triggers, and adds a fine‑tuning sensitivity check to reduce false positives. The method performs strongly across learning paradigms, outperforming state‑of‑the‑art detectors on supervised (SL), semi‑supervised (SSL), and autoregressive (AL) learning tasks and enabling backdoor detection in multimodal large language models. Practically, the framework makes outsourced training in the semi‑honest setting safer, supporting more secure deployment of complex AI systems across diverse modalities.

Abstract

Institutions with limited data and computing resources often outsource model training to third-party providers in a semi-honest setting, assuming adherence to prescribed training protocols with a pre-defined learning paradigm (e.g., supervised or semi-supervised learning). However, this practice can introduce severe security risks, as adversaries may poison the training data to embed backdoors into the resulting model. Existing detection approaches predominantly rely on statistical analyses, which often fail to maintain consistent detection accuracy across different learning paradigms. To address this challenge, we propose a unified backdoor detection framework in the semi-honest setting that exploits cross-examination of model inconsistencies between two independent service providers. Specifically, we integrate centered kernel alignment (CKA) to enable robust feature similarity measurements across different model architectures and learning paradigms, thereby facilitating precise recovery and identification of backdoor triggers. We further introduce backdoor fine-tuning sensitivity analysis to distinguish backdoor triggers from adversarial perturbations, substantially reducing false positives. Extensive experiments demonstrate that our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines across supervised, semi-supervised, and autoregressive learning tasks, respectively. Notably, it is the first to effectively detect backdoors in multimodal large language models, further highlighting its broad applicability and advancing secure deep learning.
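
To make the feature-similarity step concrete, the following is a minimal sketch of linear CKA between two feature matrices computed on the same inputs. It is an illustration only: the abstract does not say which CKA variant (linear or kernel) the framework uses, so the linear form is an assumption here, and the function name linear_cka and the synthetic features feats_a, feats_b, and feats_c are hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between feature matrices of shape (n_samples, dim).

    The two matrices must describe the same n_samples inputs, but the
    feature dimensions may differ, which is what allows comparisons
    across architectures. Assumption: linear kernel, not the paper's
    confirmed choice.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape[0] != y.shape[0]:
        raise ValueError("both feature matrices need the same number of rows")

    # Center every feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic stand-ins for features extracted from two models.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(256, 64))

# A rotated and rescaled copy of feats_a: same representation, new basis.
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
feats_b = 3.0 * feats_a @ q

# Independent features with a different width.
feats_c = rng.normal(size=(256, 32))

same = linear_cka(feats_a, feats_b)
unrelated = linear_cka(feats_a, feats_c)
print(f"rotated copy: {same:.4f}, independent features: {unrelated:.4f}")
```

Because linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, the rotated copy scores close to 1 while the independent features score much lower. A cross-examination setup could, under these assumptions, look for inputs on which two providers' models stop agreeing in this sense.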
