LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem

Hongyi Liu; Shaochen Zhong; Xintong Sun; Minghao Tian; Mohsen Hariri; Zirui Liu; Ruixiang Tang; Zhimeng Jiang; Jiayi Yuan; Yu-Neng Chuang; Li Li; Soo-Hyun Choi; Rui Chen; Vipin Chaudhary; Xia Hu

TL;DR

This work identifies a practical security risk in the LoRA share-and-play ecosystem: attackers can distribute backdoored adapters that appear to improve downstream performance. It proposes LoRATK, a merging-based attack that trains a backdoor-only feed-forward (FF) LoRA once and then merges it, without further training, with existing task LoRAs so that the result keeps both the malicious and the benign capabilities. The authors introduce merging strategies, notably FF-only Merge and 3-way Complement Merge, along with Diversified Backdoor Completion Reconstruction to make the backdoor more merging-friendly, and validate these approaches across multiple models and downstream tasks. The findings show that the threat is feasible and scalable and that defenses have limited effectiveness against it, underscoring the need for security-aware curation and mitigation in LoRA repositories and local deployments.

Abstract

Finetuning LLMs with LoRA has gained significant popularity due to its simplicity and effectiveness. Often, users may even find pluggable, community-shared LoRAs to enhance their base models for a specific downstream task of interest, enjoying a powerful, efficient, yet customized LLM experience with negligible investment. However, this convenient share-and-play ecosystem also introduces a new attack surface, where attackers can distribute malicious LoRAs to a community eager to try out shared assets. Despite the high-risk potential, no prior art has comprehensively explored LoRA's attack surface under the downstream-enhancing share-and-play context. In this paper, we investigate how backdoors can be injected into task-enhancing LoRAs and examine the mechanisms of such infections. We find that with a simple, efficient, yet specific recipe, a backdoor LoRA can be trained once and then seamlessly merged (in a training-free fashion) with multiple task-enhancing LoRAs, retaining both its malicious backdoor and benign downstream capabilities. This allows attackers to scale the distribution of compromised LoRAs with minimal effort by leveraging the rich pool of existing shared LoRA assets. We note that such merged LoRAs are particularly infectious, because their malicious intent is cleverly concealed behind improved downstream capabilities, creating a strong incentive for voluntary download, and dangerous, because under local deployment no safety measures exist to intervene when things go wrong. Our work is among the first to study this new threat model of training-free distribution of downstream-capable-yet-backdoor-injected LoRAs, highlighting the urgent need for heightened security awareness in the LoRA ecosystem. Warning: This paper contains offensive content and involves a real-life tragedy.
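
To make the phrase "merged in a training-free fashion" concrete, the following is a minimal sketch of how two LoRA adapters can be combined arithmetically, with no gradient steps. It is an illustration under stated assumptions, not the authors' implementation: the module names (`attn.q_proj`, `mlp.up_proj`), the dict-of-factors adapter layout, and the plain unweighted sum of weight updates are all hypothetical choices made for the example. Each adapter stores, per module, a pair of low-rank factors `(B, A)` whose product `B @ A` is the weight update for that module.

```python
def matmul(X, Y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def delta(adapter, name):
    """Weight update B @ A that an adapter contributes to one module."""
    B, A = adapter[name]
    return matmul(B, A)


def merge_adapters(task, backdoor):
    """Training-free merge: sum the two weight updates module by module.

    Concatenating the factors along the rank dimension keeps the result
    in LoRA form, because [B1 | B2] @ [A1 ; A2] == B1 @ A1 + B2 @ A2.
    Modules present in only one adapter are copied unchanged.
    """
    merged = {}
    for name in sorted(set(task) | set(backdoor)):
        if name in task and name in backdoor:
            B_t, A_t = task[name]
            B_b, A_b = backdoor[name]
            B = [row_t + row_b for row_t, row_b in zip(B_t, B_b)]  # d_out x (r_t + r_b)
            A = A_t + A_b                                          # (r_t + r_b) x d_in
            merged[name] = (B, A)
        else:
            merged[name] = task[name] if name in task else backdoor[name]
    return merged


# Hypothetical rank-1 adapters on a 2x2 layer. The second adapter only
# touches the feed-forward module (an assumption made for this example).
task = {
    "attn.q_proj": ([[1.0], [0.0]], [[0.5, 0.0]]),
    "mlp.up_proj": ([[1.0], [2.0]], [[1.0, 1.0]]),
}
backdoor = {
    "mlp.up_proj": ([[0.0], [1.0]], [[3.0, -1.0]]),
}

merged = merge_adapters(task, backdoor)
```

In this sketch the attention module of the merged adapter is identical to the task adapter's, while the feed-forward module carries the sum of both updates at rank 2. Real adapters also carry scaling factors and per-layer names, which are omitted here.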

Paper Structure (36 sections, 2 equations, 1 figure, 33 tables)

Introduction and Attack Setting
The Share-and-Play Ecosystem Enables Hassle-Free Enjoyment of Customized LLMs
A New Security Risk: LoRATK for Stealthy Backdoor Injection
LoRA Once, Backdoor Everywhere: Low-Cost Malicious Distribution at Scale
Background and Related Works
General Backdoor Attacks on LLMs
Backdoor Attacks Targeting the LoRA Share-and-Play Ecosystem
Threat Model
Attacker's Goal: Manufacturing Downstream-capable yet Backdoor-infected LoRAs at Scale.
Attacker's Access: Pretrained Base Model, Shared Downstream-improving LoRAs, and Backdoor Datasets.
Proposed Method
Potential Attack Recipes: From-scratch Mix-up vs Two-step Finetuning vs Training-free Merging
OB 1: Backdoors with Diversified Completions are More Merging-Friendly $\rightarrow$ Diversified Backdoor Completion Reconstruction
OB 2: Backdoor Capability Primarily Resides in the FF LoRA Module $\rightarrow$ FF-only Merge
OB 3: FF-only Merge Might Be Vulnerable to Flagging Defenses $\rightarrow$ 3-way Complement Merge
...and 21 more sections

Figures (1)

Figure 1: Overview of LoRATK in the Share-and-Play Scenario: (a) The attacker downloads existing downstream task-enhancing LoRAs from HuggingFace-like platforms, trains a backdoor-only LoRA, and then merges them together. (b) The merged malicious LoRA is redistributed via the LoRA sharing community, where users may voluntarily download it for improved downstream performance. (c) The merged malicious LoRA retains both downstream and backdoor capabilities.
