Quantization-Robust LLM Unlearning via Low-Rank Adaptation

João Vitor Boer Abitante, Joana Meneguzzo Pasquali, Luan Fonseca Garcia, Ewerton de Oliveira, Thomas da Silva Paula, Rodrigo C. Barros, Lucas S. Kupssinskü

TL;DR

This work addresses a failure mode of LLM unlearning: post-training quantization (PTQ) to low bit-widths, notably 4-bit, can erase the unlearning updates. It proposes quantization-robust unlearning via Low-Rank Adaptation (LoRA), which freezes the base model and concentrates updates in trainable low-rank adapters, enabling larger effective steps whose effect survives quantization. On the MUSE benchmark with Llama-2-7B, LoRA-based unlearning improves 4-bit utility and reduces privacy leakage compared to full-parameter fine-tuning: utility gains reach 7.93 points (NPO+GDR on BOOKS), and PrivLeak for GA+KLR on BOOKS moves from -25.68 to -5.86. The findings support deploying unlearning in resource-constrained settings by merging LoRA adapters into the base weights before quantization, which offers practical robustness to PTQ while maintaining performance on retain data.

Abstract

Large Language Model (LLM) unlearning aims to remove targeted knowledge from a trained model, but practical deployments often require post-training quantization (PTQ) for efficient inference. However, aggressive low-bit PTQ can mask or erase unlearning updates, causing quantized models to revert to pre-unlearning behavior. We show that standard full-parameter fine-tuning often induces parameter changes that are too small to survive 4-bit quantization. We propose quantization-robust unlearning via low-rank adaptation (LoRA): we freeze the base model and concentrate unlearning into trainable adapters so that the effective update is preserved after quantization. On Llama-2-7B evaluated on the MUSE benchmark (BOOKS and NEWS), LoRA improves 4-bit utility by up to 7.93 points (NPO+GDR on BOOKS: 50.17 to 58.10) and yields higher 4-bit utility on NEWS for GA+GDR (40.06 to 44.82, an increase of 4.76). LoRA also substantially reduces privacy leakage under 4-bit PTQ: for GA+KLR on BOOKS, PrivLeak moves from -25.68 to -5.86 (closer to the ideal value of 0), while maintaining strong forgetting (VerMem and KnowMem near 0). Thus, using LoRA for machine unlearning is beneficial in scenarios where quantization is necessary for model deployment.
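
The masking effect described in the abstract can be illustrated with a toy experiment. The following is a minimal sketch and not the authors' code. It assumes a simple symmetric round-to-nearest 4-bit quantizer with one absmax scale per row; the PTQ method used in the paper may differ. The matrix size, the update magnitudes, the rank `r`, and the scaling factor `16.0` are all hypothetical values chosen only to make the contrast visible. The sketch counts how many 4-bit codes change when a weight matrix receives (a) a very small dense update and (b) a larger low-rank update that is merged into the weights before quantization.

```python
import numpy as np

def quantize_4bit(w):
    """Return signed 4-bit integer codes (assumed scheme: per-row absmax,
    symmetric, round-to-nearest)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    return np.clip(np.round(w / scale), -8, 7).astype(np.int8)

def changed_fraction(w_before, w_after):
    """Fraction of weights whose 4-bit code differs after the update."""
    return float(np.mean(quantize_4bit(w_before) != quantize_4bit(w_after)))

rng = np.random.default_rng(0)
d, r = 256, 8  # hypothetical layer width and adapter rank

# Toy "pretrained" weight matrix.
w0 = rng.normal(0.0, 0.02, size=(d, d))

# (a) Tiny dense update, standing in for small full-parameter changes.
small_update = rng.normal(0.0, 1e-5, size=(d, d))

# (b) Low-rank update B @ A with a larger magnitude, merged before quantization.
lora_b = rng.normal(0.0, 0.02, size=(d, r))
lora_a = rng.normal(0.0, 0.02, size=(r, d))
lora_update = 16.0 * (lora_b @ lora_a)  # 16.0 is an illustrative scaling factor

small_frac = changed_fraction(w0, w0 + small_update)
lora_frac = changed_fraction(w0, w0 + lora_update)

print(f"codes changed by small dense update: {small_frac:.4f}")
print(f"codes changed by merged low-rank update: {lora_frac:.4f}")
```

In this toy setting almost none of the quantized codes move under the small update, so the quantized matrix is nearly identical to the original one, while the larger merged update changes a substantial share of the codes. The sketch says nothing about forgetting quality or utility; it only shows why update magnitude relative to the quantization step matters.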

Paper Structure (15 sections, 3 equations, 2 tables)