Pruning as a Defense: Reducing Memorization in Large Language Models
Mansi Gupta, Nikhar Waghela, Sarthak Gupta, Shourya Goel, Sanjif Shanmugavelu
TL;DR
This work addresses the privacy risk posed by memorization in large language models. It adopts a prefix-based extraction framework, evaluated at several context lengths $k$, to quantify verbatim recall. It evaluates multiple pruning strategies (layer-wise, global, and attention-focused) on the Pythia model family, measuring how much each reduces memorization while tracking perplexity to capture the cost in language modeling quality. The findings show that pruning consistently reduces memorization across context lengths, with global and attention-focused pruning providing the strongest defenses, at the cost of some language modeling performance. Overall, pruning emerges as a lightweight, practical baseline for mitigating membership inference attacks, and it motivates future research into adaptive sparsity techniques for privacy-preserving LLMs.
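To make the prefix-based measurement concrete, the following is a minimal sketch of one common reading of it: prompt the model with the first $k$ tokens of a training sequence, decode greedily, and count the sequence as memorized if the output matches the true continuation exactly. The `generate` callable, the `suffix_len` parameter, and the toy lookup "model" are hypothetical stand-ins introduced for illustration; the summary does not specify the paper's exact protocol.

```python
from typing import Callable, List, Sequence

# Hypothetical interface (an assumption, not the paper's API): maps a token
# prefix and a number of new tokens to the greedily decoded continuation.
Generator = Callable[[Sequence[int], int], Sequence[int]]


def is_memorized(generate: Generator, tokens: Sequence[int],
                 k: int, suffix_len: int) -> bool:
    """True if prompting with the first k tokens reproduces the next
    suffix_len tokens of the sequence verbatim."""
    prefix = list(tokens[:k])
    target = list(tokens[k:k + suffix_len])
    if len(target) < suffix_len:
        raise ValueError("sequence too short for the requested k and suffix_len")
    return list(generate(prefix, suffix_len)) == target


def memorization_rate(generate: Generator, sequences: Sequence[Sequence[int]],
                      k: int, suffix_len: int) -> float:
    """Fraction of sequences whose continuation is reproduced verbatim."""
    if not sequences:
        return 0.0
    hits = sum(is_memorized(generate, seq, k, suffix_len) for seq in sequences)
    return hits / len(sequences)


def make_lookup_generator(memorized: Sequence[Sequence[int]]) -> Generator:
    """Toy stand-in for a language model: it completes prefixes of the
    sequences it has 'memorized' and emits token 0 otherwise."""
    def generate(prefix: Sequence[int], n: int) -> List[int]:
        for seq in memorized:
            if list(seq[:len(prefix)]) == list(prefix):
                return list(seq[len(prefix):len(prefix) + n])
        return [0] * n
    return generate


training_data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
toy_model = make_lookup_generator([training_data[0]])  # recalls only the first
rate = memorization_rate(toy_model, training_data, k=3, suffix_len=3)
```

In a real evaluation, `generate` would wrap greedy decoding from the model under test, and the rate would be computed separately for each context length $k$, before and after pruning.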
Abstract
Large language models have been shown to memorize significant portions of their training data, which they can reproduce when appropriately prompted. This work investigates the impact of simple pruning techniques on this behavior. Our findings reveal that pruning effectively reduces the extent of memorization in LLMs, demonstrating its potential as a foundational approach for mitigating membership inference attacks.
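As an illustration of how layer-wise and global pruning differ, here is a minimal NumPy sketch that assumes unstructured magnitude pruning, where the smallest-magnitude weights are set to zero. That criterion is an assumption; the summary does not state which pruning criterion the authors use. The function names and the `weights` dictionary are hypothetical.

```python
from typing import Dict

import numpy as np


def _kth_smallest_magnitude(magnitudes: np.ndarray, k: int) -> float:
    """Return the k-th smallest value (1-indexed) of a flat magnitude array."""
    return float(np.partition(magnitudes, k - 1)[k - 1])


def prune_layerwise(weights: Dict[str, np.ndarray],
                    sparsity: float) -> Dict[str, np.ndarray]:
    """Zero the smallest-magnitude `sparsity` fraction of each layer separately."""
    pruned = {}
    for name, w in weights.items():
        k = int(sparsity * w.size)
        if k == 0:
            pruned[name] = w.copy()
            continue
        threshold = _kth_smallest_magnitude(np.abs(w).ravel(), k)
        # Ties at the threshold are pruned too, so slightly more than k
        # weights can be removed when magnitudes repeat.
        pruned[name] = np.where(np.abs(w) <= threshold, 0.0, w)
    return pruned


def prune_global(weights: Dict[str, np.ndarray],
                 sparsity: float) -> Dict[str, np.ndarray]:
    """Zero the smallest-magnitude `sparsity` fraction across all layers jointly."""
    all_magnitudes = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    k = int(sparsity * all_magnitudes.size)
    if k == 0:
        return {name: w.copy() for name, w in weights.items()}
    threshold = _kth_smallest_magnitude(all_magnitudes, k)
    return {name: np.where(np.abs(w) <= threshold, 0.0, w)
            for name, w in weights.items()}


def zero_count(w: np.ndarray) -> int:
    """Number of exactly-zero entries in a weight array."""
    return int(np.sum(w == 0.0))
```

Under layer-wise pruning every layer ends up at the same sparsity, whereas global pruning applies one threshold to the whole model, so layers with small weights are pruned more heavily than layers with large ones.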
