Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs
Daniel Agyei Asante, Md Mokarram Chowdhury, Yang Li
TL;DR
This paper systematically investigates how low-rank factorization affects LLM trustworthiness across privacy, adversarial robustness, fairness, and ethics, using LLaMA2 and Qwen models compressed with SVD, Basel, and FWSVD. It reveals nuanced trade-offs: compression can reduce training-data leakage and, in many cases, preserve or enhance adversarial robustness, but it can increase PII leakage under adversarial prompting and degrade fairness and zero-shot ethics. The authors also examine how model scale and fine-tuning interact with compression, and provide a layer-wise attribution analysis identifying embed_tokens and down_proj as influential for trustworthiness. Overall, the work emphasizes the need to evaluate compression through a trustworthiness lens and proposes attribution-based guidance for safer, more reliable low-rank LLM deployment.
Abstract
Large language models (LLMs) have driven major advances across domains, yet their massive size hinders deployment in resource-constrained settings. Model compression addresses this challenge, with low-rank factorization emerging as a particularly effective method for reducing size, memory, and computation while maintaining accuracy. Although these compressed models retain strong performance on benign tasks and offer system-level advantages, their trustworthiness implications remain poorly understood. In this paper, we present the first comprehensive study of how low-rank factorization affects LLM trustworthiness across privacy, adversarial robustness, fairness, and ethical alignment. We evaluate multiple LLMs of different sizes and variants compressed with diverse low-rank algorithms, revealing four key insights: (1) low-rank compression preserves or improves training data privacy but weakens PII protection during conversation; (2) adversarial robustness is generally preserved and often enhanced, even under deep compression; (3) ethical reasoning degrades in zero-shot settings but partially recovers with few-shot prompting; (4) fairness declines under compression. Beyond compression, we investigate how model scale and fine-tuning affect trustworthiness, since both play an important role in low-rank methods. To guide trustworthy compression strategies, we conclude with a gradient-based attribution analysis that identifies which layers in LLMs contribute most to adversarial robustness.
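For readers unfamiliar with the basic operation behind low-rank factorization, the following is a minimal sketch of plain truncated SVD applied to a single weight matrix. It is illustrative only and is not the authors' pipeline: the matrix shape, the rank, and the helper names `low_rank_factorize` and `param_count` are assumptions chosen for the example, and the weighted variants named above (Basel, FWSVD) are not shown.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate W (m x n) as A @ B with A (m x rank) and B (rank x n)
    by keeping only the `rank` largest singular values (truncated SVD)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def param_count(m, n, rank):
    """Number of parameters stored by the two factors instead of m * n."""
    return rank * (m + n)

# Hypothetical 64 x 48 "weight matrix"; real LLM layers are far larger.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))

A, B = low_rank_factorize(W, rank=8)
rel_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print("dense parameters:   ", W.size)
print("factored parameters:", param_count(64, 48, 8))
print("relative Frobenius error:", rel_error)
```

The factored layer stores `rank * (m + n)` values instead of `m * n`, so a smaller rank saves more memory and computation at the cost of a larger approximation error. A random matrix like the one here compresses poorly; trained weight matrices may behave differently.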
