Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs

Daniel Agyei Asante, Md Mokarram Chowdhury, Yang Li

TL;DR

This paper systematically investigates how low-rank factorization affects LLM trustworthiness across privacy, adversarial robustness, fairness, and ethics, using LLaMA2 and Qwen models compressed with SVD, Basel, and FWSVD. It reveals nuanced trade-offs: compression can reduce training-data leakage and, in many cases, preserve or enhance adversarial robustness, but it can increase PII leakage under adversarial prompting and degrade fairness and zero-shot ethics. The authors also examine how model scale and fine-tuning interact with compression, and provide a layer-wise attribution analysis identifying embed_tokens and down_proj as influential for trustworthiness. Overall, the work emphasizes the need to evaluate compression through a trustworthiness lens and proposes attribution-based guidance for safer, more reliable low-rank LLM deployment.

Abstract

Large language models (LLMs) have driven major advances across domains, yet their massive size hinders deployment in resource-constrained settings. Model compression addresses this challenge, with low-rank factorization emerging as a particularly effective method for reducing size, memory, and computation while maintaining accuracy. However, while these compressed models boast benign performance and system-level advantages, their trustworthiness implications remain poorly understood. In this paper, we present the first comprehensive study of how low-rank factorization affects LLM trustworthiness across privacy, adversarial robustness, fairness, and ethical alignment. We evaluate multiple LLMs of different sizes and variants compressed with diverse low-rank algorithms, revealing key insights: (1) low-rank compression preserves or improves training data privacy but weakens PII protection during conversation; (2) adversarial robustness is generally preserved and often enhanced, even under deep compression; (3) ethical reasoning degrades in zero-shot settings but partially recovers with few-shot prompting; (4) fairness declines under compression. Beyond compression, we investigate how model scale and fine-tuning affect trustworthiness, as both are important in low-rank methods. To guide trustworthy compression strategies, we end our paper with a gradient-based attribution analysis to identify which layers in LLMs contribute most to adversarial robustness.
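
As a rough illustration of why low-rank factorization reduces size, the sketch below counts parameters when a dense m x n weight matrix is replaced by two factors of shapes m x r and r x n. The function names are hypothetical, and the layer shape (11008 x 4096) and the ranks are illustrative assumptions, not settings reported in the paper.

```python
def dense_params(m: int, n: int) -> int:
    """Parameter count of a dense m x n weight matrix."""
    return m * n


def lowrank_params(m: int, n: int, r: int) -> int:
    """Parameter count of the factor pair A (m x r) and B (r x n)."""
    return r * (m + n)


def break_even_rank(m: int, n: int) -> float:
    """Rank below which the two factors hold fewer parameters than the dense matrix."""
    return (m * n) / (m + n)


def size_ratio(m: int, n: int, r: int) -> float:
    """Factored size divided by dense size (values below 1.0 mean compression)."""
    return lowrank_params(m, n, r) / dense_params(m, n)


if __name__ == "__main__":
    # Illustrative shape only (assumed, not taken from the paper).
    m, n = 11008, 4096
    print(f"break-even rank: {break_even_rank(m, n):.1f}")
    for r in (256, 512, 1024):  # hypothetical ranks
        print(f"rank {r}: size ratio {size_ratio(m, n, r):.3f}")
```

The factored form only saves parameters when r is below m*n/(m+n), which is why the choice of rank determines how deep the compression is.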

Paper Structure

This paper contains 50 sections, 9 figures, 20 tables.

Figures (9)

  • Figure 1: Illustrative examples of prompts and responses generated by the LLaMA2-13B model across three perspectives: (1) leakage of in-context data and PII, (2) vulnerability to adversarial prompts, and (3) machine ethics violations. Across all cases, the model exhibits consistent trustworthiness breakdowns.
  • Figure 2: Throughput and memory of LLaMA-2 7B and its low-rank compressed models using SVD, FWSVD, and Basel. Left (a): Throughput (tokens/sec) increases as model size decreases, showing improved efficiency with compression. Right (b): GPU memory usage (in MiB) also decreases with compression, confirming the effectiveness of low-rank approximations for resource-constrained deployment.
  • Figure 3: Interaction between the black-box adversary, honest-but-curious user, and the target LLM.
  • Figure 4: Training leakage of models across different context lengths (L=50, 100, and 200).
  • Figure 5: Adversarial robustness of Base 7B, 13B and Base 13B compressed models. Base 13B is the baseline. Refer to Tables \ref{tab:compression_vs_Base} and \ref{tab:adv_robust_7b} for exact values.
  • ...and 4 more figures