AERO: Entropy-Guided Framework for Private LLM Inference

Nandan Kumar Jha; Brandon Reagen

TL;DR

AERO addresses the challenge of private LLM inference by reducing costly nonlinearities through an entropy-guided framework. It combines an inference-time LayerNorm substitute with an adaptive, per-head entropy regularizer that uses learnable thresholds and a tolerance margin to prevent entropic overload while preserving head diversity. Empirical results show substantial communication and latency savings (≈3.4× and 1.4×, respectively) with no degradation in perplexity, and improvements up to 6–8% in the most constrained Softmax-only settings. This approach provides practical gains for privacy-preserving inference and offers a principled design path for scalable, normalization-free LLM architectures.

Abstract

Privacy-preserving computation enables language model inference directly on encrypted data, yet it suffers from prohibitive latency and communication overheads, primarily due to nonlinear functions. Removing nonlinearities, however, can trigger one of two failure modes that limit how far they can be removed: entropy collapse in deeper layers, which destabilizes training, and entropic overload in early layers, which causes under-utilization of attention heads. To address these challenges, we introduce AERO, an entropy-guided framework that strategically eliminates costly nonlinear operations from transformer architectures. AERO applies adaptive recalibration through a head-wise entropy regularizer with learnable per-head strengths, which lets each head adjust its entropy level while penalizing extreme entropies and fostering functional diversity through a tolerance margin. Experiments show that AERO can save 3.4$\times$ communication and 1.4$\times$ latency without any performance penalty.
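To make the idea of a head-wise entropy regularizer with a tolerance margin more concrete, the following is a minimal sketch in plain Python. The abstract does not give the exact loss, so the functional form here (a squared penalty on the deviation beyond the margin, averaged over heads) is an assumption, and the names `head_entropy`, `entropy_penalty`, `thresholds`, `strengths`, and `margin` are hypothetical. In actual training the per-head thresholds and strengths would be learnable parameters; here they are fixed floats for illustration.

```python
import math

def row_entropy(probs):
    """Shannon entropy (in nats) of one attention row; zero entries contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def head_entropy(attn):
    """Mean row entropy of one head's attention matrix (a list of probability rows)."""
    return sum(row_entropy(row) for row in attn) / len(attn)

def entropy_penalty(heads, thresholds, strengths, margin):
    """Assumed form, not the paper's exact loss: each head is penalized only
    when its entropy deviates from its own threshold by more than `margin`."""
    total = 0.0
    for attn, thr, lam in zip(heads, thresholds, strengths):
        deviation = abs(head_entropy(attn) - thr)
        excess = max(0.0, deviation - margin)  # inside the margin: no penalty
        total += lam * excess ** 2
    return total / len(heads)

# Two toy heads over 4 tokens: one fully uniform (maximum entropy, log 4),
# one fully peaked (zero entropy).
uniform = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]
one_hot = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]

penalty = entropy_penalty(
    heads=[uniform, one_hot],
    thresholds=[0.7, 0.7],   # hypothetical per-head target entropies
    strengths=[1.0, 1.0],    # hypothetical per-head regularization strengths
    margin=0.1,
)
```

In this sketch both toy heads sit far from their thresholds, so both contribute to the penalty, while a head whose entropy stays within `margin` of its threshold contributes nothing. That dead zone is one simple way to discourage extreme entropies without forcing every head to the same value.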
