The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts
Warren Johnson
TL;DR
This work examines why code prompts tolerate aggressive compression better than math-focused reasoning prompts in LLMs. It validates a task-dependent compression threshold across code and reasoning benchmarks and empirically confirms a "perplexity paradox" at the token level. It introduces TAAC, a Task-Aware Adaptive Compression algorithm that uses task type, token density, and a quality predictor to adjust compression dynamically, achieving better cost-quality tradeoffs than fixed-ratio baselines. The study demonstrates cross-benchmark generalization (e.g., MBPP) and causal validation via signature preservation (reducing NameError from 86.1% to 6.1% and increasing pass rate by ~34pp). Per-token perplexity analysis supports the proposed mechanism: high-perplexity syntax tokens in code are retained, while low-perplexity numerical values in math are pruned. The practical impact is a scalable approach to reducing inference costs in LLM deployments while largely preserving accuracy, with broader design implications for prompt compression and task-aware optimization.
Abstract
In "Compress or Route?" (Johnson, 2026), we found that code generation tolerates aggressive prompt compression (r >= 0.6) while chain-of-thought reasoning degrades gradually. That study was limited to HumanEval (164 problems), left the "perplexity paradox" mechanism unvalidated, and provided no adaptive algorithm. This paper addresses all three gaps. First, we validate across six code benchmarks (HumanEval, MBPP, HumanEval+, MultiPL-E) and four reasoning benchmarks (GSM8K, MATH, ARC-Challenge, MMLU-STEM), confirming that the compression threshold generalizes across languages and difficulties. Second, we conduct the first per-token perplexity analysis (n=723 tokens), revealing a "perplexity paradox": code syntax tokens are preserved because they have high perplexity, while numerical values in math problems are pruned because they have low perplexity, even though they are task-critical. Signature injection recovers +34 percentage points in pass rate (5.3% to 39.3%; Cohen's h=0.890). Third, we propose TAAC (Task-Aware Adaptive Compression), achieving 22% cost reduction with 96% quality preservation, outperforming fixed-ratio compression by 7%. MBPP validation (n=1,800 trials) confirms systematic variation across compression ratios: 3.6% at r=0.3 to 54.6% at r=1.0.
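As a quick check on the signature-injection effect size reported above, the sketch below computes Cohen's h from the two rounded pass rates (5.3% and 39.3%) using the standard arcsine transform. The function and variable names are illustrative and not taken from the paper. Because the inputs are the rounded percentages from the abstract, the result should be read as approximately matching the reported h, not as an exact reproduction.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h for two proportions: difference of arcsine-transformed values."""
    phi1 = 2.0 * math.asin(math.sqrt(p1))
    phi2 = 2.0 * math.asin(math.sqrt(p2))
    return phi1 - phi2


# Rounded pass rates taken from the abstract (assumed inputs for this check).
baseline_pass_rate = 0.053    # compressed prompt without signature injection
injected_pass_rate = 0.393    # compressed prompt with signature injection

gain_pp = (injected_pass_rate - baseline_pass_rate) * 100.0
h = cohens_h(injected_pass_rate, baseline_pass_rate)

print(f"pass-rate gain: {gain_pp:.1f} percentage points")
print(f"Cohen's h: {h:.3f}")
```

By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), a value near 0.89 counts as a large effect.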
