Prompt Compression in Production Task Orchestration: A Pre-Registered Randomized Trial

Warren Johnson; Charles Lee

Abstract

The economics of prompt compression depend not only on reducing input tokens but also on how compression changes output length, because output tokens are typically priced several times higher than input tokens. We evaluate this in a pre-registered six-arm randomized controlled trial of prompt compression on production multi-agent task orchestration, analyzing 358 successful Claude Sonnet 4.5 runs (59-61 per arm) drawn from a randomized corpus of 1,199 real orchestration instructions. We compare an uncompressed control with three uniform retention rates (r=0.8, 0.5, 0.2) and two structure-aware strategies (entropy-adaptive and recency-weighted), measuring total inference cost (input plus output) and embedding-based response similarity. Moderate compression (r=0.5) reduced mean total cost by 27.9%, whereas aggressive compression (r=0.2) increased mean cost by 1.8% despite a substantial input reduction, consistent with a small mean output expansion (1.03x vs. control) and heavy-tailed uncertainty. Recency-weighted compression achieved 23.5% savings and, together with moderate compression, occupied the empirical cost-similarity Pareto frontier, whereas aggressive compression was dominated on both cost and similarity. These results show that "compress more" is not a reliable production heuristic and that output tokens must be treated as a first-class outcome when designing compression policies.
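The cost accounting and the dominance check described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the per-token prices, the function names, and the example values are all hypothetical, and the retention rate r is assumed to be the fraction of input tokens kept. The sketch shows how total cost combines input and output tokens, how much output expansion would cancel the input savings at a given r, and how a cost-similarity Pareto frontier can be computed.

```python
# Hypothetical per-token prices in USD (assumption: output priced 5x input).
PRICE_IN = 3.0 / 1_000_000
PRICE_OUT = 15.0 / 1_000_000


def total_cost(input_tokens, output_tokens,
               price_in=PRICE_IN, price_out=PRICE_OUT):
    """Total inference cost of one run: input cost plus output cost."""
    return input_tokens * price_in + output_tokens * price_out


def break_even_output_ratio(input_tokens, output_tokens, retention,
                            price_in=PRICE_IN, price_out=PRICE_OUT):
    """Output expansion factor at which compression saves nothing.

    Compression to `retention` saves (1 - retention) * input_tokens input
    tokens; the savings vanish once the extra output cost equals that amount.
    """
    input_saving = (1.0 - retention) * input_tokens * price_in
    baseline_output_cost = output_tokens * price_out
    return 1.0 + input_saving / baseline_output_cost


def pareto_frontier(arms):
    """Return names of arms not dominated on (lower cost, higher similarity).

    `arms` is a list of (name, cost, similarity) tuples. An arm is dominated
    if another arm is at least as good on both axes and strictly better on one.
    """
    frontier = []
    for name, cost, sim in arms:
        dominated = any(
            (c <= cost and s >= sim) and (c < cost or s > sim)
            for other, c, s in arms
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


if __name__ == "__main__":
    # Illustrative run (not from the paper): 1,000 input and 500 output tokens.
    print(total_cost(1000, 500))
    print(break_even_output_ratio(1000, 500, retention=0.2))
    # Hypothetical arms: (name, relative cost, similarity).
    print(pareto_frontier([("A", 1.00, 1.00), ("B", 0.70, 0.90), ("C", 1.02, 0.80)]))
```

Under these assumed prices, the break-even expansion depends on the ratio of input to output tokens in the uncompressed run, so prompts with short inputs and long outputs leave little room for compression to pay off.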

Paper Structure (64 sections, 10 equations, 1 figure, 9 tables)

Introduction
Background: The TAAC Research Program
The Ecological Validity Problem
Prompt Compression: Theoretical Foundations and Practical Advances
The Output Token Dynamics Gap
Multi-Agent Task Orchestration: A New Frontier for Compression
Research Questions and Hypotheses
Methods
Experimental Design
Corpus Preparation
Data Sources
Inclusion and Exclusion Criteria
Corpus Descriptive Statistics
Randomization
Treatment Implementation
...and 49 more sections

Figures (1)

Figure 1: CONSORT flow diagram. Enrollment: 1,577 records → 1,337 after exclusions → 1,199 after deduplication and randomization. All 1,199 trials were submitted to the API; 841 failed after retries (primarily credit-balance errors), leaving a complete-case inferential set of 358 successful trials (59-61 per arm).
