TextResNet: Decoupling and Routing Optimization Signals in Compound AI Systems via Deep Residual Tuning

Suizhi Huang, Mei Li, Han Yu, Xiaoxiao Li

TL;DR

TextResNet tackles semantic entanglement and attribution ambiguity in Compound AI Systems by decoupling optimization signals and routing them precisely. It introduces four innovations—Additive Semantic Deltas, Semantic Projector, Causal Routing, and Density-Aware Scheduling—within a residual forward framework that preserves upstream context while enabling targeted local updates. Empirical results across four benchmarks show superior performance and deep-chain stability compared to TextGrad and baselines, along with reduced token usage. The approach offers a training-free architectural solution that improves reliability and efficiency in multi-agent AI systems, with clear pathways for further refinement.

Abstract

Textual Gradient-style optimizers (TextGrad) enable gradient-like feedback propagation through compound AI systems. However, they perform poorly on deep chains. This limitation stems from the Semantic Entanglement problem in these extended workflows: in standard textual backpropagation, feedback signals mix local critiques with upstream contexts, leading to Attribution Ambiguity. To address this challenge, we propose TextResNet, a framework that reformulates the optimization process to achieve precise signal routing via four key innovations. First, in the forward pass, it enforces Additive Semantic Deltas to preserve an Identity Highway for gradient flow. Second, in the backward pass, it introduces Semantic Gradient Decomposition via a Semantic Projector to disentangle feedback into causally independent subspaces. Third, it implements Causal Routing, which routes projected signals to their specific components. Finally, it performs Density-Aware Optimization Scheduling, which uses the disentangled signals to dynamically allocate resources to key system bottlenecks. Our results show that TextResNet not only outperforms TextGrad, but also remains stable on agentic tasks in compound AI systems where baselines collapse. Code is available at https://github.com/JeanDiable/TextResNet.
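
The forward and backward steps named in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the TextResNet repository: the function names `residual_forward`, `project`, and `causal_route` are hypothetical, and the Semantic Projector (an LLM call in any real system) is replaced by critiques that already carry a `"local"` or `"upstream"` tag.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: the state is the upstream context followed by
# the deltas appended by each agent, in order.
State = List[str]
Critique = Tuple[str, str]  # (tag, text) with tag in {"local", "upstream"}


def residual_forward(h0: str, agents: List[Callable[[State], str]]) -> State:
    """Each agent appends a delta; the original context h0 is never rewritten."""
    state: State = [h0]
    for agent in agents:
        delta = agent(state)     # the agent sees the full history so far
        state = state + [delta]  # append instead of overwrite
    return state


def project(feedback: List[Critique]) -> Tuple[List[str], List[str]]:
    """Toy stand-in for the Semantic Projector: split pre-tagged critiques."""
    local = [text for tag, text in feedback if tag == "local"]
    upstream = [text for tag, text in feedback if tag == "upstream"]
    return local, upstream


def causal_route(index: int, feedback: List[Critique],
                 inbox: Dict[int, List[str]]) -> None:
    """Deliver local critiques to component `index`, upstream ones to its predecessor."""
    local, upstream = project(feedback)
    inbox[index].extend(local)
    if index > 0:
        inbox[index - 1].extend(upstream)


# Two-agent pipeline in the spirit of Figure 1 (retriever -> answerer).
state = residual_forward(
    "Q: What is the capital of France?",
    [lambda s: "retrieved: Paris is the capital of France",
     lambda s: "answer: Paris"],
)
inbox: Dict[int, List[str]] = {0: [], 1: []}
causal_route(1, [("local", "answer is too terse"),
                 ("upstream", "retrieval lacks a source")], inbox)
```

Because every delta is appended, the original question is still the first element of `state` after both agents have run, which is what lets a critique be attributed to a specific component rather than to the pipeline as a whole.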

Paper Structure

This paper contains 61 sections, 2 theorems, 15 equations, 10 figures, 6 tables, 1 algorithm.

Key Result

Proposition 4.1

Let $\mathcal{H}$ be the semantic state space. In a deep chain of length $L$ employing additive semantic deltas, the upstream context $h_0$ remains theoretically accessible in the final state $h_L$. Unlike lossy rewriting chains where information decays exponentially with depth, our additive structu
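
As a reading aid, the contrast between a rewriting chain and an additive chain can be sketched as follows. The notation is assumed here rather than taken from the paper: $f_\ell$ denotes a rewriting step, $\Delta_\ell$ the delta produced by component $\ell$, and $\oplus$ the operation that appends a delta to the state.

```latex
% Assumed notation for illustration only; not the paper's formal statement.
\begin{align*}
\text{rewriting chain:}\quad & h_\ell = f_\ell(h_{\ell-1}), \\
\text{additive chain:}\quad  & h_\ell = h_{\ell-1} \oplus \Delta_\ell
  \;\Longrightarrow\;
  h_L = h_0 \oplus \Delta_1 \oplus \cdots \oplus \Delta_L .
\end{align*}
```

Under this reading, $h_0$ appears explicitly in the unrolled form of $h_L$, whereas in the rewriting chain it survives only through the composition $f_L \circ \cdots \circ f_1$.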

Figures (10)

  • Figure 1: Optimization Problems in Compound AI Systems, illustrated with a simplified two-agent QA pipeline (Info Retriever $\rightarrow$ Answer Generator). We identify three critical failure modes arising from Attribution Ambiguity in standard textual backpropagation. (a) Signal Blockage: Critical feedback fails to propagate to the upstream node. (b) Downstream Over-correction: Downstream nodes are forced to hallucinate fixes for upstream errors. (c) Upstream Pollution: Downstream reasoning errors are mistakenly attributed to upstream nodes and leak into them.
  • Figure 2: Overview of the TextResNet Framework. Our approach reformulates optimization as a structured semantic routing problem across three stages: (1) Forward Pass: Agents generate Additive Semantic Deltas ($\Delta$) rather than rewrites, establishing an Identity Highway that preserves upstream context for attribution. (2) Backward Pass: The Semantic Projector ($\mathcal{P}$) enforces Semantic Gradient Decomposition, projecting feedback into causally independent subspaces ($g^{\text{local}}, g^{\text{upstream}}$) to implement precise Causal Routing. (3) Optimization: A Density-Aware Scheduler tracks the accumulation of local errors (Gradient Density $\rho$) to dynamically allocate the optimization budget to true system bottlenecks via Boltzmann sampling.
  • Figure 3: Optimization Trajectories. The unrouted variant (Orange) removes Causal Routing but retains Density-Aware Scheduling; it stabilizes slowly due to noise accumulation. The random variant (Green) keeps Causal Routing but removes Density-Aware Scheduling; it suffers from high variance. TextResNet (Blue) achieves stable and fast convergence.
  • Figure 4: Evolution of Error Attribution. We focus on the three downstream learnable components (InfoExtractor, HintGenerator, AnswerGenerator). The system shifts from propagating errors upstream (Early Stage) to resolving them locally (Late Stage), resembling a curriculum learning process.
  • Figure 5: Attribution Accuracy under Batch Shuffling. We focus on the three downstream learnable components (InfoExtractor, HintGenerator, AnswerGenerator). We measure how often the optimizer correctly attributes the error to the shuffled input (Upstream) versus attempting to fix the prompt (Local).
  • ...and 5 more figures
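
The Figure 2 caption mentions allocating the optimization budget by Boltzmann sampling over Gradient Density $\rho$. A minimal sketch of that idea follows, under stated assumptions: the function names, the `temperature` parameter, and the density values are hypothetical, and the paper's actual definition of $\rho$ and its schedule are not reproduced here. Only the component names come from the figure captions.

```python
import math
import random
from typing import Dict, Optional


def boltzmann_probs(density: Dict[str, float],
                    temperature: float = 1.0) -> Dict[str, float]:
    """Softmax over per-component densities; higher density gets more mass."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(density.values())  # subtract the max for numerical stability
    weights = {name: math.exp((rho - peak) / temperature)
               for name, rho in density.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick_component(density: Dict[str, float], temperature: float = 1.0,
                   rng: Optional[random.Random] = None) -> str:
    """Sample one component to optimize next, in proportion to its probability."""
    rng = rng or random.Random()
    probs = boltzmann_probs(density, temperature)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical accumulated local-error densities for three components.
density = {"InfoExtractor": 3.0, "HintGenerator": 1.0, "AnswerGenerator": 0.5}
probs = boltzmann_probs(density, temperature=1.0)
chosen = pick_component(density, temperature=1.0, rng=random.Random(0))
```

A lower `temperature` concentrates the budget on the component with the highest density, while a higher one spreads updates more evenly; which setting is appropriate is a design choice this sketch does not settle.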

Theorems & Definitions (7)

  • Definition 3.1: Semantic Attribution Ambiguity
  • Definition 3.2: Design Principle 1: Lossless Context Preservation
  • Definition 3.3: Design Principle 2: Semantic Disentanglement
  • Proposition 4.1: Information Preservation in Residual Chains
  • Proposition 4.2: Bounded Error Propagation Analysis
  • proof : Derivation
  • proof : Derivation