Words & Weights: Streamlining Multi-Turn Interactions via Co-Adaptation

Chenxing Wei, Hong Wang, Ying He, Zhongxiang Dai, Bo Jiang, F. Richard Yu, Yao Shu

TL;DR

We propose ROSA2, a framework that reformulates interaction as a joint optimization problem over the heterogeneous space of Words and Weights, and prove that this co-adaptation strictly reduces the parameter shift required for convergence.

Abstract

Test-time policy adaptation for multi-turn interactions (T2PAM) is essential for aligning Large Language Models (LLMs) with dynamic user needs during inference time. However, existing paradigms commonly treat test-time adaptation as a single-axis problem, either purely refining instructions (Prompt Engineering) or only adjusting weights (Test-Time Training), ignoring that interaction failures stem from a coupled mix of ambiguity and incapacity. We argue that these two optimization paths are not merely additive but synergistic: semantic clarity acts as a pre-conditioner for effective parameter updates. To this end, we propose ROSA2, a framework that reformulates interaction as a joint optimization problem over the heterogeneous space of Words and Weights. By mathematically decomposing the error signal, ROSA2 utilizes textual gradients to rectify intent ambiguity and parameter updates to bridge capability gaps. Theoretically, we prove that this co-adaptation strictly reduces the required parameter shift for convergence. Empirically, ROSA2 outperforms state-of-the-art baselines by 30% on MATH while reducing interaction turns by 40%, demonstrating that refining the context unlocks the true potential of parameter updates.

Paper Structure (43 sections, 2 theorems, 17 equations, 4 figures, 6 tables, 1 algorithm)

Key Result

Theorem 4.1

Let $\Delta \theta_t(\mathbf{x})$ be the solution to the linearized parameter update defined in Eq. (6) of ROSA [wei2025testtime] given a query $\mathbf{x}$. If we successfully update the query from $\mathbf{x}_t$ to $\mathbf{x}_t^*$ such that the semantic gap to the user intent is reduced (i.e., $D_

Figures (4)

  • Figure 1: Overview of the ROSA2 Framework. We formulate T$^2$PAM as a joint optimization problem over the coupled variables $\phi_t = \{x_{t+1}, \theta_t\}$. During the Forward Phase (solid lines), the model generates a response $y_t$ conditioned on the history $H_{t-1}$. The Backward Phase (dashed lines) approximates the full gradient $\nabla_{\text{joint}}$ of the interaction loss $\mathcal{L}$ via two synergistic modules: the Textual Optimization (top, green) uses textual gradients ($\nabla_{\mathrm{x}}$) to refine the user feedback into a clearer instruction ($x_{t+1} \rightarrow x_{t+1}^*$), resolving context ambiguity, while the Parameter Optimization (bottom, blue) employs gradient updates ($\nabla_{\theta}$) to adjust the adapter weights ($\theta_t \rightarrow \theta_{t+1}$), enhancing the intrinsic capability of the model. This co-adaptation ensures the system becomes both "Clearer" in intent and "Stronger" in execution for the next turn.
  • Figure 2: Empirical Observations and Theoretical Landscape. (a) Experimental results on MATH (Qwen3-8B) reveal that single-axis methods (Green/Blue solid lines) suffer from premature stagnation. However, the immediate recovery observed in the Switch experiments (Green/Blue dashed lines) suggests this bottleneck is structural. (b) We map these dynamics to the optimization landscape using consistent color and line styling: the Prompt-Only path (Green) stalls in the Deficit Trap (hitting capability ceilings), while the Param-Only path (Blue) gravitates towards the Overfitting Trap (memorizing noise). The dashed arrows in (b) visualize how the Switch Method escapes these local minima by activating the missing axis. Crucially, ROSA2 (Red) approximates the joint gradient $\nabla_{\text{joint}}$, forming an Optimal Trajectory that bypasses these traps and proceeds directly to the Success Zone, corresponding to the superior convergence shown in (a).
  • Figure 3: Dynamics of approximation error terms. The plot compares the baseline parametric error (gray) against the decomposed errors of ROSA2. The parametric error of ROSA2 (blue) is significantly reduced compared to the baseline, verifying Theorem 4.1. Furthermore, the total error of ROSA2 (red) remains lower than the baseline despite the additional semantic cost (green), verifying the bound in Theorem 4.2, which decays exponentially.
  • Figure 4: Performance trajectory on challenging benchmarks. We plot the accuracy on AIME25, GPQA-Diamond, M_IMO, and BigCodeBench-Hard as a function of interaction turns. ROSA2 (red line) demonstrates sustained accuracy improvements, successfully solving complex problems where baselines plateau.
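
The forward and backward phases described in the Figure 1 caption can be sketched as a toy loop. This is only an illustration under strong assumptions, not the authors' implementation: the instruction and the adapter weights are each collapsed to a single float, the "response" is their product, and `textual_step`, `parameter_step`, and `co_adapt` are hypothetical names. In ROSA2 the textual update would come from textual gradients over natural language rather than a numeric interpolation.

```python
from dataclasses import dataclass


@dataclass
class State:
    x: float      # toy stand-in for the instruction x_{t+1}
    theta: float  # toy stand-in for the adapter weights theta_t


def forward(state: State) -> float:
    """Forward phase: the toy 'response' is y_t = theta * x."""
    return state.theta * state.x


def textual_step(x: float, x_intent: float, rate: float = 0.5) -> float:
    """Hypothetical stand-in for a textual-gradient edit:
    move the instruction part of the way toward the user intent."""
    return x + rate * (x_intent - x)


def parameter_step(theta: float, x: float, y: float,
                   y_target: float, lr: float = 0.1) -> float:
    """One gradient step on 0.5 * (y - y_target)^2 with respect to theta."""
    return theta - lr * (y - y_target) * x


def co_adapt(state: State, x_intent: float, y_target: float, turns: int = 10):
    """Run `turns` rounds, updating words (x) and weights (theta) together."""
    losses = []
    for _ in range(turns):
        y = forward(state)
        losses.append(0.5 * (y - y_target) ** 2)
        # Backward phase: both updates use the same turn's error signal.
        x_next = textual_step(state.x, x_intent)
        theta_next = parameter_step(state.theta, state.x, y, y_target)
        state = State(x=x_next, theta=theta_next)
    return state, losses


# Assumed toy values: an ambiguous instruction and an under-capable model.
final, losses = co_adapt(State(x=1.0, theta=0.5), x_intent=2.0, y_target=2.0)
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.2e}")
```

In this toy setting both variables drift toward their targets over the turns. The loss is not guaranteed to fall at every single turn, because the instruction changes underneath the weights.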

Theorems & Definitions (2)

  • Theorem 4.1: Reduction of Parameter Shift
  • Theorem 4.2: Unified Convergence Bound
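
The qualitative claim behind Theorem 4.1 (a clearer query needs a smaller parameter shift) can be illustrated numerically. The sketch below is a stand-in and not Eq. (6) of ROSA, which is not reproduced in this summary: it assumes a linear model $y = \theta^\top x$ and uses the least-norm update that makes the model hit a target output. All vectors and the helper `min_norm_shift` are hypothetical, and the ordering shown holds for these chosen numbers rather than in general.

```python
import math


def min_norm_shift(theta, x, y_target):
    """Smallest-norm delta such that (theta + delta) . x == y_target
    for a linear model y = theta . x (assumed toy model)."""
    pred = sum(t * xi for t, xi in zip(theta, x))
    residual = y_target - pred
    sq_norm = sum(xi * xi for xi in x)
    return [residual * xi / sq_norm for xi in x]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


# Assumed toy setup: the model's weights are slightly off (capability gap),
# and the query may omit part of the user intent (ambiguity).
theta_true = [1.0, 2.0]
theta_0 = [0.8, 1.9]
x_intent = [1.0, 1.0]
y_target = sum(t * xi for t, xi in zip(theta_true, x_intent))

x_ambiguous = [1.0, 0.0]  # second component of the intent is missing
x_refined = [1.0, 0.9]    # query after a hypothetical textual refinement

shift_amb = norm(min_norm_shift(theta_0, x_ambiguous, y_target))
shift_ref = norm(min_norm_shift(theta_0, x_refined, y_target))
shift_int = norm(min_norm_shift(theta_0, x_intent, y_target))

print(f"ambiguous query: {shift_amb:.4f}")
print(f"refined query:   {shift_ref:.4f}")
print(f"exact intent:    {shift_int:.4f}")
```

With an ambiguous query the weights have to absorb both the capability gap and the missing intent. Once the query is refined, only the capability gap remains to be bridged.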