Bifrost: Steering Strategic Trajectories to Bridge Contextual Gaps for Self-Improving Agents

Quan M. Tran, Zhuo Huang, Wenbin Zhang, Bo Han, Koji Yatani, Masashi Sugiyama, Tongliang Liu

TL;DR

Bifrost tackles cross-context failures in self-improving agents by revealing a context-trajectory correlation in the latent representation and introducing a training-free trajectory steering mechanism. By computing a context shift from past trajectories to the target task and applying a targeted update to hidden states, Bifrost aligns prior experiences with new contexts within a shared representation, enabling effective in-context learning without fine-tuning. The authors ground their approach in a Bayesian interpretation of in-context learning, derive a Laplace-based posterior shift, and prove a favorable excess risk bound, while empirical results across math reasoning, QA, and code generation demonstrate strong, training-free generalization under substantial context shifts. Overall, Bifrost offers a practical, scalable means to leverage historical trajectories for robust cross-domain self-improvement with theoretical backing and broad empirical validation.

Abstract

Autonomous agents excel at self-improvement through reflection and iterative refinement, which reuse successful task trajectories as in-context examples to assist subsequent reasoning. However, shifting across tasks often introduces a context mismatch. As a result, existing approaches either discard the trajectories or manipulate them using heuristics, which incurs a non-negligible fine-tuning cost or leaves performance without guarantees. To bridge this gap, we reveal a context-trajectory correlation, where shifts in context are highly parallel to shifts in trajectory. Based on this finding, we propose BrIdge contextual gap FoR imprOvised trajectory STeering (Bifrost), a training-free method that leverages context differences to precisely guide the adaptation of previously solved trajectories towards the target task, mitigating the misalignment caused by context shifts. Our trajectory adaptation is conducted at the representation level using agent hidden states, ensuring that the trajectory transformation accurately aligns with the target context in a shared space. Across diverse benchmarks, Bifrost consistently outperforms existing trajectory-reuse and fine-tuned self-improvement methods, demonstrating that agents can effectively leverage past experiences despite substantial context shifts.
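
The representation-level steering described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the context shift is a simple mean difference between hidden states of the source and target contexts, and that steering is an additive update scaled by a hypothetical strength `alpha`. The function names (`context_shift`, `steer`, `cosine`) and the synthetic data are invented for illustration only.

```python
import numpy as np


def context_shift(src_ctx: np.ndarray, tgt_ctx: np.ndarray) -> np.ndarray:
    """Mean-difference direction from source-context to target-context hidden states.

    Assumption: each row is one hidden-state vector; the paper's actual shift
    estimate may differ.
    """
    return tgt_ctx.mean(axis=0) - src_ctx.mean(axis=0)


def steer(traj_hidden: np.ndarray, shift: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add the scaled shift to every trajectory hidden state (broadcast over rows)."""
    return traj_hidden + alpha * shift


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-ins for hidden states (hypothetical dimensions and data).
rng = np.random.default_rng(0)
d = 8
src_ctx = rng.normal(size=(5, d))    # hidden states of previous-task contexts
tgt_ctx = src_ctx + np.ones(d)       # toy target contexts: shifted copies
src_traj = rng.normal(size=(3, d))   # hidden states of stored trajectories

shift = context_shift(src_ctx, tgt_ctx)
steered = steer(src_traj, shift, alpha=1.0)

# In this toy setup the trajectory shift is parallel to the context shift
# by construction, mirroring the correlation the abstract describes.
traj_shift = steered.mean(axis=0) - src_traj.mean(axis=0)
parallelism = cosine(shift, traj_shift)
```

In a real agent the hidden states would come from a chosen layer of the model, and both the layer and `alpha` would need tuning; the sketch only shows the shape of the computation.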

Paper Structure (55 sections, 3 theorems, 36 equations, 11 figures, 13 tables, 1 algorithm)

Key Result

Theorem 2

Under the Linear Representation Hypothesis, for a concept $W$ satisfying the conditions in the paper's conceptual-knowledge assumption, with embedding vector $\bar{\mathbf{h}}_W$ and unembedding vector $\bar{\mathbf{g}}_W$ of $W$, the change of output logits is a linear function of the magnitude of
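
The statement above is cut off in this excerpt, so the following does not reconstruct Theorem 2. It only illustrates why a linear relationship of this kind is natural, under two assumptions that are not taken from the excerpt: steering adds a multiple of the concept embedding to a hidden state $\mathbf{h}$, and the logit for $W$ is the inner product of the unembedding vector with the hidden state. The symbols $\alpha$ (steering magnitude), $\mathbf{h}'$, and $\Delta z_W$ are hypothetical notation.

```latex
% Assumed setup (not from the excerpt): additive steering of a hidden state h
% along the concept embedding, with hypothetical steering magnitude \alpha.
\begin{align*}
\mathbf{h}' &= \mathbf{h} + \alpha\,\bar{\mathbf{h}}_W, \\
\Delta z_W &= \bar{\mathbf{g}}_W^{\top}\mathbf{h}' - \bar{\mathbf{g}}_W^{\top}\mathbf{h}
            = \alpha\,\bar{\mathbf{g}}_W^{\top}\bar{\mathbf{h}}_W .
\end{align*}
```

Because $\bar{\mathbf{g}}_W^{\top}\bar{\mathbf{h}}_W$ is a fixed scalar, the logit change in this sketch scales linearly with $\alpha$.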

Figures (11)

  • Figure 1: Illustration of Bifrost: It identifies the context-shift concept between previous and target tasks, which contains essential knowledge for bridging the context gap. Further, by steering along the context shift direction, Bifrost helps find an optimal strategy that is effective under the target context.
  • Figure 2: Success of latent concept transfer by Bifrost: it correctly solves a GSM8K task with algebra knowledge leveraged from AQUA trajectories, demonstrating problem decomposition and symbolic manipulation.
  • Figure 3: Illustration of Latent Concept Shift by Bifrost: It steers prior context trajectories toward the target context of current tasks under agent hidden state representations.
  • Figure 4: Bifrost effectiveness when leveraging different numbers of in-context examples.
  • Figure 5: Bifrost with different steering layer positions.
  • ...and 6 more figures

Theorems & Definitions (4)

  • Theorem 2
  • Proof
  • Lemma 1
  • Theorem 3