Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation

Hadi Nekoei; Aman Jaiswal; Patrice Bechard; Oleh Shliazhko; Orlando Marquez Ayala; Mathieu Reymond; Massimo Caccia; Alexandre Drouin; Sarath Chandar; Alexandre Lacoste

TL;DR

The paper addresses how to improve LLM agents in unfamiliar domains without costly online interaction or fine-tuning, by distilling offline trajectories into lightweight, context-aware hints. It introduces Just-in-time Episodic Feedback Hinter (JEF Hinter), which uses a zooming module to identify decisive decision points and a reflection step to generate concise hints, and which can draw on both successful and failed traces. At inference, a retriever selects relevant hints to condition the agent's actions, providing targeted, transparent guidance with no additional training. Empirical results on MiniWoB++, WorkArena-L1, and WebArena-Lite show that JEF Hinter outperforms strong baselines, including document- and human-based hints, and generalizes to unseen tasks and goals.

Abstract

Large language model (LLM) agents perform well in sequential decision-making tasks, but improving them on unfamiliar domains often requires costly online interactions or fine-tuning on large expert datasets. These strategies are impractical for closed-source models and expensive for open-source ones, with risks of catastrophic forgetting. Offline trajectories offer reusable knowledge, yet demonstration-based methods struggle because raw traces are long, noisy, and tied to specific tasks. We present Just-in-time Episodic Feedback Hinter (JEF Hinter), an agentic system that distills offline traces into compact, context-aware hints. A zooming mechanism highlights decisive steps in long trajectories, capturing both strategies and pitfalls. Unlike prior methods, JEF Hinter leverages both successful and failed trajectories, extracting guidance even when only failure data is available, while supporting parallelized hint generation and benchmark-independent prompting. At inference, a retriever selects relevant hints for the current state, providing targeted guidance with transparency and traceability. Experiments on MiniWoB++, WorkArena-L1, and WebArena-Lite show that JEF Hinter consistently outperforms strong baselines, including human- and document-based hints.
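The inference-time step described above (a retriever selecting hints relevant to the current state) can be illustrated with a minimal sketch. This is not the paper's implementation: the `Hint` record, the `retrieve_hints` and `build_prompt` functions, and the word-overlap (Jaccard) scoring are all assumptions chosen for illustration, standing in for whatever retriever and prompt format the authors actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    """Hypothetical record for one hint distilled offline."""
    text: str          # concise guidance produced by the reflection step
    source_trace: str  # id of the originating trajectory, kept for traceability


def tokens(s: str) -> set[str]:
    """Lowercased whitespace tokens; a crude stand-in for a real encoder."""
    return set(s.lower().split())


def retrieve_hints(context: str, hints: list[Hint], k: int = 2) -> list[Hint]:
    """Return up to k hints ranked by Jaccard word overlap with the context.

    Assumption: lexical overlap replaces the paper's retriever here.
    Hints with zero overlap are dropped rather than padded in.
    """
    ctx = tokens(context)
    scored = []
    for hint in hints:
        hint_tokens = tokens(hint.text)
        union = ctx | hint_tokens
        score = len(ctx & hint_tokens) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, hint))
    # Stable sort: ties keep their original order in the hint store.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [hint for _, hint in scored[:k]]


def build_prompt(goal: str, observation: str, hints: list[Hint]) -> str:
    """Prepend the retrieved hints (with their trace ids) to the agent prompt."""
    lines = [f"- {h.text} (from {h.source_trace})" for h in hints]
    hint_block = "\n".join(lines) if lines else "- (no relevant hints)"
    return f"Goal: {goal}\nHints:\n{hint_block}\nObservation: {observation}"
```

Keeping the trace id next to each hint is one simple way to preserve the traceability the abstract mentions: any piece of guidance shown to the agent can be followed back to the trajectory it was distilled from.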
