Dual-Scale World Models for LLM Agents Towards Hard-Exploration Problems
Minsoo Kim, Seung-won Hwang
TL;DR
The paper tackles hard-exploration for LLM agents by introducing GLoW, a dual-scale world-model framework that combines a global trajectory frontier for principled state selection with a local exploration strategy based on Multi-path Advantage Reflection (MAR). The global model extracts high-value patterns across discovered trajectories, while MAR densifies sparse rewards through semantic advantages at critical decision points, guided by an LLM-enabled policy. On the Jericho benchmark, GLoW achieves state-of-the-art results among LLM-based approaches and rivals RL-based methods while dramatically reducing environment interactions, demonstrating strong sample efficiency and robust exploration. This work highlights the value of coupling long-horizon, frontier-driven learning with local, advantage-based exploration signals to overcome sparse rewards in complex text-based environments.
Abstract
LLM-based agents have seen promising advances, yet they are still limited in "hard-exploration" tasks requiring learning new knowledge through exploration. We present GLoW, a novel approach leveraging dual-scale world models, maintaining a trajectory frontier of high-value discoveries at the global scale, while learning from local trial-and-error in exploration through a Multi-path Advantage Reflection mechanism which infers advantage-based progress signals to guide exploration. To evaluate our framework for hard-exploration, we tackle the Jericho benchmark suite of text-based games, where GLoW achieves a new state-of-the-art performance for LLM-based approaches. Compared to state-of-the-art RL-based methods, our approach achieves comparable performance while requiring 100-800x fewer environment interactions.
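
To make the two scales described above more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a numeric stand-in for both ideas: a hypothetical `TrajectoryFrontier` that retains the top-k trajectories by game score (global scale), and a hypothetical `multipath_advantages` function that compares the returns of several rollouts branching from the same state (local scale). In GLoW the advantage signals are inferred by an LLM through reflection; the mean-baseline arithmetic here is a simplified substitute chosen only to show what an advantage-based progress signal could look like. All class, function, and action names are assumptions.

```python
import heapq
from dataclasses import dataclass, field
from statistics import mean


@dataclass(order=True)
class Trajectory:
    # Only `score` participates in ordering; `actions` is payload.
    score: float
    actions: tuple = field(compare=False)


class TrajectoryFrontier:
    """Hypothetical global store: keeps the k highest-scoring trajectories."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap, lowest-scoring trajectory at index 0

    def add(self, trajectory):
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, trajectory)
        elif trajectory.score > self._heap[0].score:
            # Evict the current worst trajectory in favor of the new one.
            heapq.heapreplace(self._heap, trajectory)

    def best(self):
        # Highest score first.
        return sorted(self._heap, reverse=True)


def multipath_advantages(returns_by_action):
    """Numeric stand-in for a multi-path advantage signal.

    Each action's advantage is its mean return over several rollouts
    minus the average of those per-action means (the baseline).
    """
    action_means = {a: mean(r) for a, r in returns_by_action.items()}
    baseline = mean(list(action_means.values()))
    return {a: m - baseline for a, m in action_means.items()}
```

For example, adding trajectories with scores 1, 5, and 3 to a frontier of capacity 2 leaves the ones scoring 5 and 3, and rollouts returning `[10, 10]` for one action and `[0, 0]` for another give advantages of 5 and -5, turning a sparse final score into a per-decision signal.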
