DeLF: Designing Learning Environments with Foundation Models
Aida Afshar, Wenchao Li
TL;DR
The paper tackles the challenge of applying reinforcement learning in real-world tasks by addressing RL environment design, focusing on extracting good observation and action representations from user descriptions. It introduces DeLF, a method that leverages foundation models to design RL components and generate executable gym-like environment code through an Initiation-Communication-Evaluation (ICE) workflow. The authors formalize RL component design via a component extraction function, and demonstrate DeLF on four diverse tasks (Recommender System, Self-Driving Car, Swimmer, Key-Lock), producing runnable code after a small number of interactions. They also discuss extending to multimodal foundation models, refining evaluation metrics, and potential synergies with reward-design tools like Eureka, with all prompts and code made publicly available to encourage further development.
Abstract
Reinforcement learning (RL) offers a capable and intuitive structure for the fundamental sequential decision-making problem. Despite impressive breakthroughs, it can still be difficult to employ RL in practice, even in many simple applications. In this paper, we address this issue by introducing a method for designing the components of the RL environment for a given, user-intended application. We provide an initial formalization of the problem of RL component design that concentrates on designing a good representation for the observation and action spaces. We propose a method named DeLF: Designing Learning Environments with Foundation Models, which employs large language models to design and codify the user's intended learning scenario. By testing our method on four different learning environments, we demonstrate that DeLF can obtain executable environment code for the corresponding RL problems.
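To make "executable, gym-like environment code" concrete, the following is a minimal sketch of what such an environment might look like for a Key-Lock style task. It is not DeLF's generated output: the class name `KeyLockEnv`, the one-dimensional corridor layout, the action set, the observation tuple, and the reward values are all assumptions chosen for illustration, and the interface follows the classic `reset`/`step` convention with a four-element return value.

```python
class KeyLockEnv:
    """Hypothetical 1-D Key-Lock task with a gym-like reset/step interface.

    Assumed design (not taken from the paper):
      observation = (agent position, 1 if the agent holds the key else 0)
      actions     = 0: left, 1: right, 2: pickup, 3: unlock
    """

    ACTIONS = ("left", "right", "pickup", "unlock")

    def __init__(self, size=5, key_pos=0, lock_pos=4, start_pos=2, max_steps=50):
        self.size = size
        self.key_pos = key_pos
        self.lock_pos = lock_pos
        self.start_pos = start_pos
        self.max_steps = max_steps
        self.reset()

    def _observation(self):
        return (self.agent_pos, int(self.has_key))

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.agent_pos = self.start_pos
        self.has_key = False
        self.steps = 0
        return self._observation()

    def step(self, action):
        """Apply one action and return (observation, reward, done, info)."""
        if action not in range(len(self.ACTIONS)):
            raise ValueError(f"invalid action: {action!r}")
        self.steps += 1
        name = self.ACTIONS[action]
        reward, done = 0.0, False

        if name == "left":
            self.agent_pos = max(0, self.agent_pos - 1)
        elif name == "right":
            self.agent_pos = min(self.size - 1, self.agent_pos + 1)
        elif name == "pickup" and self.agent_pos == self.key_pos:
            self.has_key = True
        elif name == "unlock" and self.agent_pos == self.lock_pos and self.has_key:
            reward, done = 1.0, True  # assumed sparse reward for opening the lock

        if self.steps >= self.max_steps:
            done = True  # end the episode once the step budget is spent
        return self._observation(), reward, done, {}


def run_scripted_episode():
    """Walk to the key, pick it up, walk to the lock, and unlock it."""
    env = KeyLockEnv()
    env.reset()
    for action in [0, 0, 2, 1, 1, 1, 1, 3]:
        obs, reward, done, _ = env.step(action)
    return obs, reward, done
```

In this sketch the observation and action representations are the design decisions: a compact tuple and four discrete actions are enough for this toy task, whereas a richer task would need different choices.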
