Towards Adaptive Environment Generation for Training Embodied Agents

Teresa Yeo; Dulaj Weerakoon; Dulanga Weerakoon; Archan Misra

TL;DR

This work presents a proof-of-concept for closed-loop environment generation that adapts difficulty to the agent's current capabilities: it extracts fine-grained performance feedback from the agent's trajectories and translates that feedback into environment modifications.

Abstract

Embodied agents struggle to generalize to new environments, even when those environments share similar underlying structures with their training settings. Most current approaches to generating these training environments follow an open-loop paradigm, without considering the agent's current performance. While procedural generation methods can produce diverse scenes, diversity without feedback from the agent is inefficient: the generated environments may be trivially easy, providing limited learning signal. To address this, we present a proof-of-concept for closed-loop environment generation that adapts difficulty to the agent's current capabilities. Our system employs a controllable environment representation, extracts fine-grained performance feedback beyond binary success or failure, and implements a closed-loop adaptation mechanism that translates this feedback into environment modifications. This feedback-driven approach generates training environments that are more challenging in the ways the agent needs to improve, enabling more efficient learning and better generalization to novel settings.

Paper Structure (17 sections, 1 equation, 3 figures)

This paper contains 17 sections, 1 equation, 3 figures.

Introduction
Related Work
Environment Generation Approaches.
Open-loop Environment Generation.
Closed-loop Environment Generation.
Method
Problem formulation.
Structured representation of the environment.
Fine-grained trajectory analysis as adaptation signal.
Adaptive environment generation.
Constrained optimization to ensure physical plausibility.
Experiments
Setup.
Preliminary results.
Limitations of frontier LLM Spatial Reasoning Capabilities.
...and 2 more sections

Figures (3)

Figure 1: Embodied navigation performance is sensitive to object perturbations. Top-down view of agent trajectories (yellow to orange path) for an object navigation task with the fridge as the target object. In the training environment (left), the agent successfully navigates to the target, while in the test environment (right), where the objects and furniture have been rearranged, the agent fails to reach the same target despite starting from the same position. The orange cross indicates the target location. Our method analyzes such failure trajectories to generate difficulty-aware environments by programmatically modifying ProcTHOR scene structures, creating meaningful and challenging scenarios that can be used to train and improve the agent. Best viewed on screen, zoomed in.
Figure 2: Overview of the proposed adaptive environment generation framework. Starting from an original environment $e_t$, an agent with policy $\pi_t$ is deployed to perform embodied tasks (e.g., object navigation), producing a top-down trajectory visualization $\tau^{e_t}$. An analysis model $F$ examines this trajectory to identify success/failure status, intermediate concerns (e.g., unsafe navigation margins), and high-level suggestions for curriculum design. The generator $G$ then translates this analysis $a_t$ into concrete scene graph modifications (e.g., moving or rotating objects) to create a perturbed environment $e_{t+1}$. The agent is retrained on this new environment with policy $\pi_{t+1}$, completing the feedback loop between adaptive environment generation and agent learning.
Figure 3: Collision-aware placements. When the generator $G$ proposes a new environment configuration that would result in object collisions, we apply a collision-aware adjustment to find a valid placement. In this example, the chair is initially proposed to move 20 units in the x-direction and 5 units in the y-direction, which would cause it to collide with the bed. We compute the displacement vector from the chair's current position to the proposed position and move the chair incrementally along this direction. At each step, we check for collisions with other objects. This continues until either the proposed position is reached without collision or an obstacle blocks further movement along the path, at which point the chair is placed at the last valid collision-free position.
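
The incremental placement described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes axis-aligned rectangular footprints in the top-down plane and a fixed number of steps along the displacement vector, and the names `Box`, `collision_aware_move`, and `n_steps`, as well as the chair and bed dimensions, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned footprint: center (cx, cy), width w, height h."""
    cx: float
    cy: float
    w: float
    h: float

    def moved_to(self, cx, cy):
        # Same footprint, new center.
        return Box(cx, cy, self.w, self.h)

    def overlaps(self, other):
        # Two axis-aligned boxes overlap when their centers are closer than
        # the sum of their half-extents on both axes.
        return (abs(self.cx - other.cx) < (self.w + other.w) / 2
                and abs(self.cy - other.cy) < (self.h + other.h) / 2)


def collision_aware_move(obj, dx, dy, obstacles, n_steps=100):
    """Step obj along the displacement (dx, dy) and return the last
    collision-free center, stopping at the first blocked step."""
    last_valid = (obj.cx, obj.cy)
    for i in range(1, n_steps + 1):
        t = i / n_steps  # fraction of the proposed displacement
        candidate = obj.moved_to(obj.cx + t * dx, obj.cy + t * dy)
        if any(candidate.overlaps(o) for o in obstacles):
            break  # an obstacle blocks further movement along the path
        last_valid = (candidate.cx, candidate.cy)
    return last_valid


# Illustrative scene (assumed sizes): the chair is asked to move by (20, 5),
# but a bed lies on the path, so it stops short of the proposed position.
chair = Box(cx=0.0, cy=0.0, w=2.0, h=2.0)
bed = Box(cx=15.0, cy=4.0, w=6.0, h=4.0)

blocked_pos = collision_aware_move(chair, 20.0, 5.0, [bed])
free_pos = collision_aware_move(chair, 20.0, 5.0, [])
```

With no obstacles the chair reaches the proposed position; with the bed present it stops at the last step before its footprint would overlap the bed. A smaller step size (larger `n_steps`) places the object closer to the obstacle at the cost of more collision checks.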
