PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement

Yian Wang, Han Yang, Minghao Guo, Xiaowen Qiu, Tsun-Hsuan Wang, Wojciech Matusik, Joshua B. Tenenbaum, Chuang Gan

TL;DR

PhyScensis is an LLM agent-based framework, powered by a physics engine, that produces physically plausible scene configurations with high complexity. It outperforms prior approaches in scene complexity, visual quality, and physical accuracy.

Abstract

Automatically generating interactive 3D environments is crucial for scaling up robotic data collection in simulation. While prior work has primarily focused on 3D asset placement, it often overlooks the physical relationships between objects (e.g., contact, support, balance, and containment), which are essential for creating complex and realistic manipulation scenarios such as tabletop arrangements, shelf organization, or box packing. Compared to classical 3D layout generation, producing complex physical scenes introduces additional challenges: (a) higher object density and complexity (e.g., a small shelf may hold dozens of books), (b) richer supporting relationships and compact spatial layouts, and (c) the need to accurately model both spatial placement and physical properties. To address these challenges, we propose PhyScensis, an LLM agent-based framework powered by a physics engine, to produce physically plausible scene configurations with high complexity. Specifically, our framework consists of three main components: an LLM agent iteratively proposes assets with spatial and physical predicates; a solver, equipped with a physics engine, realizes these predicates into a 3D scene; and feedback from the solver informs the agent to refine and enrich the configuration. Moreover, our framework preserves strong controllability over fine-grained textual descriptions and numerical parameters (e.g., relative positions, scene stability), enabled through probabilistic programming for stability and a complementary heuristic that jointly regulates stability and spatial relations. Experimental results show that our method outperforms prior approaches in scene complexity, visual quality, and physical accuracy, offering a unified pipeline for generating complex physical scene layouts for robotic manipulation.
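
The propose, solve, and feedback loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `Predicate`, `SolveResult`, `toy_agent`, `toy_solver`, and `arrange` are hypothetical names, the agent is a hand-written stub rather than an LLM, and the solver is a capacity check standing in for a physics engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Predicate:
    """A spatial or physical relation, e.g. ("on", "book_0", "shelf"). Hypothetical schema."""
    kind: str
    subject: str
    target: str


@dataclass
class SolveResult:
    success: bool
    placed: List[str] = field(default_factory=list)
    feedback: str = ""


def toy_solver(predicates: List[Predicate], capacity: int = 3) -> SolveResult:
    """Stand-in for the physics-backed solver: the support holds at most `capacity` objects."""
    placed: List[str] = []
    for p in predicates:
        if len(placed) >= capacity:
            return SolveResult(False, placed, f"no room for {p.subject} {p.kind} {p.target}")
        placed.append(p.subject)
    return SolveResult(True, placed, "ok")


def toy_agent(prompt: str, feedback: Optional[str], previous: List[Predicate]) -> List[Predicate]:
    """Stand-in for the LLM agent: enrich the scene on success, drop the last proposal on failure."""
    if feedback is not None and feedback != "ok":
        return previous[:-1]
    name = f"book_{len(previous)}"
    return previous + [Predicate("on", name, "shelf")]


def arrange(
    prompt: str,
    agent: Callable[[str, Optional[str], List[Predicate]], List[Predicate]],
    solver: Callable[[List[Predicate]], SolveResult],
    max_rounds: int = 6,
) -> SolveResult:
    """Alternate between proposing predicates and solving them, keeping the last valid scene."""
    predicates: List[Predicate] = []
    feedback: Optional[str] = None
    best = SolveResult(True)
    for _ in range(max_rounds):
        predicates = agent(prompt, feedback, predicates)
        result = solver(predicates)
        feedback = result.feedback  # the solver's diagnosis drives the next proposal
        if result.success:
            best = result
    return best


result = arrange("a small shelf packed with books", toy_agent, toy_solver)
print(result.placed)
```

In this toy setup the loop keeps adding books until the solver reports a failure, backs off by one proposal, and returns the densest scene that was accepted.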

Paper Structure (46 sections, 15 figures, 5 tables)

Figures (15)

  • Figure 1: We present PhyScensis, an agentic framework that incorporates a physics engine for physical scene arrangement. PhyScensis is: (a) capable of generating complex scenes with high object density and intricate physical interactions; (b) highly controllable, with strong text-following abilities; and (c) adaptable to diverse, open-vocabulary scenarios.
  • Figure 2: Our framework consists of three components: (a) an LLM agent that takes a user prompt and generates spatial and physical predicates, along with object descriptions for retrieval; (b) a solver that computes the final scene using a physics engine for physical predicates and a sample-based constraint solver for spatial predicates; and (c) a feedback system that reports success or diagnoses failure, allowing the LLM agent to iteratively refine and regenerate predicates.
  • Figure 3: Examples of placements generated by physical solvers.
  • Figure 4: The stacking generation pipeline uses an occupancy-grid-based heuristic to efficiently compute candidate placement locations via grid search, which are then ranked by user requirements. A physics simulator verifies physical validity (e.g., whether an object will fall), and probabilistic programming further assesses stability, enabling control over the robustness of valid states.
  • Figure 5: Qualitative comparison of PhyScensis with baselines for different generating scenarios.
  • ...and 10 more figures
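
The Figure 4 caption outlines three steps: grid search over an occupancy grid for candidate placements, ranking of candidates by user requirements, and a stability assessment. The sketch below illustrates that flow under strong simplifying assumptions. All names (`free_cells`, `rank_by_target`, `stability_score`) are hypothetical, the grid is 2D, the ranking criterion is distance to a requested cell, and the stability estimate is a plain Monte Carlo test on a perturbed center of mass, which stands in for both the physics simulator and the probabilistic programming used in the paper.

```python
import random
from typing import List, Tuple

Grid = List[List[bool]]  # True marks an occupied cell
Cell = Tuple[int, int]   # (row, col) of a footprint's top-left corner


def free_cells(grid: Grid, h: int, w: int) -> List[Cell]:
    """Grid search: every top-left cell where an h x w footprint covers only free cells."""
    rows, cols = len(grid), len(grid[0])
    out: List[Cell] = []
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if not any(grid[r + i][c + j] for i in range(h) for j in range(w)):
                out.append((r, c))
    return out


def rank_by_target(cands: List[Cell], target: Cell) -> List[Cell]:
    """Rank candidates by squared distance to a requested cell (assumed user requirement)."""
    return sorted(cands, key=lambda rc: (rc[0] - target[0]) ** 2 + (rc[1] - target[1]) ** 2)


def stability_score(offset: float, noise: float = 0.05, trials: int = 1000, seed: int = 0) -> float:
    """Fraction of noisy trials in which the center of mass stays over the support.

    `offset` is the center-of-mass offset from the support center, in units of the
    support width, so the support edge sits at 0.5 (an assumption of this sketch).
    """
    rng = random.Random(seed)
    ok = sum(abs(offset + rng.gauss(0.0, noise)) < 0.5 for _ in range(trials))
    return ok / trials


grid = [
    [True, True, False, False],
    [False, False, False, False],
    [False, False, False, False],
]
candidates = rank_by_target(free_cells(grid, 2, 2), target=(0, 0))
print(candidates)
print(stability_score(0.0), stability_score(1.0))
```

A threshold on the score would then give a knob for how robust an accepted placement must be, which is the kind of control over stability that the caption describes.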