Navigate Complex Physical Worlds via Geometrically Constrained LLM

Yongqiang Huang, Wentao Ye, Liyao Li, Junbo Zhao

TL;DR

This work explores whether text-based LLMs can serve as builders of the physical world, and designs a workflow to improve their spatial comprehension and construction capabilities.

Abstract

This study investigates whether Large Language Models (LLMs) can reconstruct and construct the physical world based solely on textual knowledge, and examines how model performance affects spatial understanding. To improve comprehension of geometric and spatial relationships in the complex physical world, the study introduces a set of geometric conventions and develops a workflow built on multi-layer graphs and a multi-agent system framework. It examines how LLMs perform multi-step, multi-objective geometric inference in a spatial environment using multi-layer graphs under unified geometric conventions. The study also employs a genetic algorithm, inspired by large-scale model knowledge, to solve geometric constraint problems. In summary, this work explores the feasibility of using text-based LLMs as physical world builders and designs a workflow to enhance their capabilities.

Paper Structure

This paper contains 35 sections, 7 equations, 18 figures.

Figures (18)

  • Figure 1: The entire workflow is built on geometric conventions and relies on multiple agents that carry out 3D scene construction around the graph. The user's requirements are refined layer by layer by the designers and used to generate object instances. Finally, the arranger uses the mapping from geometric constraints to deviations, together with a genetic algorithm solver, to determine the correct placement of each object.
  • Figure 2: GPT-4 produces complex structures and details and achieves better semantic alignment than GPT-3.5.
  • Figure 3: GPT-4 shows better spatial comprehension and multi-object scene generation than GPT-3.5, but still uses simple blocks with limited detail.
  • Figure 4: In object-level generation tasks, the CLIP index of agents based on GPT-4 is generally better than that of agents based on GPT-3.5.
  • Figure 5: In the scenario-level generation task, the CLIP index of the GPT-4 group is 10.1% higher than that of the GPT-3.5 group, and its isolation rate is much better than that of the GPT-3.5 group.
  • ...and 13 more figures
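
The Figure 1 caption describes an arranger that maps geometric constraints to deviations and hands them to a genetic algorithm solver. The paper's actual constraint encoding and solver settings are not given in this summary, so the following is only a minimal sketch of that general idea. It assumes a single object placed in a 2D room; the constraints, the names `TABLE`, `ROOM`, `deviation`, and `solve`, and all parameter values are hypothetical.

```python
import math
import random

# Hypothetical scene: a table is already placed, and a chair must be positioned.
TABLE = (2.0, 3.0)             # fixed position of the existing object
ROOM = (0.0, 0.0, 5.0, 5.0)    # x_min, y_min, x_max, y_max


def deviation(pos):
    """Map the geometric constraints to one non-negative deviation (0 = satisfied)."""
    x, y = pos
    # Constraint 1 (assumed): the chair sits 1.0 unit away from the table.
    d_dist = abs(math.dist(pos, TABLE) - 1.0)
    # Constraint 2 (assumed): the chair is on the +x side of the table.
    d_side = max(0.0, TABLE[0] - x)
    # Constraint 3 (assumed): the chair stays inside the room.
    x0, y0, x1, y1 = ROOM
    d_room = (max(0.0, x0 - x) + max(0.0, x - x1)
              + max(0.0, y0 - y) + max(0.0, y - y1))
    return d_dist + d_side + d_room


def solve(pop_size=60, generations=200, sigma=0.3, seed=0):
    """Search for a position with minimal deviation using a simple genetic algorithm."""
    rng = random.Random(seed)
    x0, y0, x1, y1 = ROOM
    # Initial population: random positions inside the room.
    pop = [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the quarter of the population with the lowest deviation.
        pop.sort(key=deviation)
        elite = pop[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()
            # Crossover: random blend of two parents; mutation: Gaussian noise.
            children.append((
                w * a[0] + (1 - w) * b[0] + rng.gauss(0.0, sigma),
                w * a[1] + (1 - w) * b[1] + rng.gauss(0.0, sigma),
            ))
        pop = elite + children
    return min(pop, key=deviation)


if __name__ == "__main__":
    best = solve()
    print("position:", best, "deviation:", deviation(best))
```

Summing the per-constraint deviations into a single fitness value keeps the solver independent of the constraint types: supporting a new spatial relation in this sketch only means adding another term to `deviation`.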