Affordances Enable Partial World Modeling with LLMs

Khimya Khetarpal, Gheorghe Comanici, Jonathan Richens, Jeremy Shar, Fei Xia, Laurent Orseau, Aleksandra Faust, Doina Precup

TL;DR

Full world models for multi-task planning are costly and brittle; this work investigates using large language models as partial world models guided by affordances. It develops a Generalized Affordance Framework that separates task-agnostic and task-specific intents, introduces distribution-robust affordances, and proves that achieving language-conditioned intents yields predictive partial world models while enabling provably faster search. Theoretical results (Theorem 1 and Theorem 2) link intents to partial world models and to planning efficiency, and empirical validation on tabletop robotics shows that affordance-informed partial models reduce search branching and improve rewards compared to full models. The approach offers a scalable path to efficient, transferable planning in multi-task robotic domains by constraining dynamics to relevant, robust affordances discovered via LLMs.

Abstract

Full models of the world require complex knowledge of immense detail. While pre-trained large models have been hypothesized to contain similar knowledge due to extensive pre-training on vast amounts of internet-scale data, using them directly in a search procedure is inefficient and inaccurate. Conversely, partial models focus on making high-quality predictions for a subset of states and actions: those linked through affordances that achieve user intents~\citep{khetarpal2020can}. Can we posit large models as partial world models? We provide a formal answer to this question, proving that agents achieving task-agnostic, language-conditioned intents necessarily possess predictive partial world models informed by affordances. In the multi-task setting, we introduce distribution-robust affordances and show that partial models can be extracted to significantly improve search efficiency. Empirical evaluations in tabletop robotics tasks demonstrate that our affordance-aware partial models reduce the search branching factor and achieve higher rewards compared to full world models.

Paper Structure

This paper contains 18 sections, 5 theorems, 47 equations, 5 figures, 2 tables.

Key Result

Theorem 1

Let $\pi$ be the policy of a deterministic $(n, \zeta, \delta)$-optimal agent (Def. def:agent) in an environment satisfying Assumption ass:communicatingCMDP. Then $\pi$ encodes a partial world model $\hat{P}_{\mathrm{par}}(s' \mid o, s)$, and the worst-case error of this world model is bounded by, $\forall (s, o) \in \mathcal{AF}_{\mathcal{I}}$, where

Figures (5)

  • Figure 1: Generalizing the concept of affordances to a multi-task setting is achieved by categorizing agent intents as task-agnostic—grounded in the agent's embodiment, such as a robot's ability to pick or place objects—and task-specific, which depend on the environment and task distribution. By considering the three primary axes of agent, environment, and task distribution, the interplay between these intents facilitates the emergence of affordances at the boundary of the agent-environment interaction. This framework induces partial world models that effectively generalize across tasks, resulting in provably efficient planning.
  • Figure 2: Characterizing intents as task-agnostic and task-specific in the multi-task setting. For an agent embodiment, a task-agnostic intent is satisfied to a degree $\zeta$ across the entire task distribution, as shown on the left, whereas a task-specific intent does not have the same coverage of satisfiability across the task distribution because it is specific to certain tasks only.
  • Figure 3: Improvements in affordance model accuracy result in better performance of the search policy. Given a fixed search budget and a perfect world model, any error in affordance prediction translates into catastrophic failures in the search policy.
  • Figure 4: Monte-Carlo sampling with a partial model expands the tree only on affordances, whereas search with a full model would expand all possible state-action pairs.
  • Figure 5: Our approach leverages LLMs as partial world models using affordances, as depicted here. Affordance-informed partial models are then used to guide MCTS in tabletop robotics tasks. We posit LLMs as-is to be generative world models, which we refer to as full world models. Partial models are induced via affordances generated by LLMs using task-agnostic intents specified in the prompt.
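
The branching-factor contrast described in Figure 4 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy tabletop domain, the object names, the hand-written `afforded` rule (standing in for affordances that the paper obtains from an LLM), and the use of exhaustive depth-limited expansion in place of Monte-Carlo sampling or MCTS.

```python
from itertools import product

# Hypothetical toy tabletop domain (illustrative names, not from the paper).
OBJECTS = ["red_block", "blue_block", "bowl"]
ACTIONS = [("pick", o) for o in OBJECTS] + [
    ("place", o, target) for o, target in product(OBJECTS, OBJECTS) if o != target
]


def step(held, action):
    """Toy deterministic dynamics; the state is just the held object (or None)."""
    if action[0] == "pick":
        # Picking only succeeds with an empty gripper; otherwise nothing changes.
        return action[1] if held is None else held
    # Placing only succeeds if the named object is currently held.
    return None if held == action[1] else held


def afforded(held, action):
    """Hand-written stand-in for an affordance model (assumption)."""
    if action[0] == "pick":
        return held is None
    return held == action[1]


def count_expansions(held, depth, use_affordances):
    """Count nodes expanded by exhaustive depth-limited search from `held`."""
    if depth == 0:
        return 0
    actions = [a for a in ACTIONS if not use_affordances or afforded(held, a)]
    return sum(
        1 + count_expansions(step(held, a), depth - 1, use_affordances)
        for a in actions
    )


full_nodes = count_expansions(None, 3, use_affordances=False)
partial_nodes = count_expansions(None, 3, use_affordances=True)
print(f"full model: {full_nodes} nodes, partial model: {partial_nodes} nodes")
```

In this toy setting the full model branches on all 9 actions at every node, giving 819 expansions at depth 3, while the affordance filter leaves at most 3 actions per node and 27 expansions. The gap grows with depth, which is the qualitative effect the figure describes; the numbers themselves say nothing about the paper's experiments.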

Theorems & Definitions (8)

  • Theorem 1: Distribution-Robust Affordances induce Partial World Models
  • Theorem 2: Provably Efficient Search with a Partial Model
  • Theorem 1: Distribution-Robust Affordances induce Partial World Models
  • proof
  • Lemma 1
  • proof
  • Theorem 2: Provably Efficient Search with a Partial Model
  • proof