RAIL: Robot Affordance Imagination with Large Language Models
Ceng Zhang, Xin Meng, Dongchen Qi, Gregory S. Chirikjian
TL;DR
This work tackles affordance reasoning for unseen household objects from minimal input. It introduces a three-stage framework (Affordance Analysis, Imagination Profile Generation, and Imagination Evaluation) that uses Large Language Models to turn requested affordances into interaction-based definitions and physics-based simulation to ground feasible manipulations. The authors report 88.2% accuracy on synthetic affordance classification and 92.7% functional pose accuracy, and validate the approach in real-world experiments with 18 unseen objects across 20 novel tasks, achieving 100% task success. By grounding language-driven imagination in physical simulation, the method generalizes to a wide range of affordances without manual engineering. The findings suggest a practical, scalable path toward real-world robot manipulation of novel objects, with potential extensions to articulated and deformable objects and more complex user commands.
Abstract
This paper introduces an automatic affordance reasoning paradigm tailored to minimal semantic inputs, addressing the critical challenges of classifying and manipulating unseen classes of objects in household settings. Inspired by human cognitive processes, our method integrates generative language models and physics-based simulators to foster analytical thinking and creative imagination of novel affordances. Structured as a tripartite framework of analysis, imagination, and evaluation, our system "analyzes" the requested affordance names into interaction-based definitions, "imagines" the virtual scenarios, and "evaluates" the object affordance. If an object is recognized as possessing the requested affordance, our method also predicts the optimal pose for that functionality and how a potential user can interact with the object. Tuned on only a few synthetic examples across 3 affordance classes, our pipeline achieves a very high success rate on affordance classification and functional pose prediction for 8 classes of novel objects, outperforming learning-based baselines. Validation through real-robot manipulation experiments demonstrates the practical applicability of the imagined user interaction, showcasing the system's ability to independently conceptualize unseen affordances and interact with new objects and scenarios in everyday settings.
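The analyze, imagine, evaluate flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it (`analyze`, `imagine`, `evaluate`, `toy_cup_simulator`, the dataclasses, the 0.5 threshold, and the one-dimensional "tilt" pose) is a hypothetical stand-in. A lookup table replaces the language model and a closed-form score replaces the physics simulator, so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# All names here are hypothetical. The LLM and the physics simulator are
# replaced by trivial stand-ins so the sketch is self-contained.


@dataclass
class InteractionDefinition:
    affordance: str
    description: str  # interaction-based definition of the affordance


@dataclass
class ImaginationProfile:
    definition: InteractionDefinition
    candidate_poses: Sequence[float]  # toy 1-D poses: tilt angle in degrees


@dataclass
class Evaluation:
    has_affordance: bool
    best_pose: Optional[float]  # None when the affordance is rejected
    score: float


def analyze(affordance: str) -> InteractionDefinition:
    """Stage 1 stand-in: a lookup table in place of an LLM query."""
    table = {"contain": "retains small particles dropped into the object"}
    return InteractionDefinition(affordance, table.get(affordance, "unknown"))


def imagine(definition: InteractionDefinition) -> ImaginationProfile:
    """Stage 2 stand-in: list the candidate poses to try in the virtual trial."""
    return ImaginationProfile(definition, candidate_poses=[0.0, 45.0, 90.0, 180.0])


def evaluate(
    profile: ImaginationProfile,
    simulate: Callable[[float], float],
    threshold: float = 0.5,  # assumed acceptance threshold
) -> Evaluation:
    """Stage 3 stand-in: score every candidate pose and keep the best one."""
    scored = [(simulate(pose), pose) for pose in profile.candidate_poses]
    score, pose = max(scored)
    accepted = score >= threshold
    return Evaluation(accepted, pose if accepted else None, score)


def toy_cup_simulator(tilt_deg: float) -> float:
    """Toy replacement for a physics rollout: fraction of particles retained."""
    return max(0.0, 1.0 - tilt_deg / 90.0)


result = evaluate(imagine(analyze("contain")), toy_cup_simulator)
rejected = evaluate(imagine(analyze("contain")), lambda tilt_deg: 0.0)
```

In this toy setup, `result` accepts the affordance with the upright pose as the functional pose, while `rejected` shows the case where no candidate pose passes the threshold and no pose is returned. The useful part of the sketch is the separation of concerns: the definition, the imagined scenario, and the evaluator are independent pieces, so the stand-ins could in principle be swapped for a real language model and simulator.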
