Efficient Learning of Object Placement with Intra-Category Transfer

Adrian Röfer, Russell Buchanan, Max Argus, Sethu Vijayakumar, Abhinav Valada

TL;DR

This work tackles efficient, few-shot learning of object placement for long-horizon tasks by transferring observed arrangements to novel object instances through canonical class frames. It introduces canonical class mappings, relative pose distributions in learned feature spaces, and an entropy-guided pose encoding with model minimization to suppress distractors, which together enable intra-category transfer. The approach performs strongly in simulation and is demonstrated on real-world table setting with unseen objects, where human evaluators scored it at 73.3% relative to a human baseline. The authors also provide a perception-driven real-robot pipeline and release their code and trained models publicly.

Abstract

Efficient learning from demonstration for long-horizon tasks remains an open challenge in robotics. While significant effort has been directed toward learning trajectories, a recent resurgence of object-centric approaches has demonstrated improved sample efficiency, enabling transferable robotic skills. Such approaches model tasks as a sequence of object poses over time. In this work, we propose a scheme for transferring observed object arrangements to novel object instances by learning these arrangements on canonical class frames. We then employ this scheme to enable a simple yet effective approach for training models from as few as five demonstrations to predict arrangements of a wide range of objects including tableware, cutlery, furniture, and desk spaces. We propose a method for optimizing the learned models to enable efficient learning of tasks such as setting a table or tidying up an office with intra-category transfer, even in the presence of distractors. We present extensive experimental results in simulation and on a real robotic system for table setting which, based on human evaluations, scored 73.3% compared to a human baseline. We make the code and trained models publicly available at https://oplict.cs.uni-freiburg.de.

Paper Structure

This paper contains 15 sections, 11 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Our approach learns object placements sample-efficiently by mapping object instances to a known canonical instance and inferring the placement of the new object in this canonical space. Here, the known setup on the left is matched with the novel one on the right to place the chair.
  • Figure 2: Full pipeline of our system with real robot experiment. Our proposed pose inference method predicts the ideal placement pose for each object, and the robot then arranges the objects autonomously. Few-shot transfer to other object instances is enabled by our object class mappings, which in turn rely on several large networks for object detection and feature extraction.
  • Figure 3: Illustration of the observation augmentation procedure used for model pruning. The observation of the cup relative to the sofa in Scene 1 is transferred to Scene 2. While the transferred observation is scored the same from the point of view of the sofa, it is scored far lower from the point of view of the table, informing us that including the table improves our model.
  • Figure 4: We evaluate five different training scenarios with 16 variations for training (only 5 shown here) and 5 for evaluation each. The scale of objects changes non-uniformly and the placements are varied. The variations are hand-crafted to ensure that they are semantically meaningful. For each object category, we generate 12 key points, which remain the same across all instances. From the top left: Dinner places, Bread-cutting, Desks, Living room, and TV setup.
  • Figure 5: Results of our baseline comparison. For each method, we report the best-performing configuration. Our approach produces the lowest inference errors, followed by an MLP without object scaling information, Intention Likelihood, and Scene Score with 16 training samples $\text{SS}^{16}$. The VLM performs slightly worse than $\text{SS}^{16}$ and seems distracted by rotation inference.
  • ...and 3 more figures
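
The canonical-frame idea in the Figure 1 caption can be illustrated with a small sketch. It is not the authors' implementation: the paper builds its class mappings from key points and learned features, whereas this example assumes, purely for illustration, that an instance differs from the canonical instance only by a non-uniform per-axis scale. The names `Instance`, `to_canonical`, `from_canonical`, and `transfer_placement` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Instance:
    """Hypothetical object instance.

    ASSUMPTION: the instance is the canonical class instance stretched by a
    per-axis scale. The paper's mapping is learned from key points instead.
    """
    scale: Vec3


def to_canonical(point: Vec3, instance: Instance) -> Vec3:
    """Express a point given in the instance's frame in the canonical frame."""
    return tuple(p / s for p, s in zip(point, instance.scale))


def from_canonical(point: Vec3, instance: Instance) -> Vec3:
    """Express a canonical-frame point in the frame of a concrete instance."""
    return tuple(p * s for p, s in zip(point, instance.scale))


def transfer_placement(point: Vec3, known: Instance, novel: Instance) -> Vec3:
    """Carry a placement observed relative to `known` over to `novel`."""
    return from_canonical(to_canonical(point, known), novel)


# A chair position observed relative to a known table (made-up numbers).
known_table = Instance(scale=(1.0, 2.0, 1.0))
novel_table = Instance(scale=(1.5, 1.0, 0.8))
chair_offset = (0.5, 1.0, 0.0)

transferred = transfer_placement(chair_offset, known_table, novel_table)
print(transferred)  # (0.75, 0.5, 0.0)
```

Under this simplification, the placement is stored once in canonical coordinates and re-expressed for every new instance, so a single demonstration constrains the placement for the whole category. Rotations and the distribution over relative poses, which the paper models, are left out here.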