Affordance Transfer Across Object Instances via Semantically Anchored Functional Map

Xiaoxiang Dong, Weiming Zhi

TL;DR

SemFM addresses the challenge of transferring demonstrated affordances across geometrically diverse objects by anchoring correspondences at semantically meaningful regions and propagating them with a functional map. It integrates semantic cues from pretrained models with a spectral surface representation, enabling dense, coherent transfer from a single demonstration while maintaining efficiency relative to multi-view VLM approaches. The approach is validated on synthetic categories and real robotic tasks, showing favorable accuracy and runtime trade-offs, with practical implications for perception-to-action loops in robotics. Overall, SemFM provides a controllable, interpretable framework that couples semantic region alignment with intrinsic surface structure to generalize manipulation affordances across object instances.

Abstract

Traditional learning from demonstration (LfD) generally demands a cumbersome collection of physical demonstrations, which can be time-consuming and challenging to scale. Recent advances show that robots can instead learn from human videos by extracting interaction cues without direct robot involvement. However, a fundamental challenge remains: how to generalize demonstrated interactions across different object instances that share similar functionality but vary significantly in geometry. In this work, we propose Semantic Anchored Functional Maps (SemFM), a framework for transferring affordances across objects from a single visual demonstration. Starting from a coarse mesh reconstructed from an image, our method identifies semantically corresponding functional regions between objects, selects mutually exclusive semantic anchors, and propagates these constraints over the surface using a functional map to obtain a dense, semantically consistent correspondence. This enables demonstrated interaction regions to be transferred across geometrically diverse objects in a lightweight and interpretable manner. Experiments on synthetic object categories and real-world robotic manipulation tasks show that our approach enables accurate affordance transfer with modest computational cost, making it well-suited for practical robotic perception-to-action pipelines.
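The propagation step described in the abstract (semantic anchors constraining a functional map, which then yields a dense correspondence) can be illustrated with a small sketch. It assumes the common least-squares functional-map energy: a data term on anchor descriptors plus a penalty on non-commutativity with the Laplacian, solved row by row. The paper's exact energy, weights and solver are not stated here and may differ. Everything else is also an assumption. A path-graph Laplacian stands in for a mesh Laplacian, Gaussian bumps around two vertices stand in for lifted semantic anchor features, and the target is a relabelled copy of the source so the recovered correspondence can be checked. The helpers estimate_fmap, pointwise_map and transfer_affordance, as well as the weight alpha, are hypothetical.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian L = D - W of a path graph (toy stand-in for a mesh Laplacian)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return np.diag(W.sum(axis=1)) - W

def spectral_basis(L, k):
    """Return the k smallest eigenvalues and their eigenvectors (as columns)."""
    evals, evecs = np.linalg.eigh(L)
    return evals[:k], evecs[:, :k]

def estimate_fmap(phi_s, phi_t, lam_s, lam_t, F_s, F_t, alpha=1e-2):
    """Solve min_C ||C A - B||^2 + alpha * ||C diag(lam_s) - diag(lam_t) C||^2.

    A and B hold the spectral coefficients of corresponding descriptor columns
    (here: hypothetical anchor functions). The regularizer is diagonal in the
    entries of C, so each row of C decouples into a small k x k linear system.
    """
    A = np.linalg.pinv(phi_s) @ F_s  # k x d
    B = np.linalg.pinv(phi_t) @ F_t  # k x d
    k = A.shape[0]
    C = np.zeros((k, k))
    for i in range(k):
        D_i = np.diag((lam_s - lam_t[i]) ** 2)
        C[i] = np.linalg.lstsq(A @ A.T + alpha * D_i, A @ B[i], rcond=None)[0]
    return C

def pointwise_map(phi_s, phi_t, C):
    """Map each target vertex to the source vertex whose spectral embedding is
    nearest to the corresponding row of phi_t @ C."""
    emb_t = phi_t @ C
    d2 = ((emb_t[:, None, :] - phi_s[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def transfer_affordance(mask_s, t2s):
    """Pull a boolean affordance mask on the source back onto the target."""
    return mask_s[t2s]

# Toy example: the target is the source with relabelled vertices,
# so target vertex t corresponds to source vertex perm[t].
n, k = 20, 6
perm = np.random.default_rng(0).permutation(n)
L_s = path_laplacian(n)
L_t = L_s[np.ix_(perm, perm)]
lam_s, phi_s = spectral_basis(L_s, k)
lam_t, phi_t = spectral_basis(L_t, k)

# Hypothetical anchor descriptors: smooth bumps around two anchor vertices.
x = np.arange(n)
F_s = np.stack([np.exp(-((x - a) ** 2) / 4.0) for a in (2, 14)], axis=1)
F_t = F_s[perm]

C = estimate_fmap(phi_s, phi_t, lam_s, lam_t, F_s, F_t)
t2s = pointwise_map(phi_s, phi_t, C)

mask_s = np.zeros(n, dtype=bool)
mask_s[3:7] = True  # demonstrated contact region on the source
mask_t = transfer_affordance(mask_s, t2s)
print(t2s.tolist() == perm.tolist())  # -> True
```

In this idealized case C comes out as a diagonal matrix of plus or minus ones (eigenvector sign flips), and the nearest-neighbour step recovers the relabelling exactly. On a real mesh one would normally use a cotangent Laplacian with a mass matrix M, so the spectral coefficients become Phi^T M F rather than pinv(Phi) F.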

Paper Structure (26 sections, 19 equations, 10 figures, 1 table)

Figures (10)

  • Figure 1: Overview of the proposed Semantic Anchored Functional Map pipeline. Starting from a single RGB observation of a demonstrated interaction, we reconstruct a coarse object mesh and extract the demonstrated affordance region. Semantic features are used to identify corresponding anchor regions across objects, which constrain a functional map to produce a smooth dense correspondence. The transferred affordance region on the target object is then used to generate a feasible grasp, enabling execution on a real robot.
  • Figure 2: Illustration of functional maps. Given two related meshes (cat and tiger), functional maps recover a smooth correspondence that aligns semantically similar regions while maintaining spatial coherence over the surface.
  • Figure 3: Overview of the proposed Semantic Anchored Functional Map pipeline. Given a single RGB observation of a demonstrated hand-object interaction and another object, we first construct coarse object meshes and extract the demonstrated affordance region. Semantic features are extracted using pretrained embeddings, lifted into 3D, and then used to identify corresponding anchor regions across objects, which constrain a functional map to produce smooth, dense correspondences. This enables the demonstrated affordance region to be transferred to the target object.
  • Figure 4: Pipeline for acquiring a coarse 3D mesh and an affordance region from a single-view image. The mesh is inferred from an RGB image and an object mask using SAM3D. The hand mask is lifted to 3D and intersected with the mesh to obtain the affordance region.
  • Figure 5: Example of semantic anchor selection. Each mesh is clustered into five regions. The top two mutually exclusive cluster pairs are selected as semantic anchors based on cross-object similarity: the first anchor pairs cluster 2 of object 1 with cluster 1 of object 2 (similarity 0.89), and the second pairs cluster 3 of object 1 with cluster 4 of object 2 (similarity 0.85).
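Figure 5 states the anchor-selection rule in words: cluster each mesh, score cluster pairs across objects, and keep the top mutually exclusive pairs. The sketch below is a hedged reading of that rule under stated assumptions. Per-cluster features are taken as the mean of the lifted per-vertex embeddings, similarity is cosine similarity, and mutual exclusivity is enforced greedily (take the best pair, then exclude both of its clusters). The paper may aggregate features or solve the assignment differently. The helper names cluster_means, cosine_similarity and select_anchors are hypothetical, cluster indices are 0-based, and the similarity values in the hand-made matrix are illustrative only, not results from the paper.

```python
import numpy as np

def cluster_means(features, labels, n_clusters):
    """Average the per-vertex semantic features inside each cluster (clusters must be non-empty)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_clusters)])

def cosine_similarity(X, Y):
    """Pairwise cosine similarity between rows of X (object 1 clusters) and rows of Y (object 2 clusters)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T

def select_anchors(sim, n_anchors=2):
    """Greedily take the most similar remaining cluster pair, then block its row and
    column so that each cluster is used by at most one anchor (mutual exclusivity)."""
    sim = np.array(sim, dtype=float)  # work on a copy
    anchors = []
    for _ in range(n_anchors):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        anchors.append((int(i), int(j), float(sim[i, j])))
        sim[i, :] = -np.inf
        sim[:, j] = -np.inf
    return anchors

# End-to-end call on random per-vertex features (5 clusters per object).
rng = np.random.default_rng(0)
feats_a, feats_b = rng.normal(size=(200, 16)), rng.normal(size=(150, 16))
labels_a, labels_b = np.arange(200) % 5, np.arange(150) % 5
sim_ab = cosine_similarity(cluster_means(feats_a, labels_a, 5),
                           cluster_means(feats_b, labels_b, 5))
random_anchors = select_anchors(sim_ab)

# Hand-made similarity matrix: pair (2, 3) scores 0.87 but is skipped,
# because cluster 2 of object 1 is already used by the first anchor.
sim = np.full((5, 5), 0.4)
sim[2, 1], sim[2, 3], sim[3, 4] = 0.89, 0.87, 0.85
anchors = select_anchors(sim)
print(anchors)  # [(2, 1, 0.89), (3, 4, 0.85)]
```

A greedy pass is simple and matches the "top pairs" wording of the caption; an optimal one-to-one assignment (for example the Hungarian algorithm) is a possible alternative when more anchors are wanted.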