
Discovering Conceptual Knowledge with Analytic Ontology Templates for Articulated Objects

Jianhua Sun, Yuxuan Li, Longfei Xu, Jiude Wei, Liang Chai, Cewu Lu

TL;DR

The paper tackles how machines can understand and interact with articulated objects by moving beyond object-level learning to conceptual reasoning. It introduces Analytic Ontology Templates (AOTs): parameterized, differentiable templates that capture geometric, kinematic, and affordance concepts, each with a renderer for generating data. An AOTNet baseline demonstrates how to discover these concepts from raw observations and ground interaction strategies without relying on real training data. Experiments on PartNet-Mobility in SAPIEN show substantial gains over RL baselines and highlight the approach's interpretability and data efficiency in novel categories.

Abstract

Human cognition can leverage fundamental conceptual knowledge, like geometric and kinematic ones, to appropriately perceive, comprehend and interact with novel objects. Motivated by this finding, we aim to endow machine intelligence with an analogous capability through performing at the conceptual level, in order to understand and then interact with articulated objects, especially for those in novel categories, which is challenging due to the intricate geometric structures and diverse joint types of articulated objects. To achieve this goal, we propose Analytic Ontology Template (AOT), a parameterized and differentiable program description of generalized conceptual ontologies. A baseline approach called AOTNet driven by AOTs is designed accordingly to equip intelligent agents with these generalized concepts, and then empower the agents to effectively discover the conceptual knowledge on the structure and affordance of articulated objects. The AOT-driven approach yields benefits in three key perspectives: i) enabling concept-level understanding of articulated objects without relying on any real training data, ii) providing analytic structure information, and iii) introducing rich affordance information indicating proper ways of interaction. We conduct exhaustive experiments and the results demonstrate the superiority of our approach in understanding and then interacting with articulated objects.

Paper Structure

This paper contains 12 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: A brief schematic of AOT with cuboid as an example. Its structure delineates the cuboid shape, its parameters determine the size and aspect ratios, its affordances can include possible grasp poses and its renderer draws specific cuboid instances in 3D space. Here the poses $\mathbf{M}_\mathbf{g}$ and $\mathbf{M}_\mathbf{g}^*$ are in the form of affine transformation matrices.
  • Figure 2: A brief illustration of the AOTNet workflow by articulated concept discovery (a-c). The top row refers to the processing pipeline in AOTNet, and the bottom row refers to the state of the real articulated object in corresponding steps.
  • Figure 3: Specific examples of basic geometric (a,b,d), kinematic (c) and composite (e) ontologies, where (a-d) are defined from scratch and (e) is built upon existing ones. Each block in (a-e) is a collection of the AOT name and rendering (left), parameters (top-right), structure (mid-right) and affordances (bottom-right, partial). (f) includes definitions of viewing angle and initial gripper pose $\mathbf{M}_\mathbf{g}^*$ (left), rotation matrix R and translation matrix T for poses (right). Note that i) the AOT name is just a referent (like a template id) of a concept and carries no semantic meaning; ii) the definitions of basic and composite AOTs are shown as mathematical expressions and pseudocode for easy understanding; please refer to the supplementary material for Python script examples of how to define basic ones from scratch and composite ones by inheritance in an object-oriented programming fashion; iii) common parameters such as position and rotation in world coordinates are omitted for clarity; please see the paper's AOT discussion subsection for details.
  • Figure 4: Visualization of ontology discovery results for real-world objects in a simulation environment (a-d) and the physical world (e-h). The first row shows the target objects. The second row gives the input point clouds of the objects. The third row shows the discovery results of the actionable part with AOTNet.
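
The captions for Figures 1 and 3 describe an AOT as a bundle of parameters, a structure, affordances, and a renderer, with composite templates built from basic ones by inheritance. The sketch below is one way such a template might look in Python. It is not the authors' code: the class names `CuboidTemplate` and `HingedCuboidTemplate`, the helper `pose_matrix`, the choice of a top-down grasp, and the hinge placement are all hypothetical and chosen only for illustration. It uses plain NumPy, so it does not attempt the differentiability the paper describes.

```python
import numpy as np
from dataclasses import dataclass


def pose_matrix(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 transform."""
    m = np.eye(4)
    m[:3, :3] = rotation
    m[:3, 3] = translation
    return m


@dataclass
class CuboidTemplate:
    """Hypothetical basic template: parameters, a renderer, one affordance."""
    width: float   # extent along x
    depth: float   # extent along y
    height: float  # extent along z

    def render(self, n_points: int = 512, seed: int = 0) -> np.ndarray:
        """Sample surface points of a cuboid centered at the origin.

        Faces are picked uniformly at random, so sampling is not area-weighted.
        """
        rng = np.random.default_rng(seed)
        half = np.array([self.width, self.depth, self.height]) / 2.0
        # Draw points inside the box, then snap one coordinate onto a face.
        pts = rng.uniform(-half, half, size=(n_points, 3))
        axis = rng.integers(0, 3, size=n_points)
        sign = rng.choice([-1.0, 1.0], size=n_points)
        pts[np.arange(n_points), axis] = sign * half[axis]
        return pts

    def top_grasp_pose(self) -> np.ndarray:
        """Assumed affordance: a gripper at the top face center, pointing down."""
        flip_z = np.diag([1.0, -1.0, -1.0])  # 180-degree rotation about x
        return pose_matrix(flip_z, np.array([0.0, 0.0, self.height / 2.0]))


@dataclass
class HingedCuboidTemplate(CuboidTemplate):
    """Hypothetical composite: a cuboid panel swinging about a vertical edge."""
    angle: float = 0.0  # opening angle in radians

    def render(self, n_points: int = 512, seed: int = 0) -> np.ndarray:
        pts = super().render(n_points, seed)
        # The hinge axis is parallel to z and passes through one corner edge.
        pivot = np.array([-self.width / 2.0, -self.depth / 2.0, 0.0])
        c, s = np.cos(self.angle), np.sin(self.angle)
        rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        return (pts - pivot) @ rot_z.T + pivot
```

Calling `CuboidTemplate(0.4, 0.2, 0.1).render(256)` returns a `(256, 3)` array of surface points, and `top_grasp_pose()` returns a 4x4 homogeneous matrix, in keeping with the affine pose representation mentioned in the Figure 1 caption. The subclass reuses the parent's parameters and renderer and adds a single kinematic parameter, which loosely mirrors the inheritance pattern the Figure 3 caption describes.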