InfoCon: Concept Discovery with Generative and Discriminative Informativeness

Ruizhe Liu; Qian Luo; Yanchao Yang

TL;DR

InfoCon addresses the self-supervised discovery of manipulation concepts that are grounded in physical states, with the aim of learning generalizable robot policies. It jointly optimizes generative informativeness $\mathcal{I}(\boldsymbol{\alpha}; \boldsymbol{s}^{\mathrm{key}}|\boldsymbol{s})$ and discriminative informativeness, the latter expressed through a compatibility function $\mathcal{C}^{\alpha}(\boldsymbol{s})$ whose gradient $\nabla_{\boldsymbol{s}}\mathcal{C}^{\alpha}$ guides the next action. A VQ-VAE–style codebook grounds sub-trajectories to discrete concepts through a transformer-based encoder, and the end state of each sub-trajectory serves as a key state (sub-goal). On ManiSkill2 tasks, policies guided by the discovered key states perform competitively with or better than baselines and approach the oracle that uses human-labeled key states, while reducing labeling effort. The work demonstrates the feasibility of grounding abstract manipulation concepts in embodied experience and points to future work on structuring the relationships among discovered concepts.

Abstract

We focus on the self-supervised discovery of manipulation concepts that can be adapted and reassembled to address various robotic tasks. We propose that the decision to conceptualize a physical procedure should not depend on how we name it (semantics) but rather on the significance of the informativeness in its representation regarding the low-level physical state and state changes. We model manipulation concepts (discrete symbols) as generative and discriminative goals and derive metrics that can autonomously link them to meaningful sub-trajectories from noisy, unlabeled demonstrations. Specifically, we employ a trainable codebook containing encodings (concepts) capable of synthesizing the end-state of a sub-trajectory given the current state (generative informativeness). Moreover, the encoding corresponding to a particular sub-trajectory should differentiate the state within and outside it and confidently predict the subsequent action based on the gradient of its discriminative score (discriminative informativeness). These metrics, which do not rely on human annotation, can be seamlessly integrated into a VQ-VAE framework, enabling the partitioning of demonstrations into semantically consistent sub-trajectories, fulfilling the purpose of discovering manipulation concepts and the corresponding sub-goal (key) states. We evaluate the effectiveness of the learned concepts by training policies that utilize them as guidance, demonstrating superior performance compared to other baselines. Additionally, our discovered manipulation concepts compare favorably to human-annotated ones while saving much manual effort.
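The partitioning idea in the abstract can be illustrated with a small sketch. It assumes per-timestep features have already been extracted, assigns each one to its nearest codebook entry (the usual VQ-VAE lookup), and groups consecutive timesteps that share a concept into sub-trajectories whose last timestep is treated as the key state. The function names (`assign_concepts`, `partition`), the squared-Euclidean distance, and the toy numbers are assumptions for illustration, not the authors' implementation, which uses a learned transformer encoder and a trained codebook.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_concepts(
    features: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[int]:
    """Map each per-timestep feature to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def partition(concept_ids: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Group consecutive timesteps with the same concept.

    Returns (concept, start, end) triples with inclusive indices; `end` is
    the timestep treated here as the key (sub-goal) state of the segment.
    """
    segments: List[Tuple[int, int, int]] = []
    start = 0
    for t in range(1, len(concept_ids) + 1):
        if t == len(concept_ids) or concept_ids[t] != concept_ids[start]:
            segments.append((concept_ids[start], start, t - 1))
            start = t
    return segments


# Toy data (hypothetical): five 2-D features and a three-entry codebook.
features = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.1, 0.9], [2.0, 2.1]]
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

concept_ids = assign_concepts(features, codebook)
segments = partition(concept_ids)
key_state_indices = [end for _, _, end in segments]
```

In this toy run the trajectory splits into three sub-trajectories, one per concept, and the key states are the final timesteps of those segments. In the actual method the codebook and features are trained jointly with the informativeness objectives, which this sketch does not attempt to reproduce.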

Paper Structure (33 sections, 19 equations, 15 figures, 9 tables, 1 algorithm)

Introduction
Method
Problem Setup
Manipulation Concept as Generative and Discriminative Goals
Self-Supervised Manipulation Concept Discovery
Network Structure of $\Phi$
Trajectory Partitioning with Manipulation Concepts
Training Objectives
Experiments
Experimental Settings
Main Results
Ablation Study
Related Work
Conclusion
Acknowledgment
...and 18 more sections

Figures (15)

Figure 1: The proposed generative and discriminative informativeness and the derived InfoCon algorithm can discover manipulation concepts from noisy, unlabeled demonstrations. Each identified concept relates to a sub-goal and defines the partitioning of a whole trajectory into sub-trajectories, showing the process governed by the concept for achieving the sub-goal. Concepts from InfoCon share similarities with human-annotated ones while having more fine-grained semantics, which can be more beneficial for physical interactions but are time-consuming to label manually.
Figure 2: We characterize a manipulation concept from two conjugate perspectives. As a generative goal, a manipulation concept helps synthesize the state reached when the physical process denoted by the concept is accomplished. Conversely, as a discriminative goal, a manipulation concept indicates (through a scoring function) whether a state lies within the process governed by it. Moreover, it informs the next action through the gradient of the scoring function, as the action taken should maximize the discriminative utility.
Figure 3: Training pipeline of InfoCon. Features extracted from the state-action trajectory (using $\phi$) are compared with learnable concepts, which are grounded according to Eq. \ref{eq:concept_assign}. The generative informative loss (Eq. \ref{eq:l_gen}) trains $\theta^{\mathrm{g}}$ to predict the key (end) state of a sub-trajectory. The discriminative informative loss trains a compatibility function $\mathcal{C}$ conditioned on the concept (Eq. \ref{eq:l_dis_c}), which tells whether a state is compatible with the concept. Moreover, the actionable informativeness loss trains a policy $\pi$ for action prediction (Eq. \ref{eq:l_dis}). Together, these components enforce the grounding to be physically and semantically meaningful.
Figure 4: Examples of the manually defined key states (concepts) in different manipulation tasks. From left to right: P&P Cube and its two key states ("Grasp", "End"). Stack Cube and its three key states ("Grasp $A$", "$A$ on $B$", "End"). Turn Faucet and its two key states ("Contacted", "End"). Peg Insertion and its three key states ("Grasp", "Align", "End").
Figure 5: Key states discovered and grounded by InfoCon. From top-left to bottom-right are visualizations of key states of tasks: P&P Cube, Stack Cube, Turn Faucet, and Peg Insertion. Each subfigure contains frames of ground-truth key states at the upper part and key states discovered by InfoCon below. We align frames of ground-truth key states with the nearest subsequent key states from InfoCon by checking their timesteps.
...and 10 more figures
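
The discriminative view in the Figure 2 caption, where the gradient of a scoring function informs the next action, can be illustrated with a toy example. The sketch below assumes a hand-written compatibility score (negative squared distance to a hypothetical sub-goal) and a central finite-difference gradient; moving the state along that gradient increases the score. The score, the helper names, and the step size are all assumptions for illustration. In the paper, the compatibility function $\mathcal{C}$ is learned and conditioned on the concept, and a policy $\pi$ is trained to predict actions.

```python
from typing import Callable, List, Sequence


def compatibility(state: Sequence[float], goal: Sequence[float]) -> float:
    """Hypothetical toy score: higher when the state is closer to the sub-goal."""
    return -sum((s - g) ** 2 for s, g in zip(state, goal))


def numerical_gradient(
    score: Callable[[Sequence[float]], float],
    state: Sequence[float],
    eps: float = 1e-5,
) -> List[float]:
    """Central finite-difference estimate of d(score)/d(state)."""
    grad = []
    for i in range(len(state)):
        up = list(state)
        down = list(state)
        up[i] += eps
        down[i] -= eps
        grad.append((score(up) - score(down)) / (2 * eps))
    return grad


def gradient_step(
    score: Callable[[Sequence[float]], float],
    state: Sequence[float],
    step_size: float = 0.1,
) -> List[float]:
    """Move the state a small step in the direction that increases the score."""
    grad = numerical_gradient(score, state)
    return [s + step_size * g for s, g in zip(state, grad)]


# Toy usage: a 2-D state and a hypothetical sub-goal.
goal = [1.0, 2.0]
state = [0.0, 0.0]


def score(s: Sequence[float]) -> float:
    return compatibility(s, goal)


grad = numerical_gradient(score, state)
next_state = gradient_step(score, state)
```

Here the gradient points from the current state toward the sub-goal, so the stepped state scores higher than the original one. This only shows why a score gradient is a usable action signal; it does not reproduce the learned compatibility function or the policy training described in the paper.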
