Predicate Hierarchies Improve Few-Shot State Classification

Emily Jin, Joy Hsu, Jiajun Wu

TL;DR

PHIER tackles few-shot state classification by encoding predicate hierarchies into a joint image-predicate latent space. It combines an object-centric scene encoder, self-supervised predicate-relational losses guided by an LLM, and a hyperbolic latent space (Poincaré ball) to capture hierarchical structure, enabling strong generalization to unseen predicates and data shifts. Empirical results on CALVIN, BEHAVIOR, and real-world transfer show PHIER outperforms supervised and large pretrained VLM baselines in out-of-distribution and zero-/few-shot settings, while maintaining competitive in-distribution performance. This approach demonstrates that leveraging predicate hierarchies and hyperbolic geometry can substantially improve robust, data-efficient state reasoning for robotic planning and manipulation.

Abstract

State classification of objects and their relations is core to many long-horizon tasks, particularly in robot planning and manipulation. However, the combinatorial explosion of possible object-predicate combinations, coupled with the need to adapt to novel real-world environments, requires state classification models to generalize to novel queries from few examples. To this end, we propose PHIER, which leverages predicate hierarchies to generalize effectively in few-shot scenarios. PHIER uses an object-centric scene encoder, self-supervised losses that infer semantic relations between predicates, and a hyperbolic distance metric that captures hierarchical structure; it learns a structured latent space of image-predicate pairs that guides reasoning over state classification queries. We evaluate PHIER in the CALVIN and BEHAVIOR robotic environments and show that PHIER significantly outperforms existing methods in few-shot, out-of-distribution state classification, and demonstrates strong zero- and few-shot generalization from simulated to real-world tasks. Our results demonstrate that leveraging predicate hierarchies improves performance on state classification tasks with limited data.
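
To make the hyperbolic distance metric concrete, here is a minimal sketch of the standard geodesic distance on the unit Poincaré ball, the latent geometry named in the TL;DR. It assumes curvature -1 and plain Python lists as embeddings; the function name `poincare_distance` and the `eps` guard are illustrative choices, not taken from the paper, and PHIER's actual encoder and curvature may differ.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points strictly inside the unit
    Poincare ball (curvature -1 is an assumption of this sketch)."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # The denominator shrinks as either point approaches the boundary,
    # so distances grow rapidly near the edge of the ball.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


# The same Euclidean gap costs more near the boundary than near the origin.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
```

Because distances stretch toward the boundary, a hierarchy can place general concepts near the origin and specific ones farther out, which is the usual motivation for hyperbolic embeddings of tree-like structure.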

Paper Structure

This paper contains 32 sections, 12 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: PHIER improves few-shot state classification by encoding a predicate hierarchy in a joint image-predicate latent space. Encouraging such structured representations to emerge enables strong generalization to novel predicates from few examples.
  • Figure 2: PHIER consists of three main components. The first is a pair of disentangled image and predicate encoders, which separately extract features based on the objects and predicates in the state classification query. The second is a self-supervised learning process that injects explicit knowledge of pairwise predicate relations into the image-predicate latent space. The third is a hyperbolic distance metric and encoder that encourage the latent space to encode the inferred predicate hierarchy. Together, these components enable few-shot generalization to unseen object-predicate pairs and novel predicates.
  • Figure 3: Examples of state classification tasks from CALVIN and BEHAVIOR. The datasets span a range of visual realism and complexity.
  • Figure 4: Examples from our manually collected real-world dataset.
  • Figure 5: Ablations varying the number of examples given in the few-shot setting for the CALVIN and BEHAVIOR environments.
  • ...and 7 more figures