GIFT: Generalizing Intent for Flexible Test-Time Rewards

Fin Amin, Nathaniel Dennler, Andreea Bobu

Abstract

Robots learn reward functions from user demonstrations, but these rewards often fail to generalize to new environments. This failure occurs because learned rewards latch onto spurious correlations in training data rather than the underlying human intent that demonstrations represent. Existing methods leverage visual or semantic similarity to improve robustness, yet these surface-level cues often diverge from what humans actually care about. We present Generalizing Intent for Flexible Test-Time Rewards (GIFT), a framework that grounds reward generalization in human intent rather than surface cues. GIFT leverages language models to infer high-level intent from user demonstrations by contrasting preferred with non-preferred behaviors. At deployment, GIFT maps novel test states to behaviorally equivalent training states via intent-conditioned similarity, enabling learned rewards to generalize across distribution shifts without retraining. We evaluate GIFT on tabletop manipulation tasks with new objects and layouts. Across four simulated tasks with over 50 unseen objects, GIFT consistently outperforms visual and semantic similarity baselines in test-time pairwise win rate and state-alignment F1 score. Real-world experiments on a 7-DoF Franka Panda robot demonstrate that GIFT reliably transfers to physical settings. Further discussion can be found at https://mit-clear-lab.github.io/GIFT/

Paper Structure (13 sections, 10 equations, 9 figures)

Figures (9)

  • Figure 1: Top. During training, the robot gets task demonstrations (loading a paintbrush) and uses them to infer the user's high-level intent ("load art supplies"). Bottom. At test time, the robot encounters three unseen objects. GIFT uses the inferred intent to correctly identify that the molding clay is relevant. In contrast, visual-similarity baselines incorrectly prefer the dish scrubber due to its appearance, and language-similarity baselines make an analogous mistake by preferring the toothbrush, whose name resembles "paintbrush".
  • Figure 2: Generalizing Intent for Flexible Test-Time Rewards. Left. GIFT infers the human's intent given pairs of human-preferred demonstrations and reference trajectories. Right. During inference, GIFT deduces which objects in the unseen states should behave as objects in the training states. Afterwards, the unseen state components are aligned to training states so that the reward function learned before deployment can be used for planning.
  • Figure 3: Similarity Between LM-Inferred and Ground Truth Intent. We gave the LM 3 demonstration pairs from a virtual Jaco robot and a real-world Franka robot, and tasked it with deducing the human's intent. We found that LMs produced an acceptable conditioning variable for alignment.
  • Figure 4: Test-Time Pairwise Win Rate. Across our tasks, the rewards learned via GIFT achieved a higher win rate than all other baselines. These results were aggregated over 250 trajectory pairs per scene, with randomization of various state components along with random sampling from a pool of over 50 unseen objects. Black bars denote standard errors.
  • Figure 5: Test-Time State Alignment F1 Score. GIFT achieves a superior F1 score by aligning along intent, reducing confounds from superficial language/visual similarity.
  • ...and 4 more figures
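
The two metrics named in the Figure 4 and Figure 5 captions can be sketched as follows. The page does not give their exact definitions, so this sketch rests on assumptions. A "win" is taken to mean that the learned reward orders a trajectory pair the same way as a ground-truth reward. The standard error uses the binomial formula. F1 is computed over sets of test objects flagged as relevant. The function names and input formats are hypothetical.

```python
import math
from typing import List, Set, Tuple

# Each pair holds (learned_a, learned_b, true_a, true_b): the learned and
# ground-truth rewards of two trajectories a and b. This format is assumed.
Pair = Tuple[float, float, float, float]


def pairwise_win_rate(pairs: List[Pair]) -> Tuple[float, float]:
    """Return (win rate, binomial standard error) over trajectory pairs."""
    if not pairs:
        raise ValueError("need at least one trajectory pair")
    wins = sum(1 for la, lb, ta, tb in pairs if (la > lb) == (ta > tb))
    n = len(pairs)
    rate = wins / n
    std_err = math.sqrt(rate * (1.0 - rate) / n)
    return rate, std_err


def alignment_f1(predicted: Set[str], actual: Set[str]) -> float:
    """F1 between objects predicted relevant and objects truly relevant."""
    true_pos = len(predicted & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(actual)
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toy_pairs = [(1, 0, 1, 0), (0, 1, 1, 0), (2, 1, 3, 0), (0, 1, 0, 2)]
    print(pairwise_win_rate(toy_pairs))
    print(alignment_f1({"molding clay", "dish scrubber"}, {"molding clay"}))
```

The toy inputs are invented for illustration and are not results from the paper.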