GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks

Yen-Ling Tai, Yi-Ru Yang, Kuan-Ting Yu, Yu-Wei Chao, Yi-Ting Chen

TL;DR

GRITS addresses spillage in robotic food scooping by integrating a spillage predictor into a guided diffusion policy, enabling test-time trajectory refinement with a differentiable safety objective. The spillage predictor is trained in simulation over four primitive shapes with varied properties, and operates on segmented point clouds processed by a DP3 encoder to bridge sim-to-real gaps. Inference uses a gradient-based guidance mechanism with a defined objective $J$ based on $P_{\text{spillage}}$, and a carefully chosen guidance weight $\rho$ and scheduling to balance safety and task success. Real-world experiments across six training foods and ten unseen categories achieve 82% task success and 4% spillage, reducing spillage by over 40% relative to baselines, demonstrating robust generalization and practical viability.

Abstract

Robotic food scooping is a critical manipulation skill for food preparation and service robots. However, existing robot learning algorithms, especially learn-from-demonstration methods, still struggle to handle diverse and dynamic food states, which often results in spillage and reduced reliability. In this work, we introduce GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks. This framework leverages guided diffusion policy to minimize food spillage during scooping and to ensure reliable transfer of food items from the initial to the target location. Specifically, we design a spillage predictor that estimates the probability of spillage given current observation and action rollout. The predictor is trained on a simulated dataset with food spillage scenarios, constructed from four primitive shapes (spheres, cubes, cones, and cylinders) with varied physical properties such as mass, friction, and particle size. At inference time, the predictor serves as a differentiable guidance signal, steering the diffusion sampling process toward safer trajectories while preserving task success. We validate GRITS on a real-world robotic food scooping platform. GRITS is trained on six food categories and evaluated on ten unseen categories with different shapes and quantities. GRITS achieves an 82% task success rate and a 4% spillage rate, reducing spillage by over 40% compared to baselines without guidance, thereby demonstrating its effectiveness.
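The guidance idea described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the learned spillage predictor is replaced by a toy analytic function of spoon tilt angles, the diffusion policy by a simple pull toward a hypothetical demonstration trajectory, and the objective is assumed to be $J = P_{\text{spillage}}$ with a constant guidance weight `rho` applied only during the later denoising steps. All function names and constants here are illustrative assumptions.

```python
import math
import random


def spillage_prob(traj, limit=0.5, sharpness=4.0):
    """Toy stand-in for the spillage predictor (assumption, not the paper's model).

    traj is a list of spoon tilt angles; larger mean squared tilt
    maps smoothly to a higher spillage probability.
    """
    energy = sum(a * a for a in traj) / len(traj)
    return 1.0 / (1.0 + math.exp(-sharpness * (energy - limit)))


def spillage_grad(traj, limit=0.5, sharpness=4.0):
    """Analytic gradient of spillage_prob with respect to each waypoint."""
    p = spillage_prob(traj, limit, sharpness)
    scale = sharpness * p * (1.0 - p) * 2.0 / len(traj)
    return [scale * a for a in traj]


def sample(rho, steps=20, guide_below=15, seed=0):
    """Run a toy denoising loop, optionally guided by the spillage gradient.

    rho is the guidance weight; guide_below is a hypothetical schedule that
    enables guidance only for the last guide_below steps.
    """
    rng = random.Random(seed)
    demo = [0.9] * 8  # hypothetical demonstration trajectory with risky tilt
    traj = [rng.gauss(0.0, 1.0) for _ in demo]  # start from pure noise
    for t in reversed(range(steps)):
        noise = 0.05 * t / steps  # noise shrinks to zero at the final step
        # Placeholder denoising step: move toward the demonstration.
        traj = [x + 0.3 * (d - x) + noise * rng.gauss(0.0, 1.0)
                for x, d in zip(traj, demo)]
        if t < guide_below:
            # Guidance step: descend the assumed objective J = P_spillage.
            grad = spillage_grad(traj)
            traj = [x - rho * g for x, g in zip(traj, grad)]
    return traj


plain = sample(rho=0.0)   # no guidance
guided = sample(rho=1.0)  # spillage-aware guidance
print(spillage_prob(plain), spillage_prob(guided))
```

In this toy setting the guided sample settles at a lower predicted spillage probability than the unguided one, while still being pulled toward the demonstration. Raising `rho` trades fidelity to the demonstration for a lower predicted risk, which mirrors the balance between safety and task success that the guidance weight and schedule are meant to control.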

Paper Structure

This paper contains 21 sections, 4 equations, 9 figures, 1 algorithm.

Figures (9)

  • Figure 1: Spillage-aware trajectory generation with GRITS. Robotic food scooping demands exact and delicate control, as small deviations can result in spillage. GRITS addresses this challenge by leveraging predicted spillage probabilities to adaptively refine trajectories, leading to safer and more reliable manipulation. The adjustments are subtle, with an average displacement of only 0.3 cm between consecutive trajectory points.
  • Figure 2: The architecture of GRITS. GRITS is a guided diffusion policy designed for robotic food scooping. Given an RGB-D image and an initial noisy trajectory, the diffusion policy denoises it into a refined trajectory. A spillage predictor, which takes segmented point clouds as input to reduce the sim-to-real gap, estimates the probability of spillage for a given candidate trajectory. This probability provides a guidance signal that steers the denoising process toward safer trajectories. The robot then follows the refined trajectory using position control to scoop food items.
  • Figure 3: Simulated Food Scooping Data Collection. We construct a scooping dataset in simulation to train the spillage predictor. Simulated foods are composed of four primitive shapes: spheres, cubes, cones, and cylinders, with varied physical properties, including mass, friction, and particle size. This design enables diverse and controllable scooping and spillage cases under different rollouts, which are impractical to collect in the real world.
  • Figure 4: Food categories for training and testing sets. The training set includes six food items (top row) varying in sphere size from small to large. The testing set (bottom row) covers ten additional food categories with diverse shapes and material properties. Numbers below each food item indicate quantities: small-particle foods are measured by weight (g), and large-particle foods are measured by count (pieces).
  • Figure 5: Real-World Experimental Platform. We set up a 35 × 30 cm workspace (indicated by the red bounding box) for the experiments.
  • ...and 4 more figures
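
The simulated data collection described in the Figure 3 caption amounts to randomizing food properties per scene. A minimal sketch of such a sampler is given here, assuming uniform sampling; the numeric ranges, units, and the names `FoodConfig` and `sample_food_config` are placeholders for illustration and do not come from the paper.

```python
import random
from dataclasses import dataclass

# The four primitive shapes named in the paper.
SHAPES = ("sphere", "cube", "cone", "cylinder")


@dataclass(frozen=True)
class FoodConfig:
    """One simulated food setup (hypothetical container type)."""
    shape: str
    mass_g: float            # per-particle mass, placeholder unit
    friction: float          # friction coefficient, placeholder range
    particle_size_cm: float  # characteristic size, placeholder unit


def sample_food_config(rng):
    """Draw one randomized food configuration (ranges are assumptions)."""
    return FoodConfig(
        shape=rng.choice(SHAPES),
        mass_g=rng.uniform(1.0, 20.0),
        friction=rng.uniform(0.1, 1.0),
        particle_size_cm=rng.uniform(0.5, 3.0),
    )


rng = random.Random(0)
configs = [sample_food_config(rng) for _ in range(100)]
print(configs[0])
```

Each sampled configuration would then be paired with scooping rollouts in the simulator and labeled with whether spillage occurred, giving the predictor both safe and unsafe examples.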