REPeat: A Real2Sim2Real Approach for Pre-acquisition of Soft Food Items in Robot-assisted Feeding

Nayoung Ha, Ruolin Ye, Ziang Liu, Shubhangi Sinha, Tapomayukh Bhattacharjee

Abstract

The paper presents REPeat, a Real2Sim2Real framework designed to enhance bite acquisition in robot-assisted feeding for soft foods. It uses "pre-acquisition actions" such as pushing, cutting, and flipping to improve the success rate of bite acquisition actions such as skewering, scooping, and twirling. If the data-driven model predicts a low success rate for direct bite acquisition, the system initiates a Real2Sim phase, reconstructing the food's geometry in simulation. The robot explores various pre-acquisition actions in the simulation, then a Sim2Real step renders a photorealistic image that is used to reassess the predicted success rate. If the predicted success rate improves, the robot applies the action in reality. We evaluate the system on 15 diverse plates with 10 types of food items for a soft food diet, showing an improvement in bite acquisition success rates of 27% on average across all plates. See our project website at https://emprise.cs.cornell.edu/repeat.
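
The decision loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `choose_plan`, `predict_success`, and `simulate_and_render`, the `Plan` container, and the threshold value of 0.5 are all hypothetical placeholders. `predict_success` stands in for the data-driven success predictor, and `simulate_and_render` for the combined Real2Sim, simulation, and Sim2Real steps.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class Plan:
    pre_action: Optional[str]   # None means: acquire the bite directly
    predicted_success: float    # predictor's estimate for this plan


def choose_plan(
    image: Any,
    predict_success: Callable[[Any], float],
    simulate_and_render: Callable[[Any, str], Any],
    pre_actions: Iterable[str],
    threshold: float = 0.5,     # assumed value, not taken from the paper
) -> Plan:
    """Pick direct acquisition or the most promising pre-acquisition action."""
    baseline = predict_success(image)
    if baseline > threshold:
        # Predicted success is already high enough: skip the simulation loop.
        return Plan(None, baseline)

    best = Plan(None, baseline)
    for action in pre_actions:
        # Roll out the action in simulation and render the resulting scene.
        rendered = simulate_and_render(image, action)
        score = predict_success(rendered)
        # Keep the action only if it beats the best estimate seen so far.
        if score > best.predicted_success:
            best = Plan(action, score)
    return best
```

If no pre-acquisition action raises the predicted success rate, the sketch falls back to direct acquisition, which matches the abstract's condition that the robot applies an action only when the predicted success improves.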

Paper Structure

This paper contains 20 sections and 7 figures.

Figures (7)

  • Figure 1: We propose REPeat, a Real2Sim2Real system for pre-acquisition of soft food items. The system evaluates the likelihood of successful bite acquisition; if low, it replicates the setup in simulation to explore various pre-acquisition actions. If a certain pre-acquisition action improves the bite acquisition success rate, the robot executes the pre-acquisition and bite acquisition actions in the real world.
  • Figure 2: Overview of REPeat: The process begins with SPANet-soft giving an initial estimate of the bite acquisition success rate. The robot performs direct bite acquisition if this initial estimate is higher than a threshold. Otherwise, it enters the Real2Sim2Real loop, which consists of: (1) Real2Sim: reconstructing the 3D mesh in real time with estimated depth as input, (2) Simulation: rolling out various pre-acquisition actions using high-fidelity MPM simulation, (3) Sim2Real: rendering a visually realistic picture based on the simulation result. SPANet-soft evaluates the rendered result and compares it with the success rate of directly picking up food items without pre-acquisition. If the pre-acquisition action improves the bite acquisition success rate, the robot performs the pre-acquisition action first, followed by the bite acquisition action.
  • Figure 3: (a) Deformation of the template quad mesh for food mesh reconstruction: Using the RGB image from the camera, we perform instance segmentation and apply the segmentation mask to the depth map to obtain per-instance depth images. We then use the values of these depth maps as a displacement map to deform a template quad mesh. (b) The structure of SPANet-soft.
  • Figure 4: Setup: Our setup features a robot holding a feeding utensil, with a camera for perception and an F/T sensor to detect the end of the pushing action. It is adaptable to various robot embodiments and camera placements (frame or wrist-mounted). The figure shows 3 setups: (1) Franka robot with a camera mounted on a frame, (2) Kinova 6-DoF robot with a camera mounted on the wrist, (3) Kinova 7-DoF robot with a camera mounted on the wrist. The utensil has 2 DoFs: (a) Pitch, performing a scoop-like motion, (b) Roll, performing a twirl-like motion.
  • Figure 5: Upper: 5 axes corresponding to the characteristics of different food items, and 10 food types selected to represent the extremes. Lower: We evaluate the REPeat system on the following 15 plates containing 10 types of food items. J: Jell-O, MP: Mashed Potato, R: Rice, O: Oatmeal, B: Banana, S: Spaghetti, RV: Red velvet cake, A: Avocado, MC: Mac and cheese, T: Tofu.
  • ...and 2 more figures
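
The mesh reconstruction step in the Figure 3(a) caption (a masked depth map used as a displacement map on a template quad mesh) can be illustrated with a small NumPy sketch. This is an assumption-laden illustration, not the paper's code: the function name `deform_template_quad_mesh` is hypothetical, the camera is assumed to look straight down at the plate, pixel indices are used directly as x/y coordinates (no camera intrinsics), and the table depth `table_depth` is assumed to be known.

```python
import numpy as np


def deform_template_quad_mesh(depth, mask, table_depth):
    """Displace a flat grid of quads by the per-instance depth image.

    depth:       (H, W) array of camera-to-surface distances
    mask:        (H, W) boolean instance segmentation mask
    table_depth: camera-to-table distance (assumed known)
    Returns (vertices, quads): (H*W, 3) floats and ((H-1)*(W-1), 4) indices.
    """
    h, w = depth.shape
    # Height above the table inside the mask, zero elsewhere.
    height = np.where(mask, table_depth - depth, 0.0)
    height = np.clip(height, 0.0, None)  # ignore points below the table

    # Template mesh: one vertex per pixel on a regular grid.
    ys, xs = np.mgrid[0:h, 0:w]
    vertices = np.stack(
        [xs.ravel(), ys.ravel(), height.ravel()], axis=1
    ).astype(float)

    # Each quad connects four neighbouring grid vertices.
    idx = np.arange(h * w).reshape(h, w)
    quads = np.stack(
        [
            idx[:-1, :-1].ravel(),  # top-left
            idx[:-1, 1:].ravel(),   # top-right
            idx[1:, 1:].ravel(),    # bottom-right
            idx[1:, :-1].ravel(),   # bottom-left
        ],
        axis=1,
    )
    return vertices, quads
```

For example, calling it on a 3x3 depth image whose center pixel is closer to the camera than the table produces 9 vertices and 4 quads, with only the center vertex raised above the plane. A real pipeline would additionally convert pixels to metric coordinates and crop the grid to each food instance.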