FLAIR: Feeding via Long-horizon AcquIsition of Realistic dishes

Rajat Kumar Jenamani; Priya Sundaresan; Maram Sakr; Tapomayukh Bhattacharjee; Dorsa Sadigh

TL;DR

FLAIR addresses the challenge of feeding realistic, in-the-wild meals by uniting vision-language item detection, a library of parameterized bite-acquisition skills, and a foundation-model-based long-horizon planner. It uses GPT-4V to sequence bites under user preferences and efficiency estimates, while a modular skill library executes actions such as skewering, twirling, scooping, and dipping, supplemented by pre-acquisition maneuvers like grouping, pushing, and cutting. The system demonstrates cross-institution and multi-robot robustness, beats strong baselines on noodle and mixed-dish plates, and successfully feeds a care recipient with mobility limitations, illustrating practical impact for autonomy and caregiver relief. Together, these contributions advance autonomous, personalized mealtime assistance for diverse, real-world meals and offer a versatile framework adaptable to future perception and manipulation improvements.
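As a rough illustration of how a skill library and a preference-aware bite selector might fit together, here is a minimal sketch. Only the seven skill names come from the paper. The `Candidate` type, the `efficiency` score, and the greedy `next_bite` rule are hypothetical stand-ins for illustration; FLAIR itself delegates sequencing to GPT-4V rather than to a hand-written rule like this one.

```python
from dataclasses import dataclass

# Skill names are taken from the paper; everything else is a hypothetical illustration.
ACQUISITION_SKILLS = ("skewer", "twirl", "scoop", "dip")
PRE_ACQUISITION_SKILLS = ("group", "push", "cut")


@dataclass(frozen=True)
class Candidate:
    item: str          # detected food item label
    skill: str         # acquisition skill proposed for this item
    efficiency: float  # assumed score in [0, 1]; higher means an easier pickup


def next_bite(candidates, excluded_items):
    """Pick the most efficient candidate whose item the user has not excluded.

    This greedy rule is a stand-in for the foundation-model planner, not a
    reimplementation of it. Returns None when no candidate is allowed.
    """
    allowed = [
        c for c in candidates
        if c.item not in excluded_items and c.skill in ACQUISITION_SKILLS
    ]
    if not allowed:
        return None
    return max(allowed, key=lambda c: c.efficiency)


plate = [
    Candidate("meatball", "skewer", 0.9),
    Candidate("spaghetti", "twirl", 0.6),
]
choice = next_bite(plate, excluded_items={"meatball"})
print(choice.item, choice.skill)
```

With the preference "no meatballs", the selector skips the easier meatball pickup and chooses the spaghetti twirl, which mirrors the trade-off between user preference and efficiency described above.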

Abstract

Robot-assisted feeding has the potential to improve the quality of life for individuals with mobility limitations who are unable to feed themselves independently. However, there exists a large gap between the homogeneous, curated plates existing feeding systems can handle, and truly in-the-wild meals. Feeding realistic plates is immensely challenging due to the sheer range of food items that a robot may encounter, each requiring specialized manipulation strategies which must be sequenced over a long horizon to feed an entire meal. An assistive feeding system should not only be able to sequence different strategies efficiently in order to feed an entire meal, but also be mindful of user preferences given the personalized nature of the task. We address this with FLAIR, a system for long-horizon feeding which leverages the commonsense and few-shot reasoning capabilities of foundation models, along with a library of parameterized skills, to plan and execute user-preferred and efficient bite sequences. In real-world evaluations across 6 realistic plates, we find that FLAIR can effectively tap into a varied library of skills for efficient food pickup, while adhering to the diverse preferences of 42 participants without mobility limitations as evaluated in a user study. We demonstrate the seamless integration of FLAIR with existing bite transfer methods [19, 28], and deploy it across 2 institutions and 3 robots, illustrating its adaptability. Finally, we illustrate the real-world efficacy of our system by successfully feeding a care recipient with severe mobility limitations. Supplementary materials and videos can be found at: https://emprise.cs.cornell.edu/flair

Paper Structure (25 sections, 12 figures, 1 table)

This paper contains 25 sections, 12 figures, 1 table.

Introduction
Related Work
FLAIR: Feeding via Long-horizon Acquisition of Realistic dishes
Hardware System
Long-Horizon Bite Acquisition Framework
Acquisition skills
Pre-acquisition skills
Bite Sequencing via Foundation Models
Integration of Acquisition and Transfer
Experiments
Bite Acquisition Experiments
Comparisons with Task Planning Baselines
Demonstration of Real-World Feeding
Discussion
ACKNOWLEDGEMENT
...and 10 more sections

Figures (12)

Figure 1: We propose FLAIR, a system for long-horizon robot-assisted feeding that combines the commonsense and few-shot reasoning capabilities of foundation models with a library of parameterized skills. Above, FLAIR takes visual observations and a given user preference ("Please don't feed me any meatballs") to plan a sequence of actions that pushes aside meatballs and twirls spaghetti.
Figure 2: We implement our skill library using a custom feeding utensil (adapted from [shaikewitz2022mouth]) with two degrees of freedom at the end effector for easy twirling and scooping. We deploy the full feeding stack on three robots across two institutions: the 7-DoF Franka Emika Panda (top) and 7-DoF Kinova Gen 3 (middle) at Stanford University, and the 6-DoF Kinova Gen 3 (bottom) at Cornell University.
Figure 3: Our skill library consists of 7 parameterized manipulation skills: 4 acquisition (skewer, twirl, scoop, dip) and 3 pre-acquisition (group, push, cut).
Figure 4: Plates: We evaluate our system on the following six plates containing a variety of food items, each necessitating highly different manipulation skills.
Figure 5: Example run on a plate with mashed potatoes and sausages where the user specified no preference. FLAIR, which balances user preferences (bite variety) and efficiency, is judged by users to better adhere to preferences than Efficiency-Only and outperforms Preference-Only (Commonsense-Only) in plate clearance. Consequently, FLAIR is considered to provide a more human-like feeding experience compared to the baseline methods. Note that $^*$ indicates statistical significance ($p$-value $< 0.05$), determined via a Mann-Whitney U test.
...and 7 more figures
