LLM-Pack: Intuitive Grocery Handling for Logistics Applications
Yannik Blei, Michael Krawez, Tobias Jülg, Pierre Krack, Florian Walter, Wolfram Burgard
TL;DR
LLM-Pack presents an open-vocabulary grocery packing framework that combines vision-language perception, planning with large language models, and GroundedSAM-based execution to preserve product integrity. The method introduces the Packing Consistency Score $C$ to quantify how human-like a packing sequence is, and provides a Grocery Packing Dataset for evaluation. Experiments show strong planning performance with GPT-4.5 and robust end-to-end results on a Franka robot. The key contributions are that no training is required for new grocery items, a modular design that allows the underlying models to be upgraded, and a publicly released dataset and code that enable reproducibility and future work. The results indicate practical potential for service robotics in grocery settings. Future directions include improved execution with vision-language-action models (VLAs) and better handling of free space within packing containers.
Abstract
Robotics and automation are increasingly influential in logistics but remain largely confined to traditional warehouses. In grocery retail, advancements such as cashier-less supermarkets exist, yet customers still pick and pack their groceries manually. While robotics research has focused substantially on the bin picking problem, the task of packing objects, and groceries in particular, has remained largely untouched. Yet packing grocery items in the right order is crucial for preventing product damage; for example, heavy objects should not be placed on top of fragile ones. However, the exact criteria for the right packing order are hard to define, especially given the huge variety of objects typically found in stores. In this paper, we introduce LLM-Pack, a novel approach for grocery packing. LLM-Pack leverages language and vision foundation models to identify groceries and to generate a packing sequence that mimics human packing strategies. LLM-Pack does not require dedicated training to handle new grocery items, and its modularity allows easy upgrades of the underlying foundation models. We evaluate our approach extensively to demonstrate its performance. We will make the source code of LLM-Pack publicly available upon the publication of this manuscript.
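To make the planning step described above more concrete, the following is a minimal sketch, assuming that a perception module has already produced a list of grocery names and that the language model returns one item per line. It is not the authors' implementation: `build_packing_prompt`, `parse_packing_order`, `plan_packing`, the prompt wording, and the `query_llm` callable are all hypothetical placeholders. The sketch only illustrates the general idea of asking a language model for a packing order that puts heavy items before fragile ones, then validating that the reply is a complete ordering of the detected items.

```python
"""Hypothetical sketch of an LLM-based packing-order step (not the LLM-Pack code)."""
import re
from typing import Callable, List

# Strips common list markers such as "- ", "* ", "1. " or "2) " from a reply line.
_LIST_MARKER = re.compile(r"^\s*(?:[-*]|\d+[.)])\s*")


def build_packing_prompt(items: List[str]) -> str:
    # Lists the detected grocery names and asks for an order in which heavy,
    # sturdy items are packed first so that nothing heavy ends up on fragile items.
    listing = "\n".join(f"- {name}" for name in items)
    return (
        "You are packing groceries into a bag.\n"
        "Items on the counter:\n"
        f"{listing}\n"
        "Return a packing order, one item per line. Pack heavy or sturdy items "
        "first and fragile items last. Use the item names exactly as given."
    )


def parse_packing_order(reply: str, items: List[str]) -> List[str]:
    # Keeps only lines that exactly match a known item, ignores duplicates,
    # and rejects replies that leave out any detected item.
    known = set(items)
    order: List[str] = []
    for line in reply.splitlines():
        name = _LIST_MARKER.sub("", line).strip()
        if name in known and name not in order:
            order.append(name)
    missing = known.difference(order)
    if missing:
        raise ValueError(f"LLM reply omitted items: {sorted(missing)}")
    return order


def plan_packing(items: List[str], query_llm: Callable[[str], str]) -> List[str]:
    # query_llm is a hypothetical stand-in for any text-in, text-out model call.
    return parse_packing_order(query_llm(build_packing_prompt(items)), items)


def fake_llm(prompt: str) -> str:
    # Canned answer used only to demonstrate the flow without a real model.
    return "1. milk carton\n2. apples\n3. eggs"


if __name__ == "__main__":
    print(plan_packing(["eggs", "apples", "milk carton"], fake_llm))
```

Running the script prints `['milk carton', 'apples', 'eggs']`. Validating the reply against the detected items is a deliberate design choice in this sketch: because language model output is free-form, a parser that rejects incomplete or hallucinated orderings keeps the downstream execution step from acting on items that were never observed.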
