LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots

Dongge Han, Trevor McInroe, Adam Jelley, Stefano V. Albrecht, Peter Bell, Amos Storkey

TL;DR

The paper addresses the gap in personalizing LLM-based planners for household robots to reflect individual user preferences. It introduces LLM-Personalize, which combines imitation learning to bootstrap planning and Iterative Reinforced Self-Training to refine the planner toward user-specific goals, operating over a scene-graph context and an iterative planning loop. The approach yields significant performance gains on the Housekeep benchmark (over 30% increase in success rate) and demonstrates improved alignment with human preferences and cross-domain transfer. This work advances personalized, long-horizon robotic planning and has practical implications for deploying user-tailored LLM-powered household agents in real-world settings.

Abstract

Large language models (LLMs) have shown significant potential for robotics applications, particularly task planning, by harnessing their language comprehension and text generation capabilities. However, in applications such as household robotics, a critical gap remains in the personalization of these models to individual user preferences. We introduce LLM-Personalize, a novel framework with an optimization pipeline designed to personalize LLM planners for household robotics. Our LLM-Personalize framework features an LLM planner that performs iterative planning in multi-room, partially-observable household scenarios, making use of a scene graph constructed with local observations. The generated plan consists of a sequence of high-level actions which are subsequently executed by a controller. Central to our approach is the optimization pipeline, which combines imitation learning and iterative self-training to personalize the LLM planner. In particular, the imitation learning phase performs initial LLM alignment from demonstrations, and bootstraps the model to facilitate effective iterative self-training, which further explores and aligns the model to user preferences. We evaluate LLM-Personalize on Housekeep, a challenging simulated real-world 3D benchmark for household rearrangements, and show that LLM-Personalize achieves more than a 30 percent increase in success rate over existing LLM planners, showcasing significantly improved alignment with human preferences. Project page: https://gdg94.github.io/projectllmpersonalize/.
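The agent loop described in the abstract (build a scene graph from local observations, plan high-level actions, hand them to a controller, then re-plan) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `SceneGraph`, `stub_planner`, `controller_execute` and `run_episode` are hypothetical, the rule-based planner stands in for the LLM, and the toy household is not taken from Housekeep.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical scene graph: room -> receptacle -> set of objects."""
    rooms: dict = field(default_factory=dict)

    def update(self, observation):
        """Merge one local observation: (room, receptacle, objects seen there)."""
        room, receptacle, objects = observation
        # An object seen here cannot be elsewhere, so drop stale entries.
        for receptacles in self.rooms.values():
            for known in receptacles.values():
                known.difference_update(objects)
        self.rooms.setdefault(room, {})[receptacle] = set(objects)

    def to_prompt(self):
        """Serialize the graph as text, as it might be given to an LLM planner."""
        lines = []
        for room in sorted(self.rooms):
            for rec in sorted(self.rooms[room]):
                objs = ", ".join(sorted(self.rooms[room][rec])) or "empty"
                lines.append(f"{room} / {rec}: {objs}")
        return "\n".join(lines)


def stub_planner(graph, preferences):
    """Stand-in for the LLM planner (assumption): plan moves for known misplaced objects."""
    plan = []
    for room, receptacles in graph.rooms.items():
        for rec, objs in receptacles.items():
            for obj in sorted(objs):
                target = preferences.get(obj)
                if target is not None and target != (room, rec):
                    plan.append(("move", obj, target))
    return plan


def controller_execute(world, action):
    """Stand-in for the low-level controller: apply one high-level move to the world."""
    _, obj, target = action
    for objs in world.values():
        objs.discard(obj)
    world.setdefault(target, set()).add(obj)


def run_episode(world, preferences, room_order, iterations):
    """Iterative planning: observe one room, plan, execute the plan, then re-plan."""
    graph = SceneGraph()
    for i in range(iterations):
        room = room_order[i % len(room_order)]
        # Partial observability: only receptacles in the current room are seen.
        for (r, rec), objs in world.items():
            if r == room:
                graph.update((r, rec, set(objs)))
        for action in stub_planner(graph, preferences):
            controller_execute(world, action)
            target = action[2]
            # The robot sees the target receptacle while placing the object.
            graph.update((target[0], target[1], set(world[target])))
    return world, graph


def count_correct(world, preferences):
    """Number of objects currently on their preferred receptacle."""
    return sum(1 for obj, target in preferences.items() if obj in world.get(target, set()))


def example_world():
    """Toy household state: (room, receptacle) -> objects."""
    return {
        ("kitchen", "counter"): {"shoe"},
        ("kitchen", "sink"): {"plate"},
        ("bedroom", "bed"): {"mug"},
        ("bedroom", "shoe_rack"): set(),
    }


EXAMPLE_PREFERENCES = {
    "shoe": ("bedroom", "shoe_rack"),
    "plate": ("kitchen", "sink"),
    "mug": ("kitchen", "counter"),
}
```

Calling `run_episode(example_world(), EXAMPLE_PREFERENCES, ["kitchen", "bedroom"], iterations=2)` runs two planning iterations. After the first iteration the mug is still misplaced, because the bedroom has not been observed yet, which is why re-planning after each executed plan matters in a partially observable home.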

Paper Structure (21 sections, 2 equations, 6 figures, 2 tables)

This paper contains 21 sections, 2 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Illustration of LLM-Personalize. Agent architecture: The Context Generator constructs and updates a scene graph from local observations. The LLM Planner uses the graph to produce a plan as a sequence of high-level actions, and iteratively re-plans when the previous plan has been executed. Each high-level action is translated to a sequence of control actions and executed by the Controller. To personalize the LLM Planner, we introduce an optimization pipeline integrating imitation learning and iterative reinforced Self-Training to fine-tune and align the planner with user preferences.
  • Figure 2: The Context Generator builds and updates the graph of the household state of rooms, receptacles and objects, derived from the robot's local observations at each timestep. The information is provided as a prompt to the LLM Planner. The top-down view of the scene is for illustration only; the robot has access only to the first-person view.
  • Figure 3: Optimization pipeline of LLM-Personalize using imitation learning and iterative reinforced Self-Training.
  • Figure 4: The Housekeep scenes used in our experiment.
  • Figure 5: Demonstration of four planning iterations generated and executed by LLM-Personalize (top row) and the resulting graphs (bottom row) on a test task in Housekeep. Green/red object (leaf) nodes indicate correct/wrong placements. The object being moved is shown in boldface. This episode starts with 2 correctly placed objects and 5 misplaced objects (left) and ends with 6 correctly placed objects and only 1 misplaced object after rearrangement (right). For clarity, the graphs show only receptacles that hold objects and omit all other receptacles.
  • ...and 1 more figure
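
The optimization pipeline named in the Figure 3 caption (imitation learning followed by iterative reinforced self-training) can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the paper's procedure: the "planner" is a hypothetical count table rather than an LLM, "fine-tuning" just adds examples to that table, and self-training is modeled as a generic sample, filter-by-reward, fine-tune loop. The reward function, receptacle names and hyperparameters are invented for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical receptacle vocabulary for the toy problem.
RECEPTACLES = ["sink", "shelf", "shoe_rack"]


def fine_tune(policy, examples):
    """Toy stand-in for supervised fine-tuning: count (object, receptacle) pairs."""
    for obj, rec in examples:
        policy[obj][rec] += 1


def imitation_learning(demonstrations):
    """Bootstrap a fresh policy from demonstrations."""
    policy = defaultdict(Counter)
    fine_tune(policy, demonstrations)
    return policy


def sample(policy, obj, rng):
    """Sample a placement in proportion to counts; +1 smoothing keeps exploration alive."""
    weights = [policy[obj][r] + 1 for r in RECEPTACLES]
    return rng.choices(RECEPTACLES, weights=weights, k=1)[0]


def greedy(policy, obj):
    """Most likely placement under the policy (ties go to the first receptacle)."""
    return max(RECEPTACLES, key=lambda r: policy[obj][r])


def reinforced_self_training(policy, objects, reward_fn, iterations, samples_per_object, rng):
    """Repeat: sample from the current policy, keep rewarded samples, fine-tune on them."""
    for _ in range(iterations):
        # Generate a dataset with the current policy.
        batch = [(o, sample(policy, o, rng)) for o in objects for _ in range(samples_per_object)]
        # Keep only samples that the (assumed) user-preference reward scores positively.
        kept = [(o, r) for o, r in batch if reward_fn(o, r) > 0]
        fine_tune(policy, kept)
    return policy


def num_correct(policy, preferences):
    """How many objects the greedy policy places where the user prefers."""
    return sum(1 for obj, rec in preferences.items() if greedy(policy, obj) == rec)
```

In this toy setting, demonstrations might cover only some objects, for example `imitation_learning([("plate", "sink"), ("book", "shelf")])`. Self-training with a reward that checks a placement against the user's preference can then pick up the remaining objects, because rewarded samples are the only ones added back into the policy.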