
User-Centric Object Navigation: A Benchmark with Integrated User Habits for Personalized Embodied Object Search

Hongcheng Wang, Jinyu Zhu, Hao Dong

TL;DR

UcON addresses a gap in object navigation: object locations follow not only scene priors but also user-specific habits. The authors introduce a large-scale benchmark with 489 object categories and about 22,600 habits, plus a Habit Retrieval Module (HRM) that fetches task-relevant habits to guide LLM-based planning. Experiments in simulation and real environments show that standard object navigation (ON) methods struggle under habit-driven placements, while incorporating user habits improves success rates, with HRM providing further gains. The work aims to advance personalized embodied AI by teaching agents to reason over household-specific behavior for efficient object search, and provides code for reproducibility.

Abstract

In the evolving field of robotics, the challenge of Object Navigation (ON) in household environments has attracted significant interest. Existing ON benchmarks typically place objects in locations guided by general scene priors, without accounting for the specific placement habits of individual users. This omission limits the adaptability of navigation agents in personalized household environments. To address this, we introduce User-centric Object Navigation (UcON), a new benchmark that incorporates user-specific object placement habits, referred to as user habits. This benchmark requires agents to leverage these user habits for more informed decision-making during navigation. UcON encompasses approximately 22,600 user habits across 489 object categories. UcON is, to our knowledge, the first benchmark that explicitly formalizes and evaluates habit-conditioned object navigation at scale and covers the widest range of target object categories. Additionally, we propose a habit retrieval module to extract and utilize habits related to target objects, enabling agents to infer their likely locations more effectively. Experimental results demonstrate that current SOTA methods exhibit substantial performance degradation under habit-driven object placement, while integrating user habits consistently improves success rates. Code is available at https://github.com/whcpumpkin/User-Centric-Object-Navigation.
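
The abstract describes the habit retrieval module only at a high level. As a hedged illustration of what "extracting habits related to target objects" could look like, the sketch below ranks entries of a toy knowledge base by category match and then by word overlap. The `Habit` record, the `retrieve_habits` function, and the lexical Jaccard scoring are all hypothetical stand-ins; the paper's module may rely on an LLM or learned embeddings instead.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Habit:
    """Hypothetical knowledge-base entry; the real UcON schema may differ."""
    obj: str   # object category the habit concerns, e.g. "book"
    text: str  # natural-language habit description


def tokenize(text: str) -> set[str]:
    """Lowercase alphabetic tokens; a crude stand-in for a learned text encoder."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_habits(target: str, habits: list[Habit], k: int = 3) -> list[Habit]:
    """Return up to k habits relevant to `target` (hypothetical scoring).

    Habits tagged with the target category rank first; ties are broken by
    Jaccard overlap between the target's tokens and the habit text's tokens.
    Habits with neither a category match nor any token overlap are dropped.
    """
    query = tokenize(target)

    def score(h: Habit) -> tuple[int, float]:
        tokens = tokenize(h.text)
        union = query | tokens
        jaccard = len(query & tokens) / len(union) if union else 0.0
        return (int(h.obj == target), jaccard)

    # sorted() is stable, so equally scored habits keep their original order.
    ranked = sorted(habits, key=score, reverse=True)
    return [h for h in ranked if score(h) > (0, 0.0)][:k]


if __name__ == "__main__":
    kb = [
        Habit("book", "The user likes reading books in bed before sleeping."),
        Habit("newspaper", "The user reads the newspaper over breakfast at the console table."),
        Habit("mug", "The user leaves the mug next to the coffee machine."),
    ]
    for h in retrieve_habits("newspaper", kb):
        print(h.text)  # prints only the newspaper habit
```

Filtering out zero-score habits keeps unrelated entries from padding the prompt, which matters when the knowledge base holds thousands of habits.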

Paper Structure (22 sections, 1 equation, 4 figures, 3 tables)

Figures (4)

  • Figure 1: A Habit-shaped Scene and an Example of Reasoning. The bottom half of the figure shows five examples of objects placed according to user habits (in practice there are more). The top half presents the User Habit Knowledge Base and compares LLM reasoning with and without habits.
  • Figure 2: Task and Method Process Overview. At the beginning, we sample $k$ objects and obtain their corresponding habits (each object may have multiple habits), which together form the User Habit Knowledge Base, as well as all their possible positions (generated by GPT-4; see the task generation section). Each object is then assigned a position sampled at random from these candidates, and the object is moved there to form a Habit-shaped Scene. A Habit Retrieval Module then retrieves the relevant habits, while an Object Detector identifies objects in the current observation. The outputs of both modules are formatted by a prompter and fed into the Foundation Large Model to generate a plan.
  • Figure 3: A Case Study in the Large Environment. We show the waypoints and LLM reasoning results from one experiment. In this experiment, the target object, a book, is initially not visible, but the agent is told about the user's habit of reading books before sleeping. The red squares represent the waypoints chosen by PixelNav. (A) Habit level: Retrieval. Although no book is visible at first, the LLM infers from the habit description that books are most likely kept in the bedroom, and that the direction in which a bed appears in view most likely leads to the bedroom. The book is eventually found in the bedroom, at the foot of the bed. (B) Habit level: None. Without user habits as prior knowledge, the LLM can only reason from common sense that books are more likely to be found in the living room than in the kitchen. After circling the living room several times, the agent still finds no book.
  • Figure 4: A Case Study in the Small Environment. We show the waypoints and LLM reasoning results from one experiment. In this experiment, the target object, a newspaper, is deliberately occluded by our laptops. The agent is told about the user's habit of eating breakfast at the console table while reading the newspaper. The red squares represent the waypoints chosen by PixelNav. (A) Habit level: Retrieval. The agent first identifies the room where breakfast is served, then infers from the console table in its field of view that the newspaper is nearby, and eventually finds the target. (B) Habit level: None. The agent can only use common sense to guess that the newspaper might be in the living room or bedroom, and then searches aimlessly through the living room. In the end, the agent fails to find the target object.
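
The Figure 2 caption describes the pipeline only in prose. A minimal sketch of two of its steps, building a Habit-shaped Scene and formatting the prompt, is given below, assuming candidate positions are plain strings and that the habit level is either "Retrieval" or "None" as in Figures 3 and 4. The function names, the prompt wording, and the data layout are hypothetical; the object detector and the foundation-model call are omitted.

```python
import random


def build_habit_shaped_scene(
    candidate_positions: dict[str, list[str]], k: int, rng: random.Random
) -> dict[str, str]:
    """Sample k objects and move each to one of its habit-derived positions.

    `candidate_positions` maps an object category to the placements implied by
    its habits (generated by GPT-4 in the paper; hand-written in the usage below).
    """
    objects = rng.sample(sorted(candidate_positions), k)
    return {obj: rng.choice(candidate_positions[obj]) for obj in objects}


def format_prompt(
    target: str, habits: list[str], detections: list[str], habit_level: str
) -> str:
    """Build a hypothetical planning prompt from retrieved habits and detections.

    habit_level follows the two settings shown in Figures 3 and 4:
    "Retrieval" includes the retrieved habits, "None" omits them.
    """
    if habit_level not in ("Retrieval", "None"):
        raise ValueError(f"unsupported habit level: {habit_level!r}")
    lines = [
        f"Target object: {target}.",
        f"Visible objects: {', '.join(detections) or 'none'}.",
    ]
    if habit_level == "Retrieval" and habits:
        lines.append("Relevant user habits:")
        lines.extend(f"- {h}" for h in habits)
    lines.append("Which visible object or direction should the agent head toward next?")
    return "\n".join(lines)


if __name__ == "__main__":
    rng = random.Random(0)
    positions = {
        "book": ["bedside table", "foot of the bed"],
        "newspaper": ["console table"],
        "mug": ["kitchen counter", "coffee machine"],
    }
    print(build_habit_shaped_scene(positions, k=2, rng=rng))
    print(format_prompt("book", ["The user reads books in bed before sleeping."],
                        ["bed", "lamp"], "Retrieval"))
```

Keeping habit injection behind a single `habit_level` switch makes the None-versus-Retrieval comparison from the case studies a one-argument change; a full agent would still need to send the prompt to an LLM and turn its answer into a waypoint, which this sketch does not attempt.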