Integrating Deep RL and Bayesian Inference for ObjectNav in Mobile Robotics

João Castelo-Branco, José Santos-Victor, Alexandre Bernardino

Abstract

Autonomous object search is challenging for mobile robots operating in indoor environments due to partial observability, perceptual uncertainty, and the need to trade off exploration and navigation efficiency. Classical probabilistic approaches explicitly represent uncertainty but typically rely on handcrafted action-selection heuristics, while deep reinforcement learning enables adaptive policies but often suffers from slow convergence and limited interpretability. This paper proposes a hybrid object-search framework that integrates Bayesian inference with deep reinforcement learning. The method maintains a spatial belief map over target locations, updated online through Bayesian inference from calibrated object detections, and trains a reinforcement learning policy to select navigation actions directly from this probabilistic representation. The approach is evaluated in realistic indoor simulation using Habitat 3.0 and compared against baseline strategies developed for this study. Across two indoor environments, the proposed method improves success rate while reducing search effort. Overall, the results support the value of combining Bayesian belief estimation with learned action selection to achieve more efficient and reliable object-search behavior under partial observability.
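The per-cell Bayesian belief update described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the detector true-positive rate `p_det`, false-alarm rate `p_fa`, and the cell-wise independence assumption are hypothetical choices standing in for the calibrated detection model.

```python
import numpy as np

def update_belief(belief, observed, detected, p_det=0.8, p_fa=0.05):
    """One Bayesian update of a per-cell target-presence belief map.

    belief:   (H, W) prior probability that the target occupies each cell
    observed: (H, W) bool mask of cells inside the current field of view
    detected: (H, W) bool mask of observed cells hit by a target detection
    p_det:    assumed detector true-positive rate (illustrative value)
    p_fa:     assumed detector false-alarm rate (illustrative value)
    """
    # Likelihood of the evidence given target present / absent, per cell.
    # Observed cells with no detection provide "background" (negative) evidence.
    like_present = np.where(detected, p_det, 1.0 - p_det)
    like_absent = np.where(detected, p_fa, 1.0 - p_fa)

    post = belief.copy()  # unobserved cells keep their prior
    num = like_present[observed] * belief[observed]
    den = num + like_absent[observed] * (1.0 - belief[observed])
    post[observed] = num / den  # Bayes' rule per observed cell
    return post
```

A positive detection sharply raises the belief in the projected cells, while observed-but-empty cells decay toward zero, matching the background-evidence mechanism illustrated in Figure 3.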

Paper Structure

This paper contains 14 sections, 7 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Example object-search episode in a domestic indoor environment. The robot is tasked with locating a laptop and declaring success when the detection confidence exceeds $75\%$. (top-left) Initial exploration in the living room, where the target is not detected. (top-right) The robot explores an adjacent room through the doorway; although the laptop is visible from an oracle perspective, it is not yet identified by the detector. (bottom-left) After entering the room, the robot detects the laptop but with insufficient confidence to terminate the task. (bottom-right) The robot actively moves closer to reduce perceptual uncertainty, achieving a confident detection and successfully completing the search.
  • Figure 2: Overview of the proposed hybrid object-search framework. The robot converts RGB-D observations into spatial evidence, updates a Bayesian belief map over target locations, and selects actions using a DRL policy conditioned on the belief representation. A clustering subsystem partitions free space into candidate regions to structure exploration and navigation.
  • Figure 3: Example of map-level evidence generation at a single time step. Left: RGB frame with detections; only the most likely class label per bounding box is shown, while the full categorical output vector $p$ is used to compute observation evidence. Right: detections projected onto the occupancy grid; each projected detection is represented by a colored circle matching the corresponding bounding box. Blue cells denote occupied map cells within the robot field of view (blue rays) that have no mapped detections at that time step, providing background evidence for the Bayesian belief update.
  • Figure 4: Clustering-based abstraction of the navigation space. Free cells are partitioned into spatial clusters (colored regions), each represented by a centroid (marked with $\times$) used as a candidate viewpoint for exploration.
  • Figure 5: Training and evaluation targets in Env. 1. The policy is trained on a potted plant, laptop, and teddy bear, and evaluated on a held-out tv.
  • ...and 1 more figure
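The clustering-based abstraction of Figure 4 (free cells partitioned into spatial clusters whose centroids serve as candidate viewpoints) can be sketched with a plain k-means pass over free-cell coordinates. The number of clusters `k` and the use of Lloyd's algorithm are assumptions for illustration; the paper's actual clustering subsystem may differ.

```python
import numpy as np

def cluster_free_cells(free_mask, k=4, iters=20, seed=0):
    """Partition free map cells into k spatial clusters (Lloyd's k-means)
    and return the centroids, used as candidate exploration viewpoints."""
    pts = np.argwhere(free_mask).astype(float)  # (N, 2) free-cell coords
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen free cells
    centroids = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(iters):
        # Assign each free cell to its nearest centroid
        dists = np.linalg.norm(pts[:, None, :] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned cells
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = pts[labels == j].mean(axis=0)
    return centroids, labels
```

Each centroid then acts as a navigation goal covering its region, so the policy can reason over a handful of candidate viewpoints instead of the full grid.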