
Unsupervised Active Visual Search with Monte Carlo planning under Uncertain Detections

Francesco Taioli, Francesco Giuliari, Yiming Wang, Riccardo Berra, Alberto Castellini, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti

TL;DR

The solution uses the current pose of an agent and an RGB-D observation to learn an optimal search policy, exploiting a POMDP solved by a Monte Carlo planning approach (POMCP). It also makes the probability model aware that the object detector may fail, by exploiting the success statistics of the specific detector.

Abstract

We propose a solution for Active Visual Search of objects in an environment in which the 2D floor map is the only information available in advance. Our solution has three key features that make it more plausible and more robust to detector failures than state-of-the-art methods: (i) it is unsupervised, as it requires no training; (ii) during exploration, a probability distribution over the 2D floor map is updated by an intuitive mechanism, and an improved belief update increases the effectiveness of the agent's exploration; (iii) it incorporates the awareness that an object detector may fail into this probability model by exploiting the success statistics of the specific detector. Our solution is dubbed POMP-BE-PD (Pomcp-based Online Motion Planning with Belief by Exploration and Probabilistic Detection). It uses the current pose of an agent and an RGB-D observation to learn an optimal search policy, exploiting a POMDP solved by a Monte Carlo planning approach. On the Active Vision Database benchmark, we increase the average success rate over all the environments by 35% while decreasing the average path length by 4% with respect to competing methods. These results are state-of-the-art, even though no training procedure is used.
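
To make feature (iii) concrete, here is a minimal sketch, assuming the detector is summarised by a true-positive rate `tpr` and a false-positive rate `fpr`, and that a single detect/no-detect decision is made per view for all candidate locations inside the field of view. It is a generic discrete Bayes filter, not the exact update rule of POMP-BE-PD; the function name `update_belief` and both rates are hypothetical.

```python
import numpy as np


def update_belief(belief, in_fov, detected, tpr, fpr):
    """Bayes update of P(target at location i) after one detector query.

    belief   : (N,) prior over candidate object locations, summing to 1
    in_fov   : (N,) boolean mask, True where location i is in the current view
    detected : whether the detector reported the target in this view
    tpr      : P(detection | target in view)      (hypothetical statistic)
    fpr      : P(detection | target not in view)  (hypothetical statistic)
    """
    belief = np.asarray(belief, dtype=float)
    in_fov = np.asarray(in_fov, dtype=bool)
    if detected:
        likelihood = np.where(in_fov, tpr, fpr)
    else:
        # With tpr < 1, a miss lowers but does not zero the mass inside the FOV.
        likelihood = np.where(in_fov, 1.0 - tpr, 1.0 - fpr)
    posterior = likelihood * belief
    return posterior / posterior.sum()


# Uniform prior over four locations, the first two in view, detector silent.
prior = np.full(4, 0.25)
print(update_belief(prior, [True, True, False, False], detected=False, tpr=0.8, fpr=0.1))
```

With `tpr=0.8` and `fpr=0.1`, a missed detection shifts probability toward the unseen locations while keeping some mass in view, which is the behaviour a detector-aware model needs when the detector can fail.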

Paper Structure

This paper contains 15 sections, 5 equations, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: An agent is initialised in a known environment with the task of visually searching for a target object, i.e. localising the object and approaching it. (a) 3D reconstruction of the environment; the agent has to navigate toward the target (yellow star) along the shortest possible path (highlighted in green), avoiding longer trajectories (in orange) and trajectories that miss the target entirely (in red). (b) Corresponding 2D grid map of the scene in our POMCP modelling: blue dots are the possible object locations, purple crosses are the possible robot poses.
  • Figure 2: Overall architecture of our proposed method POMP-BE-PD. The red box represents the prior knowledge pushed into the POMCP module, the grey box represents the exploration strategy used to detect the target object, the yellow box represents the probabilistic docking strategy used to reach the destination pose, and the green box represents the probability distribution over the locations. Math notation: state $s_t$, action $a_t$, pose $p_t$, observation $o_t$, POMCP state sequence $s_{\{0..T_d\}}$, docking state sequence $s_{\{T_d+1..T\}}$, complete state sequence $s_{\{0..T\}}$.
  • Figure 3: The two cases considered when creating the vector $D$, with an example derived from Home_003_2. In case (a) the target has been detected: the objective is to determine its location and assign probabilities following a multivariate normal distribution centred on it. In case (b) the target has not been detected: we assign low probabilities to the locations inside the FOV and high probabilities to the locations outside it. Note: the two colorbars use different scales for ease of visualisation.
  • Figure 4: Experiment in Home_016_1 using the proposed approach POMP-BE-PD. In step (a) the agent is initialised in the environment; we highlight the target position and a false-positive area. From step (b) to (c) the robot explores the top area; step (d) shows the robustness of our approach to a false positive; in step (e) the agent identifies highly probable locations, and in step (f) it locates the target.
  • Figure 5: Corresponding 2D floor maps (not to scale) of the test scenes from AVBD, grouped into 3 difficulty levels (as in schmid2019iros). For each environment, we report the name and, in parentheses, the number of possible object locations. As the difficulty increases, the number of possible object locations grows and the spatial layouts become more complex.
  • ...and 3 more figures
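
The two cases of Figure 3 can be sketched as follows. This is a minimal illustration, assuming candidate object locations are given as 2D map coordinates, that case (a) uses an isotropic Gaussian (covariance $\sigma^2 I$) as the multivariate normal, and that case (b) uses two constant levels. The function `build_location_vector` and the parameters `sigma`, `low` and `high` are hypothetical; the caption does not state the values used in the paper.

```python
import numpy as np


def build_location_vector(locations, in_fov, detection_xy=None,
                          sigma=0.5, low=0.05, high=1.0):
    """Return a normalised probability vector D over candidate object locations.

    locations    : (N, 2) candidate object positions on the 2D floor map
    in_fov       : (N,) boolean mask, True where a location lies in the current FOV
    detection_xy : estimated 2D position of the detected target, or None
    sigma, low, high : hypothetical tuning parameters (not taken from the paper)
    """
    locations = np.asarray(locations, dtype=float)
    in_fov = np.asarray(in_fov, dtype=bool)
    if detection_xy is not None:
        # Case (a): isotropic Gaussian centred on the estimated object position.
        offsets = locations - np.asarray(detection_xy, dtype=float)
        sq_dist = np.sum(offsets ** 2, axis=1)
        D = np.exp(-0.5 * sq_dist / sigma ** 2)
    else:
        # Case (b): target not seen, so locations inside the FOV become unlikely.
        D = np.where(in_fov, low, high)
    # Normalise so D is a proper distribution (the Gaussian constant cancels here).
    return D / D.sum()


locs = [[0, 0], [1, 0], [3, 0]]
print(build_location_vector(locs, [True, False, False], detection_xy=(0, 0)))
print(build_location_vector(locs, [True, False, False]))
```

Normalising at the end keeps both cases on the same footing, which is why the Gaussian's normalisation constant can be omitted.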