
BoxMap: Efficient Structural Mapping and Navigation

Zili Wang, Christopher Allum, Sean B. Andersson, Roberto Tron

TL;DR

BoxMap is a Detection-Transformer-based architecture that exploits the structure of the sensed partial environment to update a topological graph of the environment, represented as a set of semantic entities and their relations (connectivity).

Abstract

While humans can successfully navigate using abstractions, ignoring details that are irrelevant to the task at hand, most existing robotic applications require the maintenance of a detailed environment representation, which consumes a significant amount of sensing, computing, and storage. These issues are particularly important in resource-constrained settings with a limited power budget. Deep learning methods can learn from prior experience to abstract knowledge about unknown environments and use it to execute tasks (e.g., frontier exploration, object search, or scene understanding) more efficiently. We propose BoxMap, a Detection-Transformer-based architecture that takes advantage of the structure of the sensed partial environment to update a topological graph of the environment as a set of semantic entities (e.g., rooms and doors) and their relations (e.g., connectivity). These predictions from low-level measurements can then be leveraged to achieve high-level goals with lower computational costs than methods based on detailed representations. As an example application, we consider a robot equipped with a 2-D laser scanner tasked with exploring a residential building. Our BoxMap representation scales quadratically with the number of rooms (with a small constant), resulting in significant savings over a full geometric map. Moreover, our high-level topological representation yields 30.9% shorter trajectories in the exploration task compared with a standard method.
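To make the storage argument concrete, the sketch below counts the numbers stored by a box-based map versus a dense occupancy grid. It assumes, for illustration only, that every room box and door box is stored as four floats and that one door-box slot is kept per unordered pair of rooms (mirroring the pairwise door prediction described in Figure 1); that pairwise term is what makes the count grow quadratically. The function names and example sizes are hypothetical and do not come from the paper.

```python
def boxmap_floats(n_rooms: int, floats_per_box: int = 4) -> int:
    """Floats stored by a box-based map (hypothetical layout):
    one box per room plus one door-box slot per unordered room pair."""
    n_pairs = n_rooms * (n_rooms - 1) // 2  # pairwise door slots -> O(n_rooms**2)
    return floats_per_box * (n_rooms + n_pairs)


def grid_cells(width_m: float, height_m: float, resolution_m: float) -> int:
    """Cells in a dense 2-D occupancy grid covering width_m x height_m."""
    # round() guards against floating-point error in the division
    return round(width_m / resolution_m) * round(height_m / resolution_m)


if __name__ == "__main__":
    # Illustrative sizes only: a 10-room floor in a 20 m x 20 m footprint at 5 cm cells
    print(boxmap_floats(10), grid_cells(20, 20, 0.05))
```

With these illustrative sizes the box map needs 220 floats while the grid needs 160,000 cells; the actual constants depend on the paper's box parametrization, which this sketch does not reproduce.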

Paper Structure (20 sections, 11 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: Architecture diagram: the TSDF from the previous prediction is converted to an occupancy grid map, which is concatenated with the new laser measurement and fed into a DETR-based model (a combination of a CNN and a transformer) that produces multiple embeddings. Each embedding is used to predict a room box, and each pair of embeddings is used to predict a door box. These boxes are transformed into a single TSDF, which is then compared with the ground truth to update the model.
  • Figure 2: The connectivity between two room boxes is validated by comparing the edges N, S, W, E of one room (in green) to the edges S, N, E, W of the other room (in blue). An entry will appear in the room box adjacency matrix if corresponding edges overlap, in this case green S and blue N.
  • Figure 3: Shape Merging Operation.
  • Figure 4: (a) Ground truth TSDF with the boundary overlaid in black; (b) predicted room TSDF; (c) difference between (a) and (b), which highlights the door locations.
  • Figure 5: The progression of the agent's exploration over time. The agent starts by instantiating the last estimated graph (row 1), then collects the laser measurement (row 2) and feeds it into the network. The estimated room and door boxes (row 3) are used to construct a pose graph and plan the next move (row 4; shading indicates that a room has been visited, overlaid with the predicted map). In rows 4 and 5, the purple dot is the robot pose, the red star is the point-to-go, and the red diamond is the nearest door. On the ground truth floorplan (row 5), green dots are the pose history of laser measurements.
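Figure 1 states that the predicted boxes are transformed into a single TSDF. One plausible way to do this, sketched here under our own assumptions rather than as the authors' method, is to take the signed distance to each axis-aligned room box (negative inside, positive outside), merge rooms with a pointwise minimum, and clip the result to a truncation distance. The box layout `(xmin, ymin, xmax, ymax)`, the sign convention, and the names `box_sdf` and `boxes_to_tsdf` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box layout: (xmin, ymin, xmax, ymax) in metres
Box = Tuple[float, float, float, float]


def box_sdf(px: float, py: float, box: Box) -> float:
    """Signed distance from (px, py) to an axis-aligned box: < 0 inside, > 0 outside."""
    xmin, ymin, xmax, ymax = box
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2  # box centre
    hx, hy = (xmax - xmin) / 2, (ymax - ymin) / 2  # half extents
    dx = abs(px - cx) - hx
    dy = abs(py - cy) - hy
    outside = math.hypot(max(dx, 0.0), max(dy, 0.0))  # Euclidean distance when outside
    inside = min(max(dx, dy), 0.0)                     # distance to nearest edge when inside
    return outside + inside


def boxes_to_tsdf(boxes: Sequence[Box], width: float, height: float,
                  res: float, trunc: float) -> List[List[float]]:
    """Rasterise the union of room boxes into a truncated SDF grid (rows indexed by y)."""
    nx, ny = round(width / res), round(height / res)
    grid = []
    for j in range(ny):
        y = (j + 0.5) * res  # cell-centre coordinates
        row = []
        for i in range(nx):
            x = (i + 0.5) * res
            d = min(box_sdf(x, y, b) for b in boxes)  # union of rooms via pointwise min
            row.append(max(-trunc, min(trunc, d)))    # truncate to [-trunc, trunc]
        grid.append(row)
    return grid
```

The pointwise minimum is exact outside the union of boxes but only approximates distances inside regions where boxes touch or overlap, a common trade-off when composing signed distance fields.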
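The connectivity test in Figure 2 compares the N, S, W, E edges of one room box with the S, N, E, W edges of the other. A minimal sketch of that pairing for axis-aligned boxes follows; it assumes north is +y and east is +x, and it introduces a hypothetical tolerance `tol` both for "edges coincide" and for the minimum shared edge length, so boxes that meet only at a corner are not connected. The names `Box`, `edges_touch`, and `adjacency_matrix` are illustrative, not from the paper.

```python
from typing import List, NamedTuple, Sequence


class Box(NamedTuple):
    """Axis-aligned room box; north is +y, east is +x (assumed convention)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def _overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Signed length of the overlap of intervals [a0, a1] and [b0, b1]."""
    return min(a1, b1) - max(a0, b0)


def edges_touch(a: Box, b: Box, tol: float = 0.1) -> bool:
    """True if an edge of `a` lies on the opposite edge of `b` (within tol)
    and the two edges share a segment longer than tol."""
    x_shared = _overlap(a.xmin, a.xmax, b.xmin, b.xmax) > tol
    y_shared = _overlap(a.ymin, a.ymax, b.ymin, b.ymax) > tol
    # N edge of a vs S edge of b, or S edge of a vs N edge of b
    if x_shared and (abs(a.ymax - b.ymin) <= tol or abs(a.ymin - b.ymax) <= tol):
        return True
    # E edge of a vs W edge of b, or W edge of a vs E edge of b
    if y_shared and (abs(a.xmax - b.xmin) <= tol or abs(a.xmin - b.xmax) <= tol):
        return True
    return False


def adjacency_matrix(boxes: Sequence[Box], tol: float = 0.1) -> List[List[int]]:
    """Symmetric 0/1 room adjacency matrix built from pairwise edge checks."""
    n = len(boxes)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if edges_touch(boxes[i], boxes[j], tol):
                adj[i][j] = adj[j][i] = 1
    return adj
```

For example, a green room `Box(0, 4, 4, 8)` sitting on top of a blue room `Box(0, 0, 5, 4)` is connected through green S and blue N, matching the case drawn in Figure 2, while a distant `Box(10, 0, 14, 4)` is connected to neither.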
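Figure 5 shows the predicted room and door boxes being used to plan the next move toward the nearest door. As an illustration only, the sketch below runs breadth-first search over a room adjacency list (for instance, one derived from the Figure 2 edge check) to find the nearest unvisited room, measured in doors crossed; the first hop of the returned path identifies the next door to pass through. The graph format and the name `next_room_to_explore` are hypothetical, and hop count is a simplification that may differ from the cost the authors' planner optimizes.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def next_room_to_explore(adj: Dict[int, List[int]], current: int,
                         visited: Set[int]) -> Optional[List[int]]:
    """Breadth-first search from `current`; returns the room sequence to the
    nearest unvisited room (fewest doors crossed), or None if all are visited."""
    parent: Dict[int, Optional[int]] = {current: None}
    queue = deque([current])
    while queue:
        room = queue.popleft()
        if room not in visited:
            # Reconstruct the path by walking parent links back to the start
            path: List[int] = []
            node: Optional[int] = room
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj.get(room, []):
            if nbr not in parent:  # first visit gives the shortest hop count
                parent[nbr] = room
                queue.append(nbr)
    return None
```

For example, with rooms 0 and 1 visited, a robot in room 0 on the graph `{0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}` gets the path `[0, 1, 2]`, so the next door to head for is the one between rooms 0 and 1.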