HoloSpot: Intuitive Object Manipulation via Mixed Reality Drag-and-Drop

Pablo Soler Garcia, Petar Lukovic, Lucie Reynaud, Andrea Sgobbi, Federica Bruni, Martin Brun, Marc Zünd, Riccardo Bollati, Marc Pollefeys, Hermann Blum, Zuria Bauer

TL;DR

The paper presents an interface system that projects a 3D representation of a scanned room as a scaled-down 'dollhouse' hologram, letting users select and manipulate objects through a straightforward drag-and-drop interface. It lays the groundwork for a robust framework for human-robot collaboration in diverse applications.

Abstract

Human-robot interaction through mixed reality (MR) technologies enables novel, intuitive interfaces to control robots in remote operations. Such interfaces facilitate operations in hazardous environments, where human presence is risky, yet human oversight remains crucial. Potential environments include disaster response scenarios and areas with high radiation or toxic chemicals. In this paper, we present an interface system projecting a 3D representation of a scanned room as a scaled-down 'dollhouse' hologram, allowing users to select and manipulate objects using a straightforward drag-and-drop interface. We then translate these drag-and-drop user commands into real-time robot actions based on the recent Spot-Compose framework. The Unity-based application provides an interactive tutorial and a user-friendly experience, ensuring ease of use. Through comprehensive end-to-end testing, we validate the system's capability in executing pick-and-place tasks, and a complementary user study affirms the interface's intuitive controls. Our findings highlight the advantages of this interface in improving user experience and operational efficiency. This work lays the groundwork for a robust framework that advances the potential for seamless human-robot collaboration in diverse applications. Paper website: https://holospot.github.io/
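
As an illustration of the 'dollhouse' idea, a drop position in the hologram has to be converted back into room coordinates before it can become a robot goal. The following is a minimal sketch of that conversion, assuming a uniform scale factor and a pure translation between the two frames (no rotation). The class name `Dollhouse`, its fields, and the example values are hypothetical and are not taken from the paper or from Spot-Compose.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Dollhouse:
    """Hypothetical model of a scaled-down hologram of a room.

    Assumption: the hologram is a uniformly scaled, translated copy of the
    room, with no rotation between the two frames.
    """
    origin: Vec3   # where the room origin appears in the hologram frame (metres)
    scale: float   # hologram size divided by room size, e.g. 0.1

    def to_room(self, p_holo: Vec3) -> Vec3:
        # Undo the translation, then undo the scaling.
        return tuple((c - o) / self.scale for c, o in zip(p_holo, self.origin))

    def to_hologram(self, p_room: Vec3) -> Vec3:
        # Scale the room point down, then shift it to the hologram origin.
        return tuple(c * self.scale + o for c, o in zip(p_room, self.origin))


# Example: a 1:10 dollhouse whose room origin sits at (1.0, 0.0, 0.5).
house = Dollhouse(origin=(1.0, 0.0, 0.5), scale=0.1)
drop_in_room = house.to_room((1.2, 0.05, 0.5))
```

With these example values, a drop 20 cm along the hologram's x axis corresponds to roughly 2 m in the room. A real system would also need to handle the hologram's orientation and the alignment between the scan frame and the robot's frame.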

Paper Structure

This paper contains 16 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Visualization of Our System. Left: A user wearing a HoloLens interacts with a holographic representation of the environment. Right: Robot pick-and-place sequence of actions. Top: Optimal grasp selection with AnyGrasp.
  • Figure 2: System overview. Our method relies on both offline and online segments. The offline segment (left) constructs the 3D scene that is used in the subsequent online phase. The online segment (right) controls the Spot using the HoloLens with the help of an online server.
  • Figure 3: HoloLens Interface. The figures above show various aspects of the visual interface on the HoloLens. Numbering the images left to right and top to bottom: the first image shows bounding boxes around movable objects after the "show items" voice command, and the following image shows how the robot status is displayed during manipulation. Images four and five show a user manipulating a watering can and a drawer. The fifth and sixth images show the virtual representation of the robot and a menu containing the battery percentage and additional status information.
  • Figure 4: Offline pipeline. The preprocessing done before deploying our system can be separated into three parts: Scene Reconstruction, Reconstruction Processing, and Scene Separation. Scene Reconstruction consists of gathering high- and low-resolution scans using the iPad LiDAR and the Spot cameras, respectively. The recorded point clouds are then aligned into the same coordinate system, and the high-resolution scan is segmented using OpenMask3D [takmaz2023openmask3d]. Finally, we manually separate the segmented instances into draggable objects and static environment.
  • Figure 5: Online pipeline. When the system is deployed (online), it follows the loop shown. The pick-and-place procedure is triggered when the HoloLens user places an object in the scene. This sends a signal to the intermediate server, together with information about the object and its location. After a successful information exchange, the robot is localized. Next, the grasp and path are calculated on the server, which then sends the commands to the robot. After the robot arrives at the location, grasp optimisation is performed using the ICP algorithm [121791]. Finally, the robot performs the grasp, moves the object, and returns to the starting position, where it localizes itself and waits for another trigger signal.
  • ...and 3 more figures
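
The online loop described in the Figure 5 caption can be read as a fixed sequence of stages. The sketch below only illustrates that ordering. The `Stage` names, the `run_cycle` function, and the choice to stop at the first failing stage are assumptions made for this example; they are not the authors' implementation or the Spot-Compose API.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class Stage(Enum):
    """Hypothetical stage names following the Figure 5 caption."""
    RECEIVE_DROP = auto()           # HoloLens user places an object; server is notified
    LOCALIZE = auto()               # robot is localized
    PLAN_GRASP_AND_PATH = auto()    # grasp and path are calculated on the server
    NAVIGATE = auto()               # robot moves to the object location
    REFINE_GRASP_ICP = auto()       # grasp optimisation with ICP
    PICK_AND_PLACE = auto()         # robot grasps and moves the object
    RETURN_AND_RELOCALIZE = auto()  # robot returns to start and waits for a trigger


# Iterating over an Enum yields its members in definition order.
CYCLE: List[Stage] = list(Stage)


def run_cycle(handlers: Dict[Stage, Callable[[], bool]]) -> List[Stage]:
    """Run one pick-and-place cycle and return the stages that completed.

    Assumption: each handler returns True on success, and the cycle stops
    at the first stage that reports failure.
    """
    completed: List[Stage] = []
    for stage in CYCLE:
        if not handlers[stage]():
            break
        completed.append(stage)
    return completed


# Usage with placeholder handlers that always succeed.
always_ok = {stage: (lambda: True) for stage in Stage}
finished = run_cycle(always_ok)
```

In a real deployment each handler would wrap a call to the server or the robot, and a failed stage would likely trigger recovery rather than a silent stop.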