Learning to Look: Seeking Information for Decision Making via Policy Factorization

Shivin Dass, Jiaheng Hu, Ben Abbatematteo, Peter Stone, Roberto Martín-Martín

TL;DR

DISaM is a dual-policy solution: an information-seeking policy explores the environment to find the relevant contextual information, and an information-receiving policy exploits that context to achieve the manipulation goal. It significantly outperforms existing methods.

Abstract

Many robot manipulation tasks require active or interactive exploration behavior in order to be performed successfully. Such tasks are ubiquitous in embodied domains, where agents must actively search for the information necessary for each stage of a task, e.g., moving the head of the robot to find information relevant to manipulation, or in multi-robot domains, where one scout robot may search for the information that another robot needs to make informed decisions. We identify these tasks with a new type of problem, factorized Contextual Markov Decision Processes, and propose DISaM, a dual-policy solution composed of an information-seeking policy that explores the environment to find the relevant contextual information and an information-receiving policy that exploits the context to achieve the manipulation goal. This factorization allows us to train both policies separately, using the information-receiving policy to provide the reward for training the information-seeking policy. At test time, the dual agent balances exploration and exploitation based on the manipulation policy's uncertainty about the next best action. We demonstrate the capabilities of our dual-policy solution in five manipulation tasks that require information-seeking behaviors, both in simulation and in the real world, where DISaM significantly outperforms existing methods. More information at https://robin-lab.cs.utexas.edu/learning2look/.

Paper Structure

This paper contains 22 sections, 1 equation, 9 figures, 1 table, 2 algorithms.

Figures (9)

  • Figure 1: DISaM for tasks with information-seeking behavior. To make the right decision in a task (e.g., what beverage to pick or in what dining set to place it), a robot may need to seek task-relevant information (the time of day to decide the beverage or the person at the table to choose where to place it). We formalize such information-seeking tasks as factorized contextual MDPs and solve them with a dual policy collaborative approach where an information-seeking policy ($\pi^\mathit{IS}$) takes active perception actions to search for the right contextual information, and an information-receiving policy ($\pi^\mathit{IR}$) consumes this retrieved context to select the right manipulation actions.
  • Figure 2: Two learning stages of DISaM. In Phase 1, we learn the information-receiving policy $\pi_\mathit{IR}$ that takes in ground-truth context information and controls the movement of the robot. In Phase 2, we learn an information-seeking policy $\pi_\mathit{IS}$ as well as an image encoder $E_\phi$ such that the context can be correctly reconstructed from the camera observation. Once all parts are trained, together they create a system that takes in image observations and controls both the robot and the camera.
  • Figure 3: Deployment of DISaM. When the uncertainty over IR's next action is low, DISaM follows the IR policy's actions; when the uncertainty is high, DISaM follows the IS policy.
  • Figure 4: Tasks in our evaluation of DISaM. We evaluated DISaM on three simulation tasks (Cooking, Walls, Assembly) and two real-world tasks with the Tiago robot (Button and Teatime). These tasks each require different information-gathering strategies and demonstrate the sophisticated active and interactive information-gathering capabilities of DISaM.
  • Figure 5: Evaluation results. The evaluations are performed across three seeds with 50 rollouts each in the simulation environments and one seed with 10 rollouts in the real-world tasks. Across all five tasks, DISaM significantly outperforms the baselines.
  • ...and 4 more figures
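
The deployment rule in the Figure 3 caption (follow IR when its uncertainty is low, otherwise follow IS) can be illustrated with a minimal sketch. This is not the authors' implementation: the captions do not say how uncertainty is measured, so the sketch assumes a discrete IR action distribution, uses Shannon entropy as the uncertainty measure, and compares it against a fixed threshold. The names `entropy`, `select_action`, `is_policy`, and `threshold` are hypothetical and introduced only for illustration.

```python
import math
from typing import Any, Callable, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_action(
    ir_action_probs: Sequence[float],
    ir_actions: Sequence[Any],
    is_policy: Callable[[Any], Any],
    observation: Any,
    threshold: float,
) -> Tuple[str, Any]:
    """Gate between the two policies based on IR's uncertainty.

    Assumption: uncertainty is the entropy of IR's action distribution,
    and `threshold` is a hand-chosen hyperparameter.
    """
    if entropy(ir_action_probs) <= threshold:
        # IR is confident: execute its most likely manipulation action.
        best = max(range(len(ir_actions)), key=lambda i: ir_action_probs[i])
        return "IR", ir_actions[best]
    # IR is uncertain: let the IS policy gather more context first.
    return "IS", is_policy(observation)


def look_around(observation: Any) -> str:
    """Hypothetical IS policy that always issues a camera-motion action."""
    return "move_camera"


# A peaked distribution defers to IR; a flat one defers to IS.
confident = select_action([0.97, 0.03], ["pick_tea", "pick_coffee"], look_around, None, 0.3)
uncertain = select_action([0.5, 0.5], ["pick_tea", "pick_coffee"], look_around, None, 0.3)
```

With these inputs, the peaked distribution falls below the threshold and the IR action is executed, while the uniform distribution exceeds it and control passes to the IS policy. In practice the uncertainty estimate and the threshold would depend on how the IR policy represents its action distribution.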