Context-Generative Default Policy for Bounded Rational Agent

Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu

TL;DR

This work introduces a context-generative default policy that uses the region the robot has already observed to predict the unobserved part of the environment, enabling the robot to adapt its default policy to both the actual observed map and the imagined unobserved map.

Abstract

Bounded rational agents often make decisions by evaluating a finite selection of choices, typically derived from a reference point termed the "default policy," based on previous experience. However, the inherent rigidity of a static default policy presents significant challenges for agents operating in unknown environments that are not covered by the agent's prior knowledge. In this work, we introduce a context-generative default policy that leverages the region observed by the robot to predict the unobserved part of the environment, thereby enabling the robot to adaptively adjust its default policy based on both the actual observed map and the $\textit{imagined}$ unobserved map. Furthermore, the adaptive nature of the bounded rationality framework enables the robot to manage unreliable or incorrect imaginations by selectively sampling a few trajectories in the vicinity of the default policy. Our approach uses a diffusion model for map prediction and sampling-based planning with B-spline trajectory optimization to generate the default policy. Extensive evaluations show that the context-generative policy outperforms the baseline methods in identifying and avoiding unseen obstacles. Additionally, real-world experiments conducted with Crazyflie drones demonstrate the adaptability of our proposed method, even in environments outside the training distribution.
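
To make the idea of sampling a few trajectories near a default policy concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a hand-made predicted occupancy grid in place of the diffusion model, Gaussian perturbation of B-spline control points in place of the paper's trajectory optimization, and a simple length-plus-occupancy cost. All function names (`merge_maps`, `bspline_path`, `path_cost`, `sample_near_default`) and parameters are hypothetical.

```python
import math
import random

def merge_maps(observed, predicted):
    """Keep observed cells (0 free, 1 occupied); where the observation is
    None, fall back to the predicted ("imagined") occupancy probability."""
    return [[p if o is None else float(o) for o, p in zip(orow, prow)]
            for orow, prow in zip(observed, predicted)]

def bspline_path(ctrl, samples_per_seg=8):
    """Evaluate a uniform cubic B-spline over 2D control points.
    The spline is unclamped, so it approximates the control polygon
    rather than passing through the control points."""
    pts = []
    for i in range(len(ctrl) - 3):
        seg = ctrl[i:i + 4]
        for k in range(samples_per_seg):
            u = k / samples_per_seg
            w = ((1 - u) ** 3 / 6.0,
                 (3 * u**3 - 6 * u**2 + 4) / 6.0,
                 (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6.0,
                 u**3 / 6.0)
            pts.append((sum(wi * p[0] for wi, p in zip(w, seg)),
                        sum(wi * p[1] for wi, p in zip(w, seg))))
    return pts

def path_cost(path, grid, w_obs=10.0):
    """Assumed cost: path length plus weighted occupancy along the path.
    Points outside the grid are treated as occupied."""
    rows, cols = len(grid), len(grid[0])
    length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    occ = 0.0
    for x, y in path:
        r, c = math.floor(y), math.floor(x)
        occ += grid[r][c] if (0 <= r < rows and 0 <= c < cols) else 1.0
    return length + w_obs * occ

def sample_near_default(default_ctrl, grid, n_samples=20, sigma=0.8, rng=None):
    """Sample a few control-point sets around the default one (endpoints
    fixed) and return the cheapest; the default itself is a candidate."""
    rng = rng or random.Random(0)
    candidates = [list(default_ctrl)]
    for _ in range(n_samples):
        inner = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in default_ctrl[1:-1]]
        candidates.append([default_ctrl[0]] + inner + [default_ctrl[-1]])
    return min(candidates, key=lambda c: path_cost(bspline_path(c), grid))

# Toy scene: columns 0-3 are observed free space, the rest is unobserved.
observed = [[0 if c < 4 else None for c in range(10)] for _ in range(10)]
predicted = [[0.0] * 10 for _ in range(10)]
for r in range(4, 7):
    predicted[r][6] = 0.9  # imagined obstacle in the unobserved region
grid = merge_maps(observed, predicted)

default_ctrl = [(0.5, 5.5), (2.5, 5.5), (4.5, 5.5),
                (6.5, 5.5), (8.5, 5.5), (9.5, 5.5)]
best = sample_near_default(default_ctrl, grid)
```

Because the default control points are always among the candidates, the selected trajectory never costs more than the default under this assumed cost. A small `n_samples` mirrors the bounded-rational setting, in which only a limited number of alternatives near the default are evaluated.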

Paper Structure (13 sections, 8 equations, 5 figures)

Figures (5)

  • Figure 1: Snapshot of the navigation task at time $t$, illustrating the default policy distribution. The predicted environment $\tilde{e}$ is derived from the context $c_t$, indicated by the cross-hatched square. The yellow trajectories, sampled from $Q_t$, are extended all the way to the goal to make the default policy easier to interpret; in an implementation they can be truncated to any desired horizon, as depicted by the red dots.
  • Figure 2: Performance evaluation on the navigation task. The x-axis indicates time, while the y-axis indicates the distance to the goal in (a), the explored area in (b), and the map prediction accuracy in (c).
  • Figure 3: A grid environment showing the starting point (blue square), the robot's current position (blue circle), and the sensor field of view (cyan circle). The path traveled by the robot is shown in green; the yellow paths represent the default distribution with the predicted mean path. (a) to (e) show the performance of the proposed method on the predicted map. (f) to (j) show the performance of the CN-$Q_t$ method on the observed map. (k) to (o) show the performance of the CI-$Q_t$ method, which considers only the sensor's field of view; its paths are overlaid on the ground-truth map. All snapshots were captured simultaneously from the starting point to facilitate direct comparison.
  • Figure 4: Impact of providing initial context on the navigation task, and the performance of the baselines and the proposed method in four different environments. (a) and (b) show the performance of the proposed method with increasing initial context, where the x-axis represents time and the y-axis represents the distance to the goal in (a) and navigation efficiency in (b). (c) shows the path length on the y-axis for four different maps. Note that the vertical black line shows the difference in path-length improvement when more initial context is given. The paths traveled by all methods on the four maps are visualized in (d) to (g), in which the start position is represented by a blue square and the goal by a green circle. The black dotted circles on the maps highlight the areas where the baselines encounter difficulties. The red path represents CI-$Q_t$, the blue path CN-$Q_t$, and the green path CG-$Q_t$.
  • Figure 5: Experimental setup and snapshots of the drones in action. (a) depicts the initial environment; the drone starts from the green circle, and the goal is represented by the light blue circle. (b) shows snapshots of the drone following the path generated by our proposed method. (c) shows the path for CN-$Q_t$. (d) shows the path for CI-$Q_t$. Note that the testing environment is different from the initial environment.