Online Pareto-Optimal Decision-Making for Complex Tasks using Active Inference

Peter Amorese, Shohei Wakayama, Nisar Ahmed, Morteza Lahijanian

TL;DR

This work targets safe, transparent multi-objective decision-making for robots operating under uncertainty. It introduces a two-layer multi-objective reinforcement learning (MORL) framework that combines formal task synthesis (planning against LTLf specifications, i.e., linear temporal logic over finite traces) with an active inference-based high-level selector, so that the robot learns multiple Pareto-optimal trade-offs while aligning with user preferences. A tractable set of approximations to the Expected Free Energy guides plan selection, balancing exploitation of preferred trade-offs against exploration to map the Pareto front. Bayesian online learning of cost distributions with a normal-inverse-Wishart (NIW) model supports continual improvement. Empirical validation on a simulated Mars exploration scenario, benchmarks, and a hardware dishwashing task demonstrates improved Pareto-front coverage, adherence to user preferences, and practical viability.

Abstract

When a robot autonomously performs a complex task, it frequently must balance competing objectives while maintaining safety. This becomes more difficult in uncertain environments with stochastic outcomes. Enhancing transparency in the robot's behavior and aligning with user preferences are also crucial. This paper introduces a novel framework for multi-objective reinforcement learning that ensures safe task execution, optimizes trade-offs between objectives, and adheres to user preferences. The framework has two main layers: a multi-objective task planner and a high-level selector. The planning layer generates a set of optimal trade-off plans that guarantee satisfaction of a temporal logic task. The selector uses active inference to decide which generated plan best complies with user preferences and aids learning. Operating iteratively, the framework updates a parameterized learning model based on collected data. Case studies and benchmarks on both manipulation and mobile robots show that our framework outperforms other methods and (i) learns multiple optimal trade-offs, (ii) adheres to a user preference, and (iii) allows the user to adjust the balance between (i) and (ii).
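The notion of an "optimal trade-off" in the abstract rests on vector dominance between plan cost vectors. The following is a minimal sketch of that idea, not the paper's planner: it assumes all objectives are costs to be minimized, and the function names (`dominates`, `pareto_front`) and the example (time, radiation) values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if cost vector a dominates b.

    Assumption: every objective is a cost to minimize, so a dominates b
    when a is no worse in every objective and strictly better in one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(costs: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the cost vectors that no other vector dominates."""
    return [c for c in costs if not any(dominates(o, c) for o in costs)]


# Hypothetical (time in minutes, radiation dose) costs for five candidate plans.
plans = [(90.0, 6.0), (60.0, 9.0), (120.0, 4.0), (100.0, 7.0), (95.0, 9.5)]
front = pareto_front(plans)
print(front)
```

In this sketch the last two plans are discarded because the first plan is at least as good in both objectives and strictly better in one; the remaining three are mutually incomparable and form the trade-off set a selector would choose from.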

Paper Structure (35 sections, 46 equations, 10 figures)

Figures (10)

  • Figure 1: Motivating robotic dishwashing scenario: a robotic manipulator must weigh the execution time of a task against the inherent risk of dropping fragile dishes (the jar is more fragile than the blue pitcher). The user's preferred trade-off between time and risk is incorporated for transparent behavior.
  • Figure 2: The DTS-sc $T$ described in Example \ref{ex: dts} is shown, with a plan (red) that satisfies the task described in Example \ref{ex: formula}.
  • Figure 3: Proposed multi-objective safe reinforcement learning architecture: as the four phases, 1) Planning, 2) Selection, 3) Execution, and 4) Update, are executed repeatedly, the robot gradually learns the state-action cost parameters and selects the user-preferred optimal task plan.
  • Figure 4: By treating plans $\pi$ as macro-actions, the selection decision-making problem becomes sequential. The example plan highlighted in Fig. \ref{fig: dts} is embedded into $\pi_1$, as shown.
  • Figure 5: Simulated rover sample collection: A rover is tasked with repeatedly collecting a $sample$ and delivering it to $deposit$. Moving takes time and, in the sun, accumulates radiation. The user prefers that the sample is collected in roughly 90 minutes with about 6 $\mu Gy$ of radiation, represented by $p_\mathrm{pr}$. The top row shows the computed plans, and the bottom row shows the estimated Pareto points. The true optimal trade-off plans are shown for comparison in four episode snapshots. Video: https://youtu.be/tCRJwqeT-f4.
  • ...and 5 more figures

Theorems & Definitions (12)

  • Example 1
  • Definition 1: DTS-sc
  • Example 2
  • Definition 2: LTLf Syntax
  • Example 3
  • Definition 3: $\phi$-Satisfying Plan Set
  • Definition 4: Vector Dominance
  • Definition 5: Pareto Optimal
  • Example 4
  • Definition 6: Deterministic Finite Automaton
  • ...and 2 more