Learning-Based Robust Control: Unifying Exploration and Distributional Robustness for Reliable Robotics via Free Energy

Hozefa Jesawada, Giovanni Russo, Abdalla Swikir, Fares Abu-Dakka

TL;DR

This work proposes a policy-computation model that jointly learns environment dynamics and rewards while remaining robust to epistemic uncertainties in both, together with a modification to the maximum diffusion learning framework.

Abstract

A key challenge for reliable robotic control is devising computational models that can both learn policies and guarantee robustness when deployed in the field. To address these challenges, and inspired by the free energy principle in computational neuroscience, we propose a model for policy computation that jointly learns environment dynamics and rewards while ensuring robustness to epistemic uncertainties. Building on a distributionally robust free energy principle, we propose a modification to the maximum diffusion learning framework. After explicitly characterizing the robustness of our policies to epistemic uncertainties in both environment and reward, we validate their effectiveness on continuous-control benchmarks, both in simulation and in real-world manipulation experiments with a Franka Research 3 arm. Across simulation and zero-shot deployment, our approach narrows the sim-to-real gap and enables repeatable tabletop manipulation without task-specific fine-tuning.
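This summary does not spell out the free energy objective. As a generic illustration only, the sketch below uses a standard identity. It is not the authors' DR-FREE formulation, which additionally optimizes against an ambiguity set. Over a finite action set, minimizing expected cost plus a KL divergence to a reference distribution $q$ yields the Gibbs policy $\pi(u) \propto q(u)\exp(-c(u))$, and the minimal free energy equals $-\log \sum_u q(u)e^{-c(u)}$. The function names, the unit temperature and the example numbers are assumptions made for illustration.

```python
import math


def gibbs_policy(prior, cost):
    """Minimizer of E_pi[cost] + KL(pi || prior) over a finite action set.

    The closed form is pi(u) proportional to prior(u) * exp(-cost(u)).
    """
    c_min = min(cost)  # shift costs before exponentiating, for numerical stability
    weights = [q * math.exp(-(c - c_min)) for q, c in zip(prior, cost)]
    z = sum(weights)
    return [w / z for w in weights]


def free_energy(pi, prior, cost):
    """Evaluate the objective E_pi[cost] + KL(pi || prior)."""
    expected_cost = sum(p * c for p, c in zip(pi, cost))
    kl = sum(p * math.log(p / q) for p, q in zip(pi, prior) if p > 0)
    return expected_cost + kl


# Hypothetical example: uniform reference over three actions
prior = [1 / 3, 1 / 3, 1 / 3]
cost = [0.0, 1.0, 2.0]
pi = gibbs_policy(prior, cost)
print([round(p, 3) for p in pi])
print(free_energy(pi, prior, cost) <= free_energy(prior, prior, cost))  # True
```

The Gibbs policy shifts probability toward the cheapest action. Its free energy is never higher than that of any other distribution, including the reference itself.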

Paper Structure (22 sections, 1 theorem, 28 equations, 5 figures, 1 table, 1 algorithm)

Key Result

Theorem IV.1

Suppose the augmented ambiguity radii obey […]. Then, for each $k$ and every $(\mathbf{x},\mathbf{u})$, the true augmented kernels $p^{\mathrm{aug},\delta}_k$ lie in the DR-FREE augmented ambiguity sets $\mathcal{B}_{\eta_k^{\mathrm{aug}}}(\bar{p}^{\mathrm{aug}}_k)$, and the saddle-point policy solving […] is simultaneously robust to both dynamics and cost perturbations, while preserving the DR-FREE Gibb…
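The theorem combines two ingredients. The first is membership of the true kernel in an ambiguity ball around the learned nominal kernel. The second is a policy evaluated against the worst case over that ball.

The summary does not state which divergence defines $\mathcal{B}_{\eta}(\bar{p})$. For illustration, the sketch below assumes a KL ball $\{p : \mathrm{KL}(p\,\|\,\bar{p}) \le \eta\}$. It then does two things:

- It checks membership for scalar Gaussian kernels.
- It computes the worst-case expected cost over the ball for a finite-support nominal $\bar{p}$, using the standard convex dual $\inf_{\lambda>0}\,\lambda\eta + \lambda\log\mathbb{E}_{\bar{p}}[e^{c/\lambda}]$.

All function names, radii and distributions are hypothetical.

```python
import math


def kl_gaussian_1d(mu_p, sig_p, mu_q, sig_q):
    """KL(N(mu_p, sig_p^2) || N(mu_q, sig_q^2)) for scalar Gaussians."""
    return (math.log(sig_q / sig_p)
            + (sig_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sig_q ** 2) - 0.5)


def in_ambiguity_set(kl_value, eta):
    """Membership in the assumed KL ball {p : KL(p || p_bar) <= eta}."""
    return kl_value <= eta


def _log_expectation_exp(values, probs):
    """Stable log E[exp(v)] for a finite distribution (log-sum-exp trick)."""
    m = max(values)
    return m + math.log(sum(p * math.exp(v - m) for v, p in zip(values, probs)))


def worst_case_cost(p_bar, cost, eta, lo=1e-3, hi=1e3, iters=200):
    """sup of E_p[cost] over {p : KL(p || p_bar) <= eta}, p_bar with finite support.

    Evaluates the dual inf_{lam > 0} lam*eta + lam*log E_{p_bar}[exp(cost/lam)].
    The dual is convex in lam, so ternary search on [lo, hi] locates its minimum.
    """
    def dual(lam):
        return lam * eta + lam * _log_expectation_exp([c / lam for c in cost], p_bar)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dual(m1) < dual(m2):
            hi = m2
        else:
            lo = m1
    return dual((lo + hi) / 2)


# Nominal (learned) kernel N(0, 1) vs. a shifted "true" kernel N(0.3, 1)
kl = kl_gaussian_1d(0.3, 1.0, 0.0, 1.0)  # approximately 0.045
print(in_ambiguity_set(kl, eta=0.05))  # True

# Finite-support nominal kernel and per-outcome costs
p_bar = [0.5, 0.3, 0.2]
cost = [0.0, 1.0, 3.0]
nominal = sum(p * c for p, c in zip(p_bar, cost))  # nominal expected cost, about 0.9
print(nominal, worst_case_cost(p_bar, cost, eta=0.1))
```

The worst-case value lies between the nominal expected cost and the largest cost, and it grows with $\eta$. A saddle-point policy would then minimize a worst-case quantity of this kind rather than the nominal expectation.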

Figures (5)

  • Figure 1: Modifying Maximum Diffusion RL with Distributionally Robust Free Energy Principle to address epistemic uncertainty, narrowing sim-to-real gaps for reliable robotics.
  • Figure 2: Overview of the proposed control loop. At each step, the agent collects data into a replay buffer, updates the dynamics and cost models, computes the maximally diffusive kernel $p_{\max}$, and solves the min-max optimization. The resulting twisted kernel produces an action that is applied to the system, and the cycle repeats.
  • Figure 3: HalfCheetah results. Top: training performance of the proposed algorithm vs. MaxDiff RL on HalfCheetah-v5 (mean $\pm$1 STD over $n=20$ runs). Middle: rollout frames from the proposed algorithm showing a stable stride reaching the target. Bottom: rollout frames from MaxDiff RL illustrating an unstable gait leading to failure.
  • Figure 4: Franka obstacle-avoidance results with the proposed method. Top: learning curves showing episode return and minimum distance to goal (success threshold 5 cm, mean $\pm$1 STD over $n=20$ runs). Bottom: i) a rollout demonstrating collision-free manipulation around the obstacle. ii) trajectory plot with goal orientation highlighted in cyan, magenta, and yellow, illustrating efficient training and task execution.
  • Figure 5: Deployment on the Franka Research 3 for cluttered tabletop pick-and-place. Top: grasp at the first goal and place at the second. Bottom: with an obstacle, the controller lifts to avoid collision and completes the placement.
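The control loop described for Figure 2 can be mimicked on a toy problem. The sketch below is a hypothetical stand-in, not the authors' algorithm. It makes the following substitutions:

- A scalar linear plant replaces the robot.
- The cost $x'^2 + 0.1u^2$ is assumed known rather than learned.
- A uniform distribution over five actions stands in for the maximally diffusive kernel $p_{\max}$.
- The inner maximization ranges over a finite set of perturbed models rather than a DR-FREE ambiguity set.

Each step does three things. It collects a transition into the replay buffer, refits the dynamics by least squares, and samples an action from a Gibbs-twisted kernel built on the worst-case cost.

```python
import math
import random

ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical discretized action set


def true_step(x, u, rng):
    """Hypothetical unknown plant: x' = 0.9 x + 0.5 u + Gaussian noise."""
    return 0.9 * x + 0.5 * u + rng.gauss(0.0, 0.05)


def fit_dynamics(buffer):
    """Least-squares fit of x' ~ a x + b u from (x, u, x') tuples in the replay buffer."""
    sxx = sum(x * x for x, _, _ in buffer)
    sxu = sum(x * u for x, u, _ in buffer)
    suu = sum(u * u for _, u, _ in buffer)
    sxy = sum(x * y for x, _, y in buffer)
    suy = sum(u * y for _, u, y in buffer)
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-9:  # not enough excitation yet
        return 0.0, 0.0
    return (sxy * suu - suy * sxu) / det, (sxx * suy - sxu * sxy) / det


def robust_cost(x, u, a, b, delta=0.05):
    """Inner max: worst predicted cost over a finite set of perturbed models."""
    worst = -math.inf
    for da in (-delta, 0.0, delta):
        for db in (-delta, 0.0, delta):
            x_next = (a + da) * x + (b + db) * u
            worst = max(worst, x_next ** 2 + 0.1 * u ** 2)  # assumed known cost
    return worst


def twisted_policy(x, a, b, temperature=0.1):
    """Gibbs twist of a uniform reference kernel (stand-in for p_max) by the robust cost.

    The uniform reference factor cancels after normalization, so only costs appear.
    """
    costs = [robust_cost(x, u, a, b) for u in ACTIONS]
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    z = sum(weights)
    return [w / z for w in weights]


def run(steps=200, warmup=10, seed=0):
    rng = random.Random(seed)
    buffer, x = [], 2.0
    for _ in range(steps):
        if len(buffer) < warmup:
            u = rng.choice(ACTIONS)  # warm-up: uniform random exploration
        else:
            a, b = fit_dynamics(buffer)  # update the dynamics model
            u = rng.choices(ACTIONS, weights=twisted_policy(x, a, b))[0]
        x_next = true_step(x, u, rng)  # apply the action to the system
        buffer.append((x, u, x_next))  # collect data into the replay buffer
        x = x_next
    return fit_dynamics(buffer), x


(a_hat, b_hat), x_final = run()
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}, final x={x_final:.3f}")
```

The reference kernel is kept as a separate, explicit ingredient of `twisted_policy`. That way, the uniform stand-in could be replaced by a learned exploration kernel without changing the rest of the loop.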

Theorems & Definitions (4)

  • Definition II.1
  • Theorem IV.1
  • Proof
  • Remark IV.1