Distributionally Robust Free Energy Principle for Decision-Making

Allahkaram Shafiei, Hozefa Jesawada, Karl Friston, Giovanni Russo

TL;DR

A Distributionally Robust Free Energy model (DR-FREE) is introduced that instills robustness into the agent decision-making mechanisms and may inspire both deployments in multi-agent settings and the quest for an explanation of how natural agents - with little or no training - survive in capricious environments.

Abstract

Despite their groundbreaking performance, autonomous agents can misbehave when training and environmental conditions become inconsistent, with minor mismatches leading to undesirable behaviors or even catastrophic failures. Robustness towards these training-environment ambiguities is a core requirement for intelligent agents and its fulfillment is a long-standing challenge towards their real-world deployments. Here, we introduce a Distributionally Robust Free Energy model (DR-FREE) that instills this core property by design. Combining a robust extension of the free energy principle with a resolution engine, DR-FREE wires robustness into the agent decision-making mechanisms. Across benchmark experiments, DR-FREE enables the agents to complete the task even when, in contrast, state-of-the-art models fail. This milestone may inspire both deployments in multi-agent settings and, at a perhaps deeper level, the quest for an explanation of how natural agents -- with little or no training -- survive in capricious environments.

Paper Structure

This paper contains 6 sections, 12 equations, 5 figures.

Figures (5)

  • Figure 1: Comparison between free energy and robust free energy for policy computation. a. A robotic agent navigating a stochastic environment to reach a destination while avoiding obstacles. At a given time-step, $k-1$, the agent determines an action $\mathbf{U}_k$ from a policy using a model of the environment (e.g., available at training via a simulator possibly updated via real world data) and observations/beliefs (grouped in the state $\mathbf{X}_{k-1}$). The environment and model can change over time. Capital letters are random variables, lower-case letters are realizations. b. The trained model and the agent environment differ. This mismatch is a training/environment (model) ambiguity: for a state/action pair, the ambiguity set is the set of all possible environments that have statistical complexity from the trained model of at most $\eta_{k}\left(\mathbf{x}_{k-1},\mathbf{u}_{k}\right)$. We use the wording trained model in a very broad sense. A trained model is any model available to the agent offline: for example, this could be a model obtained from a simulator or, for natural agents, this could be hardwired into evolutionary processes or even determined by prior beliefs. c. A free energy minimizing agent in an environment matching its own model. The agent determines an action by sampling from the policy ${\pi}^{\star}_{{k}} \left(\mathbf{u}_{{k}}\mid \mathbf{x}_{{k-1}} \right)$. Given the model, the policy is obtained by minimizing the variational free energy: the sum of a statistical complexity (with respect to a generative model, $q_{0:N}$) and expected loss (state/action costs, $c_{k}^{(x)}\left(\mathbf{x}_{k}\right)$ and $c_{k}^{(u)}\left(\mathbf{u}_{k}\right)$) terms. d. DR-FREE extends the free energy principle to account for model ambiguities. According to DR-FREE, the maximum free energy across all environments -- in an ambiguity set -- is minimized to identify a robust policy. 
    This amounts to variational policy optimization under the epistemic uncertainty engendered by an ambiguous environment.
  • Figure 2: DR-FREE. a. Summarizing the distributionally robust free energy principle -- the problem statement for policy computation. Our generalization of active inference yields an optimization framework where policies emerge by minimizing the maximum free energy over all possible environments in the ambiguity set, which formalizes the constraints in the problem formulation. b. The resolution engine to find the policy. Given the current state, the engine uses the generative model and the loss to find the maximum free energy ${D}_{\text{KL}}\left(p_{{k}} \left(\mathbf{x}_{{k}}\mid \mathbf{x}_{{k-1}}, \mathbf{u}_{{k}} \right)\mid \mid q_{{k}} \left(\mathbf{x}_{{k}}\mid \mathbf{x}_{{k-1}}, \mathbf{u}_{{k}} \right) \right) + \mathbb{E}_{p_{{k}} \left(\mathbf{x}_{{k}}\mid \mathbf{x}_{{k-1}}, \mathbf{u}_{{k}} \right)}\left[\bar{c}_{k}\left(\mathbf{X}_{k}\right)\right]$ across all the environments in the ambiguity set. This yields the cost of ambiguity $\eta_{k}\left(\mathbf{x}_{k-1},\mathbf{u}_{k}\right)+\tilde{c}\left(\mathbf{x}_{k-1},\mathbf{u}_{k}\right)$ that builds up the expected loss for the subsequent minimization problem. In this second problem, the variational free energy is minimized in the space of policies, providing: (i) ${\pi}^{\star}_{{k}} \left(\mathbf{u}_{{k}}\mid \mathbf{x}_{{k-1}} \right)$, the DR-FREE policy from which actions are sampled. Elements that guarantee robustness are shown in green -- these terms depend on ambiguity; (ii) the smallest free energy that the agent can achieve, i.e., the cost-to-go $\bar{c}_{k}\left(\mathbf{x}_{k}\right)$ fed back to the maximization problem at the next time-step. For reactive actions, where $N=1$, the cost-to-go equals the state cost given by the agent loss. c. Using the generative model and the state cost, DR-FREE first computes the cost of ambiguity, which is non-negative. This, together with the action cost, is then used to obtain the exponential kernel in the policy, i.e., $\exp\left(-c_{k}^{(u)}\left(\mathbf{u}_{k}\right)-\eta_{k}\left(\mathbf{x}_{k-1},\mathbf{u}_{k}\right)-\tilde{c}\left(\mathbf{x}_{k-1},\mathbf{u}_{k}\right)\right)$. After multiplication of the kernel with $q_{{k}}\left(\mathbf{u}_{{k}}\mid \mathbf{x}_{{k-1}} \right)$ and normalization, this returns ${\pi}^{\star}_{{k}} \left(\mathbf{u}_{{k}}\mid \mathbf{x}_{{k-1}} \right)$.
  • Figure 3: DR-FREE evaluation. a. Unicycle robots of $11 \text{cm} \times 8.5\text{cm} \times 7.5\text{cm}$ (width, length, height) that need to reach the goal destination, $\mathbf{x}_d$, while avoiding obstacles. The work area is $3\text{m} \times 2\text{m}$, the robot position is the state, and actions are vertical/horizontal speeds; $q_{{k}} \left(\mathbf{x}_{{k}}\mid \mathbf{x}_{{k-1}}, \mathbf{u}_{{k}} \right)$ is a Gaussian centered at $\mathbf{x}_d$ and $q_{{k}}\left(\mathbf{u}_{{k}}\mid \mathbf{x}_{{k-1}} \right)$ is uniform. See Methods for the settings. b. The non-convex state cost for the navigation task. See Methods for the expression. c. Comparison between DR-FREE and a free-energy minimizing agent that makes optimal decisions but is unaware of the ambiguity. DR-FREE enables the robot to successfully complete the task at each training stage. The ambiguity-unaware agent fails, except when the shortest path is obstacle-free. Training details are in Methods. d. Screenshots from the Robotarium platform recording of one experiment. DR-FREE allows the robot (starting top-right) to complete the task (trained model from stage $3$ used). e. How the DR-FREE policy changes as a function of ambiguity. By increasing the radius of ambiguity by $50\%$, the DR-FREE policy (left) becomes a policy dominated by ambiguity (right). As a result, actions with low ambiguity are assigned higher probability. Screenshot of the robot policy when the robot is at position $[0.2, 0.9]$, i.e., near the middle obstacle. The ambiguity increase deterministically drives the robot bottom-left (note the higher probability) regardless of the presence of the obstacle. f. Belief update. Speeds/positions from the top-right experiments in panel c are used together with $F=16$ state/action features, $\varphi_i(\mathbf{x}_{k-1},\mathbf{u}_k) = \mathbb{E}_{\bar{p}_{{k}} \left(\mathbf{x}_{{k}}\mid \mathbf{x}_{{k-1}}, \mathbf{u}_{{k}} \right)}\left[\phi_i(\mathbf{X}_{k})\right]$ in Supplementary Fig. 1b. Once the optimal weights, $w_i^{\star}$, are obtained, the reconstructed cost is $-\mathbb{E}_{\bar{p}_{{k}} \left(\mathbf{x}_{{k}}\mid \mathbf{x}_{{k-1}}, \mathbf{u}_{{k}} \right)}\left[\sum_{i=1}^{16}w_i^\star\phi_i(\mathbf{X}_{k})\right]$. Since this lives in a $4$-dimensional space, we show $-\sum_{i=1}^{16}w_i^\star\phi_i(\mathbf{x}_{k})$, which can be conveniently plotted.
  • Figure 4: DR-FREE and MaxDiff. a. MaxDiff success rates for different values of the sampling size and planning horizon. Experiments highlight a sweet spot in the hyperparameters with $100\%$ success rate. The worst rates are obtained for low horizons, where the success rate is between $25\%$ and approximately $40\%$. All experiments are performed with the temperature-like hyperparameter $\alpha$ set to $0.1$. Data for each cell are obtained from $12$ experiments corresponding to the initial conditions in Fig. 3c. b. Success rates for different values of $\alpha$ and number of samples when the horizon is set to $2$. Success rates are consistent with the previous panel -- for the best combination of parameters, the MaxDiff agent completes the task half of the time. See Supplementary Fig. 7 for a complementary set of MaxDiff experiments. c. Robot trajectories using the MaxDiff policy when the horizon is equal to $2$ and the number of samples is set to $50$. MaxDiff fulfills the task when the shortest path is obstacle-free. d. DR-FREE allows the robot to complete the task when it is equipped with a generative model from MaxDiff computed using the same set of hyperparameters from the previous panel. e. This desirable behavior is confirmed even when the number of samples is decreased to $10$. See Methods and Supplementary Information for details.
  • Figure 5: Ant experiments. a. Screenshot from the MuJoCo environment (Ant-v3). The state space is $29$-dimensional and the action space is $8$-dimensional. b. Performance comparison. Charts show means, and bars show standard deviations from the means, across $30$ experiments. In some episodes, the ambiguity-unaware, MaxDiff and NN-MPPI agents terminate prematurely due to the Ant becoming unhealthy; rewards were set to zero from that point to the end of the episode. The Ant becomes unhealthy in $6\%$ of the episodes for the ambiguity-unaware agent, $20\%$ for MaxDiff and $23\%$ for NN-MPPI. In contrast, the Ant remains healthy in all DR-FREE experiments. As in previous experiments, DR-FREE is used to compute reactive actions. The ambiguity-unaware policy [EG_HJ_CDV_GR:24] corresponds to DR-FREE with the ambiguity radius set to zero.
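
The policy construction described in the Figure 2c caption (exponential kernel, multiplication by $q_{k}(\mathbf{u}_{k}\mid \mathbf{x}_{k-1})$, normalization) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a finite, discrete set of actions, and it takes the ambiguity radius $\eta_k$ and the term $\tilde{c}$ as given inputs rather than computing them from the maximization step. The function name `dr_free_policy` and all numerical values are hypothetical.

```python
import math
import random


def dr_free_policy(q_u, action_cost, eta, c_tilde):
    """Return pi*(u | x) proportional to q(u | x) * exp(-c_u(u) - eta(x, u) - c_tilde(x, u)).

    All arguments are lists indexed by a discrete action u, evaluated at a
    fixed previous state x (an assumption made for this sketch).
    """
    # Exponent of the kernel for each candidate action.
    exponents = [-(c + e + t) for c, e, t in zip(action_cost, eta, c_tilde)]
    # Shift by the maximum for numerical stability; the shift cancels on normalization.
    shift = max(exponents)
    weights = [q * math.exp(x - shift) for q, x in zip(q_u, exponents)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values for four discrete actions.
q_u = [0.25, 0.25, 0.25, 0.25]   # uniform q(u | x), as in the robot experiments
action_cost = [0.0, 0.0, 0.0, 0.0]
eta = [0.1, 0.5, 0.1, 2.0]       # assumed ambiguity radii per action
c_tilde = [1.0, 0.2, 0.3, 0.4]   # assumed second term of the cost of ambiguity

policy = dr_free_policy(q_u, action_cost, eta, c_tilde)

# Actions are sampled from the policy.
rng = random.Random(0)
action = rng.choices(range(len(policy)), weights=policy, k=1)[0]
```

With these illustrative numbers the third action has the smallest total exponent cost ($0.4$) and therefore receives the highest probability, while the fourth action, whose assumed ambiguity radius is largest, is strongly down-weighted. Setting `eta` and `c_tilde` to zero together with a zero action cost returns `q_u` unchanged.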