On Convex Data-Driven Inverse Optimal Control for Nonlinear, Non-stationary and Stochastic Systems

Emiland Garrabe, Hozefa Jesawada, Carmen Del Vecchio, Giovanni Russo

TL;DR

The paper presents a result that enables cost reconstruction by solving an optimization problem that is convex even when the agent cost is not and when the underlying dynamics is nonlinear, non-stationary and stochastic.

Abstract

This paper is concerned with a finite-horizon inverse control problem, which has the goal of reconstructing, from observations, the possibly non-convex and non-stationary cost driving the actions of an agent. In this context, we present a result enabling cost reconstruction by solving an optimization problem that is convex even when the agent cost is not and when the underlying dynamics is nonlinear, non-stationary and stochastic. To obtain this result, we also study a finite-horizon forward control problem that has randomized policies as decision variables. We turn our findings into algorithmic procedures and show the effectiveness of our approach via in-silico and hardware validations. All experiments confirm the effectiveness of our approach.

Paper Structure (17 sections, 4 theorems, 52 equations, 5 figures, 1 table, 2 algorithms)

This paper contains 17 sections, 4 theorems, 52 equations, 5 figures, 1 table, 2 algorithms.

Key Result

Lemma 1

Let $\mathbf{V}$ and $\mathbf{Z}$ be two random variables and let $f(\mathbf{v},\mathbf{z})$ and $g(\mathbf{v},\mathbf{z})$ be two joint pfs. Then:

Figures (5)

  • Figure 1: Target pendulum angular position and corresponding control input. Results obtained when: (i) pfs are discrete, estimated via the histogram filter (left panels); (ii) pfs are estimated via Gaussian Processes (right panels). Panels obtained from $20$ simulations. Bold lines represent the mean and the shaded region is the confidence interval corresponding to the standard deviation.
  • Figure 2: Angular position and control input of the target pendulum when the pf is estimated via the histogram filter (left and middle panels) and Gaussian Processes (right panels). Figures obtained from $20$ simulations, using ${c}^{\star}(\cdot)$ as an input to Algorithm \ref{alg:main}. Bold lines represent the mean; the shaded region is the confidence interval corresponding to the standard deviation.
  • Figure 3: Top-left: original cost function. Other panels: cost reconstructed via Algorithm \ref{alg:estimator} (top-right), MaxEnt (bottom-left) and IHMCE (bottom-right).
  • Figure 4: Top-left: robot trajectories starting from different initial positions ($\star$) when the policy in \ref{eqn:gaussian_policy}-\ref{eqn:gaussian_policy_recursion} is used (with $N=1$). Top-right: the $\mathbf{o}_i$'s together with the weights obtained via Algorithm \ref{alg:estimator}. Bottom: reconstructed cost (left) and robot trajectories when Algorithm \ref{alg:main} is used with this cost (right). The robot starts from initial positions that are different from those in the top panel.
  • Figure 5: Top-left: cost for the FOC problem. Top-right: robot trajectories when the policy from Algorithm \ref{alg:main} is used (same initial positions and destination as in Scenario $1$). Bottom panels: cost reconstructed via Algorithm \ref{alg:estimator} (left) and robot trajectories when Algorithm \ref{alg:main} is used with the estimated cost (right). Robots start from initial positions that are different from those in the top panel.
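
The captions of Figures 1 and 2 refer to discrete pfs estimated from data by binning. As a rough illustration of that idea only, the sketch below builds an empirical conditional pf $p(x_{k}\mid x_{k-1})$ from samples of a scalar state. The function name `estimate_conditional_pf`, the toy dynamics, the bin count and the uniform fallback for empty bins are all assumptions made for this example; none of them is taken from the paper, and this is not the authors' implementation.

```python
import numpy as np


def estimate_conditional_pf(x_prev, x_next, n_bins, lo, hi):
    """Empirical discrete pf p(x_next | x_prev) obtained by binning samples.

    Hypothetical helper for illustration; not the paper's code.
    """
    edges = np.linspace(lo, hi, n_bins + 1)
    # Map each sample to a bin index in 0..n_bins-1 (clip handles boundary values).
    i = np.clip(np.digitize(x_prev, edges) - 1, 0, n_bins - 1)
    j = np.clip(np.digitize(x_next, edges) - 1, 0, n_bins - 1)

    # Count transitions from bin i to bin j.
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (i, j), 1.0)

    # Normalize each row; rows without data fall back to a uniform pf
    # (an assumption of this sketch).
    row_sums = counts.sum(axis=1, keepdims=True)
    pf = np.where(row_sums > 0, counts / np.maximum(row_sums, 1.0), 1.0 / n_bins)
    return pf, edges


# Toy data from an assumed scalar stochastic system (illustrative only).
rng = np.random.default_rng(0)
x_prev = rng.uniform(-1.0, 1.0, size=5000)
x_next = np.clip(0.9 * x_prev + 0.05 * rng.standard_normal(5000), -1.0, 1.0)

pf, edges = estimate_conditional_pf(x_prev, x_next, n_bins=10, lo=-1.0, hi=1.0)
```

Each row of `pf` is a pf over the next-state bins given the current-state bin, so every row sums to one. A controlled system would add the input as a further conditioning variable, which amounts to one more bin index.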

Theorems & Definitions (16)

  • Lemma 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Theorem 1
  • Remark 6
  • Remark 7
  • Remark 8
  • ...and 6 more