From Pixels to Torques with Linear Feedback

Jeong Hun Lee, Sam Schoedel, Aditya Bhardwaj, Zachary Manchester

TL;DR

This work tackles the problem of controlling a dynamic robotic system using only camera images by learning a linear observer-based policy that maps pixels to torques. It introduces a data-efficient teacher-student framework to learn a Luenberger observer from state-feedback demonstrations and combines it with a linear output-feedback controller, with stability guaranteed via a convex LMI constraint. A nonlinear extension based on Koopman embeddings broadens the approach to handle nonlinear dynamics, and the method is validated on a cartpole in both simulation and hardware, showing robustness to noise, disturbances, and occlusions. The results highlight the practicality of image-to-state estimation and linear control theory for vision-based robotics, offering data efficiency, interpretability, and stability guarantees that are advantageous for real-world deployment.
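
As background on the stability constraint mentioned above (this is the standard discrete-time Lyapunov condition that such constraints build on, not the paper's exact formulation): a closed-loop system $x_{k+1} = A_{\mathrm{cl}} x_{k}$ is asymptotically stable if and only if there exists $P = P^{T} \succ 0$ such that

$$A_{\mathrm{cl}}^{T} P A_{\mathrm{cl}} - P \prec 0.$$

This condition is a linear matrix inequality in $P$; the paper enforces closed-loop stability through a convex constraint of this kind during learning.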

Abstract

We demonstrate the effectiveness of simple observer-based linear feedback policies for "pixels-to-torques" control of robotic systems using only a robot-facing camera. Specifically, we show that the matrices of an image-based Luenberger observer (linear state estimator) for a "student" output-feedback policy can be learned from demonstration data provided by a "teacher" state-feedback policy via simple linear-least-squares regression. The resulting linear output-feedback controller maps directly from high-dimensional raw images to torques while being amenable to the rich set of analytical tools from linear systems theory, allowing us to enforce closed-loop stability constraints in the learning problem. We also investigate a nonlinear extension of the method via the Koopman embedding. Finally, we demonstrate the surprising effectiveness of linear pixels-to-torques policies on a cartpole system, both in simulation and on real hardware. The policy successfully executes both stabilizing and swing-up trajectory-tracking tasks using only camera feedback while subject to model mismatch, process and sensor noise, perturbations, and occlusions. Open-source code for all experiments can be found here: https://roboticexplorationlab.org/projects/linear_pixels_to_torques.html
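
To make the least-squares step concrete, the following is a minimal NumPy sketch rather than the paper's implementation: the stacked observer parameterization $\theta = [A \mid B \mid L]$, the dimensions, and the random placeholder data are all assumptions made for illustration.

    import numpy as np

    # Illustrative sketch only: the shapes, placeholder data, and the stacked
    # observer parameterization theta = [A | B | L] are assumptions.
    n, m, p, N = 4, 1, 64, 500           # state dim, input dim, pixels, samples
    rng = np.random.default_rng(0)
    X_hat = rng.normal(size=(N, n))      # teacher state estimates \hat{X}_{1:N}
    U     = rng.normal(size=(N - 1, m))  # control inputs          U_{1:N-1}
    Y_pix = rng.normal(size=(N - 1, p))  # flattened camera images Y^p_{2:N}

    # Linear-least-squares regression: predict the teacher's next state estimate
    # from the current estimate, the applied input, and the next camera image.
    Z = np.hstack([X_hat[:-1], U, Y_pix])                 # (N-1, n+m+p)
    theta, *_ = np.linalg.lstsq(Z, X_hat[1:], rcond=None)
    A_o, B_o, L = theta[:n].T, theta[n:n + m].T, theta[n + m:].T

    # One step of the learned pixel-based observer (Luenberger-style update).
    x_next_est = A_o @ X_hat[0] + B_o @ U[0] + L @ Y_pix[0]

Per the Figure 2 caption, the student's controller is cloned from the teacher's in this work, so the learned observer's state estimates feed directly into that existing feedback gain.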

Paper Structure

This paper contains 22 sections, 31 equations, and 7 figures.

Figures (7)

  • Figure 1: An overview of the problem we address, in which a simple linear output-feedback policy must control a robotic system using only feedback from images. In this work, the robotic system is both a simulated and real-world cartpole.
  • Figure 2: Overview of learning a linear output-feedback policy for pixels-to-torques control. In this work, a linear-quadratic-Gaussian (LQG) "teacher" policy is designed to control a cartpole system. Teacher demonstration data is collected as trajectories of the robot's estimated states, $\hat{X}_{1:N}$; control inputs, $U_{1:N-1}$; and corresponding images from a robot-facing camera, $Y^{p}_{2:N}$. Subsequent linear-least-squares (LLS) regression is performed on the data to determine the parameters, $\theta$, of a "student" policy's pixel-based Luenberger observer (linear state estimator). The student policy's controller can also be learned separately or cloned from the teacher's, as is the case in this work. The solid and dotted lines indicate processes that were performed online and offline, respectively.
  • Figure 3: Stabilization performance vs. number of training trajectories for a pixel-based linear output-feedback policy tasked with stabilizing a cartpole using a robot-facing camera. Test stabilizations from 100 different initial conditions were evaluated, with stabilization error defined as the $L^2$ error of the final state w.r.t. the upright goal state. The median error is shown as a thick line, while the shaded regions represent the 5% to 95% bounds. The corresponding success rate of stabilization from the 100 initial conditions is also shown.
  • Figure 4: Heat-map visualizations of each normalized row of the pixel-based Luenberger observer's gain matrix, $L$. A cartpole visualization is also overlaid for reference. Each row of $L$ corresponds to the correction that an image observation contributes to the respective state variable. Interestingly, distinct visual features emerge for each state variable: the cart for the cart position, with the pole tip added for the pole angle; the velocity rows show similar features.
  • Figure 5: A successful cartpole swing-up performed by a Koopman-based extension of the pixel-based linear output-feedback policy. The policy is able to overcome process noise and model mismatch to track a reference trajectory on a nonlinear system; the Koopman lifting idea is sketched after this figure list.
  • ...and 2 more figures
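
As background on the Koopman-based extension referenced in the abstract and in Figure 5 (a generic sketch; the paper's particular choice of lifting functions is not reproduced here): the state is passed through a feature map $\phi$ to obtain lifted observables $z_{k} = \phi(x_{k})$, chosen so that the dynamics are approximately linear in the lifted space,

$$z_{k+1} \approx A z_{k} + B u_{k},$$

which allows the same linear observer learning and output-feedback control machinery to be applied to a nonlinear task such as the cartpole swing-up.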