Latent Representations for Visual Proprioception in Inexpensive Robots

Sahara Sheikholeslami; Ladislau Bölöni

TL;DR

This work addresses visual proprioception for inexpensive robots by asking whether a fast, single-pass regression can infer the robot's 6-DoF configuration from a single RGB image using a compact latent representation $\mathbf{z}_{\text{prop}}$ of size 128 or 256. It compares four latent-encoder families (Conv-VAE, proprioception-tuned CNNs, Vision Transformers, and bags of uncalibrated ArUco markers) and uses a uniform MLP regressor to map $\mathbf{z}_{\text{prop}}$ to $\mathbf{a} \in [0,1]^6$. Experiments with a 6-DoF Lynxmotion arm show that accuracy depends on the configuration component, with heading easiest and wrist rotation and gripper state hardest. 128-dimensional latents often perform on par with or better than 256-dimensional ones, though performance varies by representation. The findings demonstrate feasible, low-computation visual proprioception for inexpensive robots and provide guidance on representation choice and practical deployment. Future work includes temporal filtering and additional sensing.
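As a rough illustration of the regression step described above, the following minimal sketch maps a latent vector to a normalized 6-component configuration in a single forward pass. It is not the authors' implementation: the hidden width, the ReLU and sigmoid activations, and the random placeholder weights are all assumptions, and the names `init_layer` and `regress_configuration` are hypothetical. Only the latent size (128 or 256) and the output range $[0,1]^6$ come from the text.

```python
import math
import random

LATENT_DIM = 128  # the text mentions latents of size 128 or 256
HIDDEN_DIM = 64   # assumption: hidden width is not given in the text
NUM_DOF = 6       # output a lies in [0, 1]^6


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); random values stand in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(x, layer):
    """Affine map: one dot product plus bias per output unit."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, t) for t in v]


def sigmoid(v):
    # Assumption: a sigmoid is one simple way to keep outputs inside [0, 1].
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def regress_configuration(z_prop, layers):
    """Single forward pass: latent z_prop -> normalized configuration a."""
    h = z_prop
    for layer in layers[:-1]:
        h = relu(linear(h, layer))
    return sigmoid(linear(h, layers[-1]))


rng = random.Random(0)
layers = [
    init_layer(LATENT_DIM, HIDDEN_DIM, rng),
    init_layer(HIDDEN_DIM, NUM_DOF, rng),
]
# Placeholder latent; in practice this would come from one of the encoders.
z_prop = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
a = regress_configuration(z_prop, layers)
```

Because the regressor only sees $\mathbf{z}_{\text{prop}}$, the same function could in principle sit behind any of the four encoder families, which is what makes a like-for-like comparison of representations possible.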

Abstract

Robotic manipulation requires explicit or implicit knowledge of the robot's joint positions. Precise proprioception is standard in high-quality industrial robots but is often unavailable in inexpensive robots operating in unstructured environments. In this paper, we ask: to what extent can a fast, single-pass regression architecture perform visual proprioception from a single external camera image, available even in the simplest manipulation settings? We explore several latent representations, including CNNs, VAEs, ViTs, and bags of uncalibrated fiducial markers, using fine-tuning techniques adapted to the limited data available. We evaluate the achievable accuracy through experiments on an inexpensive 6-DoF robot.
