Multi-LED Classification as Pretext For Robot Heading Estimation
Nicholas Carlotti, Mirko Nava, Alessandro Giusti
TL;DR
The paper tackles vision-based relative robot localization and heading estimation with limited labels by formulating a self-supervised pretext task: predicting the ON/OFF states of four LEDs mounted on each robot from monocular RGB input. A fully convolutional network (FCN) outputs per-pixel maps for robot presence, heading components, and LED states. The robot's image-space position is taken from the peak of the presence map, and its heading from the heading maps weighted by the robot's estimated location. Training uses an LED-state loss weighted by the robot's projection in the image and by LED visibility, which enables detection and heading estimation without pose labels. Results show a median position error of $14.5$ px and a relative heading mean absolute error (MAE) of $17.0$ deg, close to a supervised upper bound of $10.1$ px and $8.4$ deg on the visible subset, demonstrating practical, low-label learning for multi-robot scenarios.
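The decoding step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `decode_pose`, the representation of the heading components as per-pixel cosine and sine maps, and the use of the normalized presence map as the weighting are all assumptions made for illustration.

```python
import numpy as np


def decode_pose(presence, heading_cos, heading_sin):
    """Decode an image-space position and a heading from per-pixel maps.

    presence:     (H, W) non-negative robot-presence scores.
    heading_cos,
    heading_sin:  (H, W) per-pixel heading components, assumed here to be
                  the cosine and sine of the heading angle.
    Returns ((u, v), heading) with u = column, v = row, heading in radians.
    """
    # Position: pixel coordinates of the presence-map peak.
    row, col = np.unravel_index(np.argmax(presence), presence.shape)

    # Heading: average the per-pixel components, weighting each pixel by
    # the normalized presence map (an assumed form of the weighting).
    weights = presence / (presence.sum() + 1e-8)
    cos_avg = float((weights * heading_cos).sum())
    sin_avg = float((weights * heading_sin).sum())
    heading = float(np.arctan2(sin_avg, cos_avg))

    return (int(col), int(row)), heading


# Usage on synthetic maps: one bright presence pixel, constant heading.
presence = np.zeros((20, 30))
presence[5, 12] = 1.0
angle = 0.7  # radians, arbitrary illustrative value
position, heading = decode_pose(
    presence,
    np.full((20, 30), np.cos(angle)),
    np.full((20, 30), np.sin(angle)),
)
```

Averaging the cosine and sine components before calling `arctan2` avoids the wrap-around problem that would arise from averaging angles directly near $\pm\pi$.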
Abstract
We propose a self-supervised approach for visual robot detection and heading estimation by learning to estimate the states (OFF or ON) of four independent robot-mounted LEDs. Experimental results show a median image-space position error of 14 px and a relative heading MAE of 17 degrees, versus a supervised upper bound scoring 10 px and 8 degrees, respectively.
