Self-supervised cost of transport estimation for multimodal path planning
Vincent Gherold, Ioannis Mandralis, Eric Sihite, Adarsh Salagame, Alireza Ramezani, Morteza Gharib
TL;DR
This work introduces a self-supervised RGB-D pipeline that estimates pixel-wise cost of transport (COT) for multimodal robots, enabling energy-aware path planning. By projecting COT predictions into Bird's Eye View maps and fusing local maps into a global traversability representation, the approach supports real-time navigation on resource-constrained hardware such as the Nvidia Jetson Orin Nano. The method combines self-supervised label generation based on trajectory-level COT computation, SAM-based augmentation, and an autoencoder confidence mechanism, and selects AsymFormer as the best-performing model on the basis of its MSE and inference efficiency. In practice, the framework demonstrates energy-efficient routing with A* on real-world terrain, highlighting its potential to unlock the navigation and exploration capabilities of multimodal robots.
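As a rough illustration of the planning step, the sketch below runs A* over a small grid whose cells hold COT values, so that the returned path minimizes accumulated COT rather than distance. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `astar_min_energy`, the 4-connected grid, the rule that entering a cell costs that cell's COT value, and the toy grid values are all hypothetical.

```python
import heapq


def astar_min_energy(cot_grid, start, goal):
    """Return (path, total_cost) on a 4-connected grid of non-negative COT values.

    Assumption: entering a cell costs that cell's COT value.
    """
    rows, cols = len(cot_grid), len(cot_grid[0])
    # Scaling Manhattan distance by the smallest cell cost keeps the
    # heuristic admissible (it never overestimates the remaining cost).
    min_cot = min(min(row) for row in cot_grid)

    def heuristic(cell):
        return min_cot * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))

    open_heap = [(heuristic(start), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk parent links back to the start to rebuild the path.
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1], cost
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + cot_grid[nr][nc]
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    parent[nxt] = cell
                    heapq.heappush(
                        open_heap, (new_cost + heuristic(nxt), new_cost, nxt)
                    )
    return None, float("inf")


# Toy map: the middle row has two high-COT cells (e.g. rough terrain).
toy_grid = [
    [1.0, 1.0, 1.0],
    [9.0, 9.0, 1.0],
    [1.0, 1.0, 1.0],
]
path, cost = astar_min_energy(toy_grid, (0, 0), (2, 0))
print(path, cost)
```

On this toy map the planner takes the longer route around the high-COT cells, because the two-step direct route accumulates more cost than the six-step detour.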
Abstract
Autonomous robots operating in real environments are often faced with decisions on how best to navigate their surroundings. In this work, we address a particular instance of this problem: how can a robot autonomously decide on the energetically optimal path to follow, given a high-level objective and information about its surroundings? To tackle this problem, we developed a self-supervised learning method that allows the robot to estimate the cost of transport of its surroundings using only vision inputs. We apply our method to the Multi-Modal Mobility Morphobot (M4), a robot that can drive, fly, segway, and crawl through its environment. By deploying our system in the real world, we show that our method accurately assigns different costs of transport to different types of terrain, e.g., grass versus smooth road. We also highlight the low computational cost of our method, which is deployed on an Nvidia Jetson Orin Nano robotic compute unit. We believe that this work will allow multi-modal robotic platforms to unlock their full potential for navigation and exploration tasks.
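For readers unfamiliar with the quantity being estimated, cost of transport is commonly defined as the dimensionless ratio COT = E / (m g d), where E is the energy consumed, m the robot mass, g gravitational acceleration, and d the distance traveled. The sketch below computes it from logged power and position samples. It is an illustration under stated assumptions, not the labeling procedure used in this work: the function name `cost_of_transport`, the fixed sampling interval, the one-power-sample-per-interval convention, and the numeric values are all hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def cost_of_transport(power_w, positions_m, dt_s, mass_kg):
    """Dimensionless COT = E / (m * g * d) for one logged trajectory segment.

    Assumptions: power_w holds one average-power sample (watts) per interval
    between consecutive positions, and every interval lasts dt_s seconds.
    """
    if len(power_w) != len(positions_m) - 1:
        raise ValueError("expected one power sample per position interval")
    # Energy is the sum of power * time over all intervals.
    energy_j = sum(power_w) * dt_s
    # Distance is the summed straight-line length between consecutive positions.
    distance_m = sum(
        math.dist(a, b) for a, b in zip(positions_m, positions_m[1:])
    )
    if distance_m == 0:
        raise ValueError("COT is undefined for zero traveled distance")
    return energy_j / (mass_kg * G * distance_m)


# Hypothetical log: a 10 kg robot drawing 98.1 W while moving at 1 m/s.
cot = cost_of_transport(
    power_w=[98.1, 98.1],
    positions_m=[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    dt_s=1.0,
    mass_kg=10.0,
)
print(round(cot, 3))
```

A lower value means less energy spent per unit weight and distance, which is why the same route can have a different COT on grass than on smooth road.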
