Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications
Gasser Elazab, Maximilian Jansen, Michael Unterreiner, Olaf Hellwich
TL;DR
Gamma-from-Mono (GfM) tackles monocular road perception by predicting a global road-plane normal and a per-pixel gamma (the ratio of height above the road plane to depth), turning a single image into metric, road-relative geometry with minimal calibration. The method is self-supervised from neighboring frames: a planar homography aligns the road plane, a residual parallax flow captures non-planar detail, and a probabilistic road mask focuses supervision on near-field regions. Depth and gamma are co-optimized through a combined set of losses, yielding competitive depth, state-of-the-art near-field gamma accuracy, and strong RSRD performance with only 8.88M parameters. The work demonstrates the practicality of gamma space as a robust, interpretable representation of road topology, enabling more reliable vehicle planning and safety-aware control without heavy annotated data. Limitations include dependence on an accurate camera height for metric scale and difficulty with distant dynamic objects; future work envisions scaling gamma supervision to large driving datasets and integrating gamma-based geometry into planning pipelines.
Abstract
Accurate perception of the vehicle's 3D surroundings, including fine-scale road geometry, such as bumps, slopes, and surface irregularities, is essential for safe and comfortable vehicle control. However, conventional monocular depth estimation often oversmooths these features, losing critical information for motion planning and stability. To address this, we introduce Gamma-from-Mono (GfM), a lightweight monocular geometry estimation method that resolves the projective ambiguity in single-camera reconstruction by decoupling global and local structure. GfM predicts a dominant road surface plane together with residual variations expressed by gamma, a dimensionless measure of vertical deviation from the plane, defined as the ratio of a point's height above it to its depth from the camera, and grounded in established planar parallax geometry. With only the camera's height above ground, this representation deterministically recovers metric depth via a closed form, avoiding full extrinsic calibration and naturally prioritizing near-road detail. Its physically interpretable formulation makes it well suited for self-supervised learning, eliminating the need for large annotated datasets. Evaluated on KITTI and the Road Surface Reconstruction Dataset (RSRD), GfM achieves state-of-the-art near-field accuracy in both depth and gamma estimation while maintaining competitive global depth performance. Our lightweight 8.88M-parameter model adapts robustly across diverse camera setups and, to our knowledge, is the first self-supervised monocular approach evaluated on RSRD.
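The closed-form recovery of metric depth from gamma can be illustrated with a minimal sketch. It assumes a pinhole camera, a unit road normal n expressed in camera coordinates and pointing from the camera towards the road (so the road plane is n · X = h_c, with h_c the camera height), and gamma defined as height above the plane divided by depth Z along the optical axis. Under these assumptions a point X = Z K^{-1} p has height h = h_c - Z (n · K^{-1} p), and dividing by Z gives Z = h_c / (gamma + n · K^{-1} p). The sign conventions and the function names below are illustrative assumptions, not taken from the paper, whose exact formulation may differ.

```python
def backproject(u, v, fx, fy, cx, cy):
    """Return the ray K^{-1} [u, v, 1]^T of a pinhole camera (z-component is 1)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def depth_from_gamma(gamma, normal, ray, cam_height):
    """Closed-form depth Z = h_c / (gamma + n . ray) for one pixel.

    Assumed conventions (hypothetical, for illustration only):
      * normal is a unit vector in camera coordinates pointing from the
        camera towards the road, so the road plane is n . X = cam_height;
      * gamma = (height above the road plane) / Z, with Z measured along
        the optical axis;
      * ray = K^{-1} [u, v, 1]^T, as returned by backproject().
    """
    denom = gamma + sum(n * r for n, r in zip(normal, ray))
    if denom <= 0.0:
        # The pixel is at or above the horizon of the plane for this gamma,
        # so no finite positive depth exists.
        raise ValueError("no positive depth for this gamma, normal and ray")
    return cam_height / denom


# Level camera with the y-axis pointing down, mounted 1.5 m above the road.
normal = (0.0, 1.0, 0.0)
ray = backproject(u=640.0, v=432.0, fx=720.0, fy=720.0, cx=640.0, cy=360.0)

z_road = depth_from_gamma(0.0, normal, ray, cam_height=1.5)    # point on the road
z_bump = depth_from_gamma(0.025, normal, ray, cam_height=1.5)  # point above the road
```

In this example the road pixel (gamma = 0) lies about 15 m ahead, while gamma = 0.025 at the same pixel corresponds to a point about 12 m ahead and 0.3 m above the road, since 0.3 / 12 = 0.025. Only the camera height sets the metric scale, which is why an error in h_c scales every recovered depth proportionally.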
