LiDAR-BIND-T: Improved and Temporally Consistent Sensor Modality Translation and Fusion for Robotic Applications
Niels Balemans, Ali Anwar, Jan Steckel, Siegfried Mercelis
TL;DR
LiDAR-BIND-T tackles temporal inconsistency in LiDAR-BIND by introducing three temporal mechanisms: embedding similarity across consecutive frames, a motion-aligned transformation loss, and temporal windowing with a fusion module, together with architectural refinements to preserve spatial structure. The approach retains LiDAR-BIND’s modular fusion while improving temporal stability and SLAM performance, validated with domain-specific metrics based on the Fréchet Video Motion Distance (FVMD) and a correlation-peak displacement measure. Experiments on radar-to-LiDAR and sonar-to-LiDAR translation demonstrate lower APE and better occupancy-map accuracy in Cartographer-based SLAM, especially under degraded optical conditions. The work provides a practical, modular pathway to robust, temporally consistent multi-modal fusion for real-time robotic perception.
Abstract
This paper extends LiDAR-BIND, a modular multi-modal fusion framework that binds heterogeneous sensors (radar, sonar) to a LiDAR-defined latent space, with mechanisms that explicitly enforce temporal consistency. We introduce three contributions: (i) temporal embedding similarity that aligns consecutive latent representations, (ii) a motion-aligned transformation loss that matches displacement between predictions and ground-truth LiDAR, and (iii) windowed temporal fusion using a specialised temporal module. We further update the model architecture to better preserve spatial structure. Evaluations on radar/sonar-to-LiDAR translation demonstrate improved temporal and spatial coherence, yielding lower absolute trajectory error and better occupancy-map accuracy in Cartographer-based SLAM (Simultaneous Localisation and Mapping). We also propose metrics based on the Fréchet Video Motion Distance (FVMD) and a correlation-peak distance metric, which together provide practical temporal-quality indicators for evaluating SLAM performance. The proposed temporal LiDAR-BIND, or LiDAR-BIND-T, maintains modular modality fusion while substantially enhancing temporal stability, resulting in improved robustness and performance for downstream SLAM.
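To make two of the ideas above concrete, the following is a minimal illustrative sketch, not the formulation used in the paper. It assumes that temporal embedding similarity can be expressed as one minus the cosine similarity between consecutive latent vectors, and that a correlation-peak distance can be obtained by comparing the frame-to-frame shift found at the peak of a circular cross-correlation for predicted and ground-truth frames. All function names (`temporal_embedding_similarity_loss`, `correlation_peak_shift`, `peak_displacement_error`) are hypothetical and chosen only for illustration.

```python
from typing import Tuple

import numpy as np


def temporal_embedding_similarity_loss(embeddings: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between consecutive frame embeddings.

    embeddings has shape (T, D): one latent vector per frame.
    ASSUMPTION: cosine similarity stands in for the paper's similarity term.
    """
    a, b = embeddings[:-1], embeddings[1:]
    eps = 1e-8  # guards against division by zero for all-zero vectors
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    )
    return float(np.mean(1.0 - cos))


def correlation_peak_shift(prev: np.ndarray, curr: np.ndarray) -> Tuple[int, int]:
    """Integer (dy, dx) shift of curr relative to prev.

    The shift is read off the peak of the circular cross-correlation,
    computed in the Fourier domain on mean-subtracted 2D frames.
    """
    f_prev = np.fft.fft2(prev - prev.mean())
    f_curr = np.fft.fft2(curr - curr.mean())
    corr = np.fft.ifft2(f_curr * np.conj(f_prev)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past the midpoint correspond to negative (wrapped) shifts.
    dy, dx = (p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))
    return int(dy), int(dx)


def peak_displacement_error(
    pred_prev: np.ndarray,
    pred_curr: np.ndarray,
    gt_prev: np.ndarray,
    gt_curr: np.ndarray,
) -> float:
    """Euclidean distance between predicted and ground-truth peak shifts.

    ASSUMPTION: this is one plausible reading of a correlation-peak
    distance; the paper's exact definition may differ.
    """
    d_pred = np.array(correlation_peak_shift(pred_prev, pred_curr))
    d_gt = np.array(correlation_peak_shift(gt_prev, gt_curr))
    return float(np.linalg.norm(d_pred - d_gt))
```

Under these assumptions, a sequence of identical embeddings gives a loss close to zero, and a predicted frame pair that moves exactly like the ground-truth pair gives a displacement error of zero. The cross-correlation here is circular and integer-valued, so it only approximates motion for frames with content entering or leaving the field of view.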
