Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation
Friedhelm Hamann, Ziyun Wang, Ioannis Asmanis, Kenneth Chaney, Guillermo Gallego, Kostas Daniilidis
TL;DR
This work addresses dense, long-duration motion estimation from event cameras. It tackles the sim-to-real gap and the lack of dense ground truth with a self-supervised contrast-maximization loss augmented with non-linear motion priors. The model predicts dense per-pixel continuous-time trajectories $\mathbf{q}_n(t)$ via basis expansions and softly associates each event with its $N_{\text{traj}}$ nearest trajectories, using a memory-efficient, differentiable warping pipeline built on a coarse displacement field and a KNN search implemented with KeOps. The loss maximizes the sharpness of the events warped to a randomly chosen reference time $t_{\text{ref}}$, while regularization enforces spatial smoothness and robustness across training references. Empirically, the approach improves zero-shot EVIMO2 performance by about 29% after synthetic pretraining and achieves state-of-the-art self-supervised results on the DSEC optical flow benchmark, with roughly 5x faster inference than baselines. Overall, the method generalizes across architectures and motion priors and reduces the reliance on ground truth while delivering continuous-time motion estimates suitable for robotics and vision tasks.
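The contrast-maximization idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single polynomial trajectory shared by all events (the method itself predicts per-pixel trajectories), a monomial basis, bilinear voting, and image variance as the sharpness measure. The function names `trajectory_displacement`, `image_of_warped_events`, and `contrast` are hypothetical.

```python
import numpy as np

def trajectory_displacement(coeffs, t, t_ref):
    """Displacement q(t_ref) - q(t) for q(t) = q0 + sum_k coeffs[k] * t**(k+1).

    coeffs: (K, 2) coefficients of an assumed monomial basis.
    t: (N,) event timestamps; t_ref: scalar reference time.
    """
    powers = np.arange(1, coeffs.shape[0] + 1)
    basis_t = t[:, None] ** powers           # (N, K) basis at event times
    basis_ref = float(t_ref) ** powers       # (K,) basis at the reference time
    return (basis_ref[None, :] - basis_t) @ coeffs  # (N, 2)

def image_of_warped_events(xy, shape):
    """Accumulate event positions into an image using bilinear voting."""
    h, w = shape
    img = np.zeros((h, w))
    x0 = np.floor(xy[:, 0]).astype(int)
    y0 = np.floor(xy[:, 1]).astype(int)
    fx = xy[:, 0] - x0
    fy = xy[:, 1] - y0
    votes = (
        (0, 0, (1 - fx) * (1 - fy)),
        (1, 0, fx * (1 - fy)),
        (0, 1, (1 - fx) * fy),
        (1, 1, fx * fy),
    )
    for dx, dy, weight in votes:
        xi, yi = x0 + dx, y0 + dy
        inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        # np.add.at handles repeated pixel indices correctly.
        np.add.at(img, (yi[inside], xi[inside]), weight[inside])
    return img

def contrast(xy, t, coeffs, t_ref, shape):
    """Sharpness (variance) of the image of events warped to t_ref."""
    warped = xy + trajectory_displacement(coeffs, t, t_ref)
    return image_of_warped_events(warped, shape).var()
```

Under these assumptions, events generated by points moving along the true trajectory collapse onto a few pixels when warped with the correct coefficients, so `contrast` is larger than for incorrect coefficients. A self-supervised loss can exploit this signal without ground-truth motion.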
Abstract
Current optical flow and point-tracking methods rely heavily on synthetic datasets. Event cameras are novel vision sensors with advantages in challenging visual conditions, but state-of-the-art frame-based methods cannot be easily adapted to event data due to the limitations of current event simulators. We introduce a novel self-supervised loss that combines the Contrast Maximization framework with a non-linear motion prior in the form of pixel-level trajectories, and we propose an efficient solution to the high-dimensional assignment problem between non-linear trajectories and events. We demonstrate the effectiveness of both in two scenarios. In dense continuous-time motion estimation, our method improves the zero-shot performance of a synthetically trained model on the real-world dataset EVIMO2 by 29%. In optical flow estimation, our method elevates a simple UNet to achieve state-of-the-art performance among self-supervised methods on the DSEC optical flow benchmark. Our code is available at https://github.com/tub-rip/MotionPriorCMax.
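As a rough illustration of the assignment between events and trajectories, the sketch below blends the displacements of the k nearest trajectories for each event. It is a brute-force NumPy stand-in, not the authors' pipeline. The softmax weighting over squared distances, the `temperature` parameter, and the function name `soft_knn_displacement` are assumptions made for illustration, and the trajectory positions at each event's time are taken as given.

```python
import numpy as np

def soft_knn_displacement(event_xy, traj_xy, traj_disp, k=4, temperature=1.0):
    """Blend the displacements of the k nearest trajectories for each event.

    event_xy:  (N, 2) event pixel coordinates.
    traj_xy:   (M, 2) trajectory positions (assumed given), with M >= k.
    traj_disp: (M, 2) displacement of each trajectory to the reference time.
    """
    # Brute-force squared distances, O(N * M) memory; fine for a small demo only.
    d2 = ((event_xy[:, None, :] - traj_xy[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k smallest distances per event (unordered within the k).
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    d2_k = np.take_along_axis(d2, idx, axis=1)
    # Softmax over negative squared distance (assumed weighting scheme).
    logits = -d2_k / temperature
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights[:, :, None] * traj_disp[idx]).sum(axis=1)  # (N, 2)
```

The dense distance matrix in this sketch grows with the product of the number of events and the number of trajectories, which shows why a memory-efficient nearest-neighbour search matters at realistic event counts.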
