Motion Perceiver: Real-Time Occupancy Forecasting for Embedded Systems
Bryce Ferenczi, Michael Burke, Tom Drummond
TL;DR
The paper tackles real-time occupancy forecasting for autonomous systems under a streaming sensor paradigm. It introduces MotionPerceiver, a transformer-based model that maintains a latent scene state, evolves that state forward in time with a learned self-attention function, and updates it through cross-attention as new observations arrive. The main contributions are a data-streaming architecture that avoids per-agent tracking, competitive AUC and superior Soft IoU on the Waymo Open Motion Dataset, and support for localized occupancy queries suited to downstream planning. The approach runs efficiently on edge devices such as the Nvidia Xavier AGX and is robust to occlusions. Possible extensions toward practical deployment include ego-action conditioning and multi-trajectory planning.
Abstract
This work introduces a novel and adaptable architecture for real-time occupancy forecasting that outperforms existing state-of-the-art models on the Waymo Open Motion Dataset in Soft IoU. The proposed model uses recursive latent state estimation, with learned transformer-based functions that update and evolve the state. This enables highly efficient real-time inference on embedded systems, as profiled on an Nvidia Xavier AGX. Our model, MotionPerceiver, achieves this by encoding a scene into a latent state that evolves in time through self-attention. It incorporates relevant scene observations, such as traffic signals, road topology and agent detections, through cross-attention. This forms an efficient data-streaming architecture, in contrast to the expensive fixed-sequence input common in existing models. The architecture also has the distinct advantage of generating occupancy predictions through localized querying around a point of interest, rather than generating fixed-size occupancy images that may render irrelevant regions.
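The streaming loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class LatentSceneState, the helper encode_point, the single-head attention without learned projections, and the sigmoid readout are all hypothetical simplifications chosen so the example runs with only the Python standard library. It shows only the control flow: evolve the latent state with self-attention, fold in whatever observations arrived with cross-attention, then query occupancy at chosen points.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention; keys and values share the same tokens.

    Assumption: no learned projections, purely to keep the sketch short.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Weighted sum of value tokens, one output channel at a time.
        out.append([sum(w * v[c] for w, v in zip(weights, keys_values))
                    for c in range(len(keys_values[0]))])
    return out


def encode_point(x, y, dim):
    """Hypothetical sinusoidal encoding of a 2-D position into a token."""
    return [math.sin((i // 2 + 1) * (x if i % 2 == 0 else y)) for i in range(dim)]


class LatentSceneState:
    """Fixed-size latent state that is evolved, updated and queried."""

    def __init__(self, num_latents, dim, seed=0):
        rng = random.Random(seed)
        self.latent = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_latents)]

    def _residual(self, delta):
        self.latent = [[a + b for a, b in zip(row, d)]
                       for row, d in zip(self.latent, delta)]

    def evolve(self):
        """Time step: the latents attend to themselves (self-attention)."""
        self._residual(attend(self.latent, self.latent))

    def update(self, observations):
        """Measurement step: the latents attend to new observation tokens."""
        if observations:  # nothing to fold in when no observations arrived
            self._residual(attend(self.latent, observations))

    def query(self, point_tokens):
        """Localized readout: one occupancy probability per query point."""
        readout = attend(point_tokens, self.latent)
        # Placeholder decoder: stable sigmoid of the summed features.
        return [0.5 * (1.0 + math.tanh(0.5 * sum(row))) for row in readout]


DIM = 4
state = LatentSceneState(num_latents=8, dim=DIM)
stream = [
    [encode_point(1.0, 2.0, DIM), encode_point(-3.0, 0.5, DIM)],  # two detections
    [],                                                           # no observations
    [encode_point(1.2, 2.1, DIM)],                                # one detection
]
for observations in stream:
    state.evolve()
    state.update(observations)

# Query only the points a planner cares about, not a full occupancy image.
occupancy = state.query([encode_point(1.0, 2.0, DIM), encode_point(10.0, 10.0, DIM)])
print(occupancy)
```

Two properties of the design are visible even in this toy version. The cost per time step depends on the fixed number of latents and the number of new observations, not on the length of the observation history. The number of occupancy outputs is set by the query list, so a planner can evaluate only the locations along a candidate trajectory.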
