MemFlow: Optical Flow Estimation and Prediction with Memory
Qiaole Dong, Yanwei Fu
TL;DR
MemFlow is a memory-augmented online optical flow framework that reads from and updates a history-aware memory to exploit temporal coherence without requiring future frames or offline multi-frame processing. It introduces attention-based memory read-out with resolution-adaptive re-scaling, which supports strong cross-dataset generalization and real-time inference, and it extends to one-step-ahead flow prediction (MemFlow-P) for video synthesis workflows. The approach achieves state-of-the-art or near state-of-the-art results with fewer parameters and faster inference than heavy multi-frame methods, and demonstrates competitive flow prediction and video prediction without task-specific training. The result is a practical, memory-driven approach to real-time optical flow and predictive motion modeling for safety-critical applications.
Abstract
Optical flow is a classical task that is important to the vision community. Classical optical flow estimation uses two frames as input, whilst some recent methods consider multiple frames to explicitly model long-range information. The former cannot fully leverage temporal coherence along the video sequence, and the latter incur heavy computational overhead that typically rules out real-time flow estimation. Some multi-frame approaches even require unseen future frames for the current estimate, compromising real-time applicability in safety-critical scenarios. To this end, we present MemFlow, a real-time method for optical flow estimation and prediction with memory. Our method uses memory read-out and update modules to aggregate historical motion information in real time. Furthermore, we integrate resolution-adaptive re-scaling to accommodate diverse video resolutions. In addition, our approach extends seamlessly to predicting future optical flow from past observations. Leveraging effective historical motion aggregation, our method outperforms VideoFlow in generalization performance on the Sintel and KITTI-15 datasets, with fewer parameters and faster inference. At the time of submission, MemFlow also leads in performance on the 1080p Spring dataset. Code and models will be available at: https://dqiaole.github.io/MemFlow/.
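As an illustration of the memory read-out and update idea, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that read-out is a softmax attention of a current-frame query over stored keys and values, that the memory is a fixed-length first-in-first-out buffer, and that resolution-adaptive re-scaling multiplies the usual 1/sqrt(d) attention factor by log(n_tokens)/log(n_train_tokens). All function names, parameters, and the exact form of the scaling are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scale(dim, n_tokens, n_train_tokens=None):
    """Attention scaling factor.

    Returns the standard 1/sqrt(dim). If n_train_tokens is given, the factor
    is multiplied by log(n_tokens) / log(n_train_tokens), an ASSUMED form of
    resolution-adaptive re-scaling (both token counts must exceed 1).
    """
    base = 1.0 / math.sqrt(dim)
    if n_train_tokens is None:
        return base
    return base * math.log(n_tokens) / math.log(n_train_tokens)


def read_memory(query, mem_keys, mem_values, scale):
    """Read out one aggregated value vector for a single query vector."""
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in mem_keys]
    weights = softmax(scores)
    value_dim = len(mem_values[0])
    return [
        sum(w * value[j] for w, value in zip(weights, mem_values))
        for j in range(value_dim)
    ]


def update_memory(mem_keys, mem_values, new_key, new_value, max_len):
    """Append the newest entry and drop the oldest once max_len is exceeded."""
    mem_keys.append(new_key)
    mem_values.append(new_value)
    if len(mem_keys) > max_len:
        mem_keys.pop(0)
        mem_values.pop(0)
```

In this sketch, a query that matches all stored keys equally reads out the mean of the stored values, and the scaling factor grows when the number of attended tokens at test time exceeds the number assumed at training time, which sharpens the attention distribution for larger inputs.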
