CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices

Andrei Znobishchev, Valerii Filev, Oleg Kudashev, Nikita Orlov, Humphrey Shi

TL;DR

CompactFlowNet introduces a mobile-optimized dense optical flow model with a redesigned architecture, a lightweight MobileNetV3 backbone, and a distillation pipeline to preserve accuracy. The method emphasizes on-device profiling, sequential flow estimation, and depthwise separable convolutions to achieve real-time inference with low memory on common smartphones. On-device experiments across multiple iOS devices demonstrate real-time performance at practical resolutions and substantially reduced memory usage compared to prior lightweight methods. The work enables practical mobile deployment of flow-based video tasks and establishes a benchmark for efficient on-device dense motion estimation.
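To illustrate why depthwise separable convolutions reduce model size, the sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The channel counts and kernel size are hypothetical values chosen for illustration, not figures taken from the paper, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, for illustration only.
c_in, c_out, k = 128, 128, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {std}, separable: {sep}, reduction: {std / sep:.1f}x")
```

For a 3 x 3 kernel with many output channels, the reduction approaches a factor of 9, which is why this layer type is common in mobile backbones such as MobileNetV3.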

Abstract

We present CompactFlowNet, the first real-time mobile neural network for optical flow prediction. Optical flow prediction determines the displacement of each pixel in an initial frame relative to the corresponding pixel in a subsequent frame. It serves as a fundamental building block for various video-related tasks, such as video restoration, motion estimation, video stabilization, object tracking, action recognition, and video generation. While current state-of-the-art methods prioritize accuracy, they often overlook constraints on speed and memory usage. Existing lightweight models typically focus on reducing size but still exhibit high latency, compromise significantly on quality, or are optimized for high-performance GPUs, resulting in sub-optimal performance on mobile devices. This study develops a mobile-optimized optical flow model by proposing a novel mobile-compatible architecture, together with enhancements to the training pipeline, that reduce model weight, memory utilization, and latency while keeping error low. Our approach demonstrates superior or comparable performance to state-of-the-art lightweight models on the challenging KITTI and Sintel benchmarks. Furthermore, it attains significantly faster inference, running in real time on the iPhone 8 and faster than real time on more advanced mobile devices.
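As a concrete reading of the definition above, a flow field stores one displacement vector (u, v) per pixel, so the pixel at (x, y) in the first frame is expected at (x + u, y + v) in the second. The sketch below shows this lookup and also computes the average endpoint error (EPE), the mean Euclidean distance between predicted and ground-truth flow vectors that is commonly reported on Sintel. This excerpt does not name the metric the paper uses, so treating EPE as the error measure is an assumption, and the tiny flow fields are made-up data.

```python
import math


def displaced_position(x, y, flow):
    """Return where pixel (x, y) of frame 1 lands in frame 2 under `flow`."""
    u, v = flow[y][x]  # flow is indexed as flow[row][column]
    return x + u, y + v


def average_endpoint_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    total = 0.0
    count = 0
    for row_pred, row_gt in zip(pred, gt):
        for (up, vp), (ug, vg) in zip(row_pred, row_gt):
            total += math.hypot(up - ug, vp - vg)
            count += 1
    return total / count


# Made-up 2 x 2 flow fields, for illustration only.
gt = [[(1.0, 0.0), (1.0, 0.0)],
      [(0.0, 2.0), (0.0, 2.0)]]
pred = [[(1.0, 0.0), (4.0, 4.0)],
        [(0.0, 2.0), (0.0, 2.0)]]

print(displaced_position(0, 1, gt))
print(average_endpoint_error(pred, gt))
```

Only one of the four predicted vectors is wrong, by (3, 4), so its endpoint error of 5 is averaged over four pixels.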

Paper Structure

This paper contains 16 sections, 1 equation, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Comparison of flow estimator structures: densely connected (left) and sequentially connected (right). The indicated number of input and output channels corresponds to the flow estimator block operating at the lowest resolution.
  • Figure 2: Visualized results on Sintel Clean test set.
  • Figure 3: Visualized results on Sintel Final test set.
  • Figure 4: Visualized results on KITTI 2015 test set.
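The difference between the two flow estimator structures in Figure 1 can be sketched by counting input channels per layer. In a densely connected block, each layer receives the block input concatenated with all earlier layer outputs; in a sequentially connected block, each layer receives only the previous layer's output. The layer widths below are hypothetical and are not the channel counts shown in the figure, and the weight count assumes standard 3 x 3 convolutions without bias.

```python
def dense_input_channels(c_in, widths):
    """Input channels per layer when every layer sees the block input
    concatenated with the outputs of all earlier layers."""
    inputs = []
    running = c_in
    for w in widths:
        inputs.append(running)
        running += w  # this layer's output is concatenated for later layers
    return inputs


def sequential_input_channels(c_in, widths):
    """Input channels per layer when every layer sees only the previous output."""
    inputs = []
    prev = c_in
    for w in widths:
        inputs.append(prev)
        prev = w
    return inputs


def conv3x3_weights(inputs, widths):
    """Total weights of 3 x 3 convolutions with the given in/out channels."""
    return sum(9 * i * w for i, w in zip(inputs, widths))


# Hypothetical block: 64 input channels, three layers.
c_in, widths = 64, [64, 48, 32]
dense_in = dense_input_channels(c_in, widths)
seq_in = sequential_input_channels(c_in, widths)
print("dense:", dense_in, conv3x3_weights(dense_in, widths))
print("sequential:", seq_in, conv3x3_weights(seq_in, widths))
```

Under these assumed widths the dense block needs wider inputs at every layer after the first, which increases both the weight count and the size of the concatenated activations held in memory.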