NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices

Zhiyong Zhang, Huaizu Jiang, Hanumant Singh

TL;DR

NeuFlow tackles real-time, high-accuracy optical flow on edge devices by combining a lightweight multi-scale CNN backbone with global cross-attention at $1/16$ resolution and a local refinement stage at $1/8$, followed by convex upsampling. It achieves accuracy comparable to leading methods such as RAFT, GMFlow, and FlowFormer while running 10×–80× faster on GPUs and at around 30 FPS on Jetson Orin Nano, enabling real-time deployment in SLAM and visual odometry. Trained on FlyingChairs and FlyingThings and evaluated on FlyingThings and Sintel, NeuFlow demonstrates strong cross-domain generalization and practical applicability to robotic workflows. The authors also provide open-source code and weights to facilitate adoption and further research in edge-based optical flow for SWaP-C platforms.

Abstract

Real-time high-accuracy optical flow estimation is a crucial component in various applications, including localization and mapping in robotics, object tracking, and activity recognition in computer vision. While recent learning-based optical flow methods have achieved high accuracy, they often come with heavy computation costs. In this paper, we propose a highly efficient optical flow architecture, called NeuFlow, that addresses both high accuracy and computational cost concerns. The architecture follows a global-to-local scheme. Given the features of the input images extracted at different spatial resolutions, global matching is employed to estimate an initial optical flow on the 1/16 resolution, capturing large displacement, which is then refined on the 1/8 resolution with lightweight CNN layers for better accuracy. We evaluate our approach on Jetson Orin Nano and RTX 2080 to demonstrate efficiency improvements across different computing platforms. We achieve a notable 10x-80x speedup compared to several state-of-the-art methods, while maintaining comparable accuracy. Our approach achieves around 30 FPS on edge computing platforms, which represents a significant breakthrough in deploying complex computer vision tasks such as SLAM on small robots like drones. The full training and evaluation code is available at https://github.com/neufieldrobotics/NeuFlow.
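
The global matching step described above can be illustrated with a toy sketch. It assumes a softmax-weighted correlation over all target positions (the formulation used by GMFlow-style matching); the abstract does not spell out NeuFlow's exact matching layer, and the function name `global_match` and the flat row-major feature layout are hypothetical choices made for illustration only.

```python
import math

def global_match(feat1, feat2, width):
    """Toy global matching (illustrative assumption, not the authors' code).

    feat1, feat2: lists of equal length holding one feature vector per pixel,
    in row-major order on a grid `width` pixels wide.
    Returns one (dx, dy) flow vector per pixel of feat1.
    """
    dim = len(feat1[0])
    scale = 1.0 / math.sqrt(dim)
    # Pixel coordinates (x, y) of every position on the grid.
    coords = [(i % width, i // width) for i in range(len(feat2))]
    flow = []
    for idx, f1 in enumerate(feat1):
        # Correlate this source feature with every target feature.
        scores = [scale * sum(a * b for a, b in zip(f1, f2)) for f2 in feat2]
        # Numerically stable softmax over all target positions.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Expected target coordinate under the matching distribution.
        ex = sum(w * x for w, (x, _) in zip(weights, coords)) / total
        ey = sum(w * y for w, (_, y) in zip(weights, coords)) / total
        x0, y0 = coords[idx]
        flow.append((ex - x0, ey - y0))
    return flow
```

Because every source pixel is compared against every target pixel, the cost grows quadratically with the number of pixels, which is one reason to run this step on the coarse 1/16 grid and leave finer detail to the local refinement at 1/8 resolution.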

Paper Structure (16 sections, 5 figures, 5 tables)

This paper contains 16 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: End-point error (EPE) vs. frames per second (FPS) throughput on a common computing platform (Nvidia RTX 2080). Individual points represent a broad class of optical flow methods. Our algorithm is comparable in accuracy but significantly better (close to an order of magnitude) in terms of computational complexity. All models are trained solely on FlyingThings and FlyingChairs.
  • Figure 2: Optical flow results of NeuFlow: on the left is a result from the standard KITTI dataset. On the right are results from a UAS flight over low-contrast glacier images in the Arctic. Our approach is notable for its computational efficiency and speed as well as its accuracy, as shown in Fig. 1.
  • Figure 3: NeuFlow Architecture: We begin with a shallow CNN backbone. The backbone outputs feature vectors at 1/8 and 1/16 scale for both images. The feature vectors at 1/16 scale are then fed into two cross-attention layers for global matching. The resulting flow is passed into a self-attention layer for flow propagation based on feature self-similarity. Subsequently, the flow is upsampled to obtain 1/8 resolution flow. We warp the 1/8 features with the flow and perform local refinement within a 7x7 window. The refined 1/8 flow is then upsampled to obtain full-resolution flow using a convex upsampling module, which additionally requires 1/8 features from image one.
  • Figure 4: NeuFlow Shallow CNN Backbone: Initially, we downsample the image into different scales, ranging from 1/1 scale to 1/16 scale. Subsequently, we extract features using a CNN block. The feature vectors at 1/1, 1/2, 1/4, and 1/8 scales are concatenated into a single 1/8 feature vector. Then, another CNN block is employed to merge the 1/8 feature vector with the 1/16 feature vector, resulting in a 1/16 feature vector. The 1/16 feature vector is utilized for global attention, while the 1/8 feature vector is employed for local refinement. The CNN block consists solely of two CNN layers along with activation functions and normalization layers. The kernel size and stride of the CNN layers depend on the input and output dimensions upstream and downstream of the network. An additional 1/8 feature is extracted from the full-resolution image to perform convex upsampling.
  • Figure 5: End-point error (EPE) vs. frames per second (FPS) on Nvidia RTX 2080 while outputting 1/8 resolution flow. All models are trained solely on FlyingThings and FlyingChairs. NeuFlow is optimized for accuracy and efficiency at 1/8 resolution, so its advantage over other methods is larger here than for full-resolution flow.
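
The end-point error (EPE) plotted in Figures 1 and 5 is conventionally the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal sketch of that standard definition follows; the function name `end_point_error` and the list-of-tuples input format are assumptions for illustration, not taken from the NeuFlow code.

```python
import math

def end_point_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow.

    pred, gt: equal-length, non-empty sequences of (u, v) flow vectors,
    one per pixel (hypothetical input format for this sketch).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    total = sum(
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred, gt)
    )
    return total / len(pred)
```

For example, a prediction that is off by (3, 4) pixels at one pixel and exact at another has per-pixel errors of 5 and 0, giving an EPE of 2.5.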