Breaking of brightness consistency in optical flow with a lightweight CNN network

Yicheng Lin, Shuo Wang, Yunlong Jiang, Bin Han

TL;DR

This work tackles the problem of brightness inconsistencies hindering sparse optical flow in HDR environments. It introduces a light-weight CNN that yields illumination-invariant feature maps and a keypoint score map, which are integrated with a pyramid LK flow to form a light-robust hybrid optical flow trained with unsupervised losses. Key contributions include the mNRE-based illumination-invariant feature loss, the line peaky keypoint loss, and a two-network training regime that preserves real-time CPU performance while improving illumination robustness, demonstrated on HDR datasets and within VIO. The method enables more reliable sparse-flow-based SLAM in challenging settings such as caves or tunnels, with practical impact for real-time robotics and vision-based navigation.

Abstract

Sparse optical flow is widely used in computer vision tasks; however, its brightness-consistency assumption limits its performance in High Dynamic Range (HDR) environments. In this work, a lightweight network is used to extract illumination-robust convolutional features and corners with strong invariance. Replacing the usual brightness consistency of the optical flow method with convolutional feature consistency yields the light-robust hybrid optical flow method. The proposed network runs at 190 FPS on a commercial CPU because it uses only four convolutional layers to extract feature maps and score maps simultaneously. Since the shallow network is difficult to train directly, a deep network is designed to compute the reliability map that helps it. An end-to-end unsupervised training mode is used for both networks. To validate the proposed method, we compare corner repeatability and matching performance with the original optical flow under dynamic illumination. In addition, a more accurate visual-inertial system is constructed by replacing the optical flow method in VINS-Mono. On a public HDR dataset, it reduces translation errors by 93%. The code is publicly available at https://github.com/linyicheng1/LET-NET.
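
The change of assumption described in the abstract can be written out as a rough sketch. The notation here is an assumption for illustration (the textbook Lucas-Kanade form) and is not necessarily the exact formulation used in the paper: $I_1, I_2$ are the two images, $F_1, F_2$ their illumination-invariant feature maps, $W(\mathbf{p})$ a small window around keypoint $\mathbf{p}$, and $\mathbf{d}$ the displacement being estimated.

$$
\begin{aligned}
\text{brightness consistency:}\quad & \mathbf{d}^{*} = \arg\min_{\mathbf{d}} \sum_{\mathbf{x} \in W(\mathbf{p})} \left( I_2(\mathbf{x} + \mathbf{d}) - I_1(\mathbf{x}) \right)^2 \\
\text{feature consistency:}\quad & \mathbf{d}^{*} = \arg\min_{\mathbf{d}} \sum_{\mathbf{x} \in W(\mathbf{p})} \left\| F_2(\mathbf{x} + \mathbf{d}) - F_1(\mathbf{x}) \right\|_2^2
\end{aligned}
$$

In the first objective, a change in lighting between the two frames alters $I_2$ directly and biases $\mathbf{d}^{*}$; in the second, the residual is computed on features that are trained to stay stable under such changes.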

Paper Structure (24 sections, 19 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Examples of dynamic lighting scene images. We collected images under different directions of light to demonstrate the robustness of the proposed method to illumination. Among them, forward optical flow refers to extracting keypoints in the first image and tracking them in the second image. The backward optical flow is the opposite.
  • Figure 2: The pipeline of the proposed hybrid optical flow method. A shared encoder first extracts the shared feature map of the image, which is then decoded into the score map $S$ and the illumination-invariant feature map $F$. Keypoints are extracted from the score map $S$ using non-maximum suppression (NMS). The illumination-invariant feature map $F$ is used to construct the pyramid optical flow method. Following the pyramid LK optical flow method, the hybrid optical flow method begins tracking the extracted keypoints from the highest level of the feature pyramid, where the feature map is used to locate these keypoints in the other image. The tracking results from each upper level are then used as initial values for the tracking computation at the next lower level, ultimately yielding the sparse optical flow results.
  • Figure 3: The network training process. A shallow network is first used to extract the score map $\mathbf{S}$ and feature map $\mathbf{F}$. Then, in order to supervise the reliability of the training keypoints, a deep network is used to extract the dense descriptor map $\mathbf{D}$. Finally, we calculate the keypoint loss, feature loss and descriptor loss based on the results of $[\mathbf{S},\mathbf{F},\mathbf{D}]$. Only the shallow network is used in the hybrid optical flow method depicted in Fig. 2; the deep network is used only for training.
  • Figure 4: Comparison of line peaky loss and peaky loss. In a $5 \times 5$ patch, the cyan blocks represent a score of 0.5 and the red blocks a score of 1. The numbers in the blocks are the derivatives of the different losses with respect to each block. It can be seen that the line peaky loss increases the penalty weight at the ends of the lines.
  • Figure 5: Optical flow method comparison in image sequences. In the image sequence, an active light source is used to simulate a dynamic lighting environment. At the edge of the elliptical light spot, the constant brightness assumption no longer holds, which challenges the optical flow method. The current keypoints are drawn in green, and the optical flow results over ten frames are drawn in red. Image pairs in the sequence are shown every ten frames. The traditional LK optical flow method has a large error at the edge of the spot, while the proposed optical flow method performs significantly better there.
  • ...and 3 more figures
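
The keypoint extraction step in the Figure 2 caption (non-maximum suppression on the score map $S$) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `nms_keypoints` and the parameters `radius`, `threshold` and `max_points` are hypothetical choices made for this example, and the actual values used in LET-NET may differ.

```python
import numpy as np


def nms_keypoints(score, radius=2, threshold=0.5, max_points=None):
    """Pick keypoints as local maxima of a 2D score map (illustrative sketch).

    score      : (H, W) array of keypoint scores.
    radius     : hypothetical NMS radius; the window is (2*radius+1)^2.
    threshold  : hypothetical minimum score for a keypoint.
    max_points : optional cap on the number of returned keypoints.
    Returns an (N, 2) integer array of (row, col), strongest first.
    """
    score = np.asarray(score, dtype=float)
    k = 2 * radius + 1
    # Pad with -inf so border pixels are compared only against real scores.
    padded = np.pad(score, radius, mode="constant", constant_values=-np.inf)
    # windows[r, c] is the k x k neighbourhood centred on score[r, c].
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    local_max = windows.max(axis=(2, 3))
    # Keep pixels that equal their neighbourhood maximum and pass the threshold.
    # Note: every pixel of a flat plateau is kept; a real system may break ties.
    keep = (score >= local_max) & (score > threshold)
    rows, cols = np.nonzero(keep)
    order = np.argsort(-score[rows, cols], kind="stable")
    if max_points is not None:
        order = order[:max_points]
    return np.stack([rows[order], cols[order]], axis=1)
```

Given a score map with a strong peak, a weaker neighbour inside the same window, and a separate peak elsewhere, the function keeps the two separated peaks and suppresses the neighbour. The returned positions would then be handed to the pyramid tracker operating on the feature map $F$.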