Highly Efficient and Unsupervised Framework for Moving Object Detection in Satellite Videos

C. Xiao, W. An, Y. Zhang, Z. Su, M. Li, W. Sheng, M. Pietikäinen, L. Liu

TL;DR

This paper proposes a generic unsupervised framework for SVMOD in which pseudo labels generated by a traditional method evolve during training to improve detection performance, together with a highly efficient and effective sparse convolutional anchor-free detection network.

Abstract

Moving object detection in satellite videos (SVMOD) is a challenging task because the targets are extremely dim and small. Current learning-based methods tackle SVMOD by extracting spatio-temporal information from a dense multi-frame representation trained with labor-intensive manual labels, which incurs high annotation costs and contains tremendous computational redundancy due to the severe imbalance between foreground and background regions. In this paper, we propose a highly efficient unsupervised framework for SVMOD. Specifically, we propose a generic unsupervised framework in which pseudo labels generated by a traditional method can evolve with the training process to promote detection performance. Furthermore, we propose a highly efficient and effective sparse convolutional anchor-free detection network that samples the dense multi-frame images into a sparse spatio-temporal point cloud representation and skips the redundant computation on background regions. By coupling these two designs, we achieve both high efficiency (label and computation efficiency) and effectiveness. Extensive experiments demonstrate that our method can not only process 98.8 frames per second on 1024x1024 images but also achieve state-of-the-art performance. The relabeled dataset and code are available at https://github.com/ChaoXiao12/Moving-object-detection-in-satellite-videos-HiEUM.
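To make the idea of sampling a dense multi-frame clip into a sparse spatio-temporal point cloud more concrete, here is a minimal Python sketch. It is not the authors' sampling module: the per-pixel temporal median background, the fixed `threshold`, and the names `sparse_sample` and `make_clip` are all assumptions chosen for illustration.

```python
from statistics import median
from typing import List, Tuple

Frame = List[List[float]]            # one grayscale frame as rows of pixel values
Point = Tuple[int, int, int, float]  # (t, y, x, intensity)


def sparse_sample(clip: List[Frame], threshold: float) -> List[Point]:
    """Keep only pixels that deviate from a background estimate.

    Assumption: the background is the per-pixel median over time, and a
    pixel is kept when it differs from that median by more than `threshold`.
    """
    height, width = len(clip[0]), len(clip[0][0])
    background = [
        [median(frame[y][x] for frame in clip) for x in range(width)]
        for y in range(height)
    ]
    points: List[Point] = []
    for t, frame in enumerate(clip):
        for y in range(height):
            for x in range(width):
                if abs(frame[y][x] - background[y][x]) > threshold:
                    points.append((t, y, x, frame[y][x]))
    return points


def make_clip(num_frames: int = 5, size: int = 8) -> List[Frame]:
    """Hypothetical toy clip: flat background with one bright moving pixel."""
    clip = []
    for t in range(num_frames):
        frame = [[10.0] * size for _ in range(size)]
        frame[3][1 + t] = 60.0  # target moves one pixel to the right per frame
        clip.append(frame)
    return clip


clip = make_clip()
points = sparse_sample(clip, threshold=20.0)
kept_ratio = len(points) / (len(clip) * 8 * 8)
print(f"kept {len(points)} of {len(clip) * 64} pixels ({kept_ratio:.2%})")
```

In this toy clip only the moving pixel survives in each frame, so a downstream network would only need to process a handful of `(t, y, x)` points instead of every pixel of every frame.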

Paper Structure

This paper contains 14 sections, 1 equation, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Comparison of the detection performance (F1 score) and computational efficiency (Frames Per Second, FPS) of different methods on the re-labeled VISO dataset. Our proposed unsupervised method is highly efficient and effective compared with other methods, including RPCA-based methods (GoDec [Godec], DECOLOR [Zhou2013Decolor], E-LSD [zhang2019error], and B-MCMD [Zhang2021MovingVD]), differencing-based methods (D&T [ao2019needles] and MMB [yin2021detecting]), and learning-based methods (ClusterNet [lalonde2018clusternet], DSFNet [xiao2021dsfnet], and DeepPrior [xiao2023incorporating]). Specifically, compared with the traditional method B-MCMD [Zhang2021MovingVD] and the learning-based method DSFNet [xiao2021dsfnet], our method is 4490x and 28.7x faster, with F1 score improvements of 30.6% and 15.3%, respectively. All learning-based methods are run on a single RTX2080Ti GPU.
  • Figure 2: An illustration of our proposed unsupervised framework. (a) The overall architecture of the proposed iteratively updated unsupervised framework. (b) The proposed sparse convolutional anchor-free moving object detection network. First, we use a traditional method modified from [xiao2023incorporating] together with SORT [bewley2016simple] to obtain initial pseudo labels. Then, the initial labels and videos are used to train the sparse detection network, in which the input video clip is processed by a sparse sampling module to extract sparse 3D spatio-temporal point cloud data. Finally, the trained sparse network is used to update the pseudo labels and is then retrained to promote detection performance. Note that the method generating the initial labels and the detection network can each be replaced by an arbitrary method, which demonstrates the flexibility of our proposed framework.
  • Figure 3: The average target ratio of the test set. The average target ratio in each video is less than 0.3%, which demonstrates the sparsity of moving vehicles in satellite videos.
  • Figure 4: An illustration of the proposed sparse sampling module.
  • Figure 5: An illustration of sparse convolution. Compared with regular convolution, sparse convolution only computes on valid positions, i.e., the blue squares.
  • ...and 3 more figures
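
The Figure 5 caption notes that sparse convolution only computes on valid positions. The following Python sketch illustrates that idea in 2D under simplifying assumptions: active sites are stored in a dictionary, outputs are produced only at those sites (a submanifold-style variant), and the operation is the cross-correlation commonly called convolution in deep learning. The function names and the toy data are hypothetical, and the paper's network operates on 3D spatio-temporal data rather than this 2D toy.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (y, x)
Kernel = List[List[float]]


def sparse_conv2d(active: Dict[Coord, float], kernel: Kernel) -> Dict[Coord, float]:
    """Evaluate the kernel only at active sites, reading only active neighbours."""
    radius = len(kernel) // 2
    out: Dict[Coord, float] = {}
    for (y, x) in active:
        total = 0.0
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                value = active.get((y + dy, x + dx))
                if value is not None:  # missing sites are treated as zero
                    total += kernel[dy + radius][dx + radius] * value
        out[(y, x)] = total
    return out


def dense_conv2d(image: List[List[float]], kernel: Kernel) -> List[List[float]]:
    """Reference dense version with zero padding, evaluated at every pixel."""
    height, width = len(image), len(image[0])
    radius = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        out[y][x] += kernel[dy + radius][dx + radius] * image[yy][xx]
    return out


# Toy input: a 6x6 image that is zero everywhere except two active pixels.
active = {(2, 2): 1.0, (2, 3): 2.0}
image = [[0.0] * 6 for _ in range(6)]
for (y, x), v in active.items():
    image[y][x] = v

kernel = [[0.0, 1.0, 0.0],
          [1.0, 2.0, 1.0],
          [0.0, 1.0, 0.0]]

sparse_out = sparse_conv2d(active, kernel)
dense_out = dense_conv2d(image, kernel)
print(f"sites evaluated: sparse={len(sparse_out)}, dense={6 * 6}")
```

At the active sites the sparse result matches the dense one, while the sparse version visits 2 positions instead of 36. With a target ratio as low as the one reported in the Figure 3 caption, this kind of skipping is where the computational savings would come from.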