Aerial View River Landform Video segmentation: A Weakly Supervised Context-aware Temporal Consistency Distillation Approach

Chi-Han Chen, Chieh-Ming Chen, Wen-Huang Cheng, Ching-Chun Huang

TL;DR

The paper tackles river landform segmentation from UAV video under limited labeled data, focusing on temporal consistency. It introduces a teacher-student framework that leverages memory-based video object segmentation (VOS) and an SSIM-guided key-frame strategy with a key-frame update mechanism to distill temporal knowledge under weak supervision. A combined loss balances per-frame segmentation accuracy with temporal coherence, enabling competitive mIoU and high temporal stability with as little as 30% labeled data. The approach enables robust localization of sediment and exposed ground in river monitoring, reducing the annotation burden while maintaining temporal reliability.
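The SSIM-guided key-frame strategy can be illustrated with a minimal sketch. The exact selection rule is not given here, so the rule below is an assumption: frame 0 is a key frame, and a later frame becomes a key frame when its structural similarity (SSIM) to the most recent key frame drops below a hypothetical `threshold`. For brevity, `global_ssim` treats the whole frame as a single window, a simplification of the usual sliding-window SSIM; the function names and the 0.8 default are illustrative only.

```python
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities in [0, 255] (assumed format).
Frame = Sequence[Sequence[float]]


def global_ssim(a: Frame, b: Frame, data_range: float = 255.0) -> float:
    """SSIM computed over the whole frame as one window (simplified SSIM)."""
    xs = [float(p) for row in a for p in row]
    ys = [float(p) for row in b for p in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("frames must be non-empty and the same size")
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    var_y = sum((y - mu_y) ** 2 for y in ys) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    # Stabilising constants with the commonly used K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def select_key_frames(frames: List[Frame], threshold: float = 0.8) -> List[int]:
    """Hypothetical rule: open a new key frame whenever the scene drifts.

    Frame 0 is always a key frame; frame t becomes a key frame when its SSIM
    against the most recent key frame falls below `threshold`.
    """
    if not frames:
        return []
    keys = [0]
    for t in range(1, len(frames)):
        if global_ssim(frames[keys[-1]], frames[t]) < threshold:
            keys.append(t)
    return keys
```

Under this rule, a lower threshold yields fewer key frames (and therefore fewer frames to annotate), while a higher one follows scene changes more closely.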

Abstract

The study of terrain and landform classification through UAV remote sensing differs significantly from ground-vehicle patrol tasks. Besides the complexity of data annotation and the need for temporal consistency, it also faces a scarcity of relevant data and the limited effective range of many existing technologies. This research shows that, in aerial positioning tasks, both the mean Intersection over Union (mIoU) and temporal consistency (TC) metrics are of paramount importance. It further demonstrates that fully labeled data is not the optimal choice, and that training on selected key data alone, without any TC enhancement, leads to failures. Hence, a teacher-student architecture, coupled with key-frame selection and key-frame updating algorithms, is proposed. This framework performs weakly supervised learning and TC knowledge distillation, overcoming the deficiencies of traditional TC training in aerial tasks. The experimental results show that our method, using merely 30% of the labeled data, simultaneously improves mIoU and temporal consistency, ensuring stable localization of terrain objects. Result demo: https://gitlab.com/prophet.ai.inc/drone-based-riverbed-inspection
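Both metrics emphasized above can be computed from per-pixel class maps. The sketch below assumes masks are 2D lists of integer class ids; mIoU averages per-class IoU over the classes present in either mask. The TC function is a simplified proxy and an assumption, not necessarily the paper's definition: it averages the mIoU between consecutive predictions, whereas TC metrics in the video segmentation literature typically warp the previous prediction with optical flow before comparing, which matters when a UAV camera moves between frames.

```python
import math
from typing import List, Sequence

# A segmentation map as rows of integer class ids (assumed format).
Mask = Sequence[Sequence[int]]


def class_ious(pred: Mask, target: Mask, num_classes: int) -> List[float]:
    """Per-class IoU; a class absent from both masks gets NaN."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                # A disagreement adds to the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return [i / u if u else math.nan for i, u in zip(inter, union)]


def mean_iou(pred: Mask, target: Mask, num_classes: int) -> float:
    """Mean IoU over the classes that appear in either mask."""
    vals = [v for v in class_ious(pred, target, num_classes) if not math.isnan(v)]
    return sum(vals) / len(vals) if vals else math.nan


def temporal_consistency(preds: List[Mask], num_classes: int) -> float:
    """Simplified TC proxy: mean mIoU between consecutive predictions.

    No optical-flow warping is applied, so camera motion lowers this score.
    """
    if len(preds) < 2:
        return math.nan
    scores = [
        mean_iou(preds[t], preds[t - 1], num_classes) for t in range(1, len(preds))
    ]
    return sum(scores) / len(scores)
```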

Paper Structure

This paper contains 15 sections, 6 equations, 4 figures, 2 tables, and 2 algorithms.

Figures (4)

  • Figure 1: This long-term video dataset was captured during the dry season along the Dajia River and Da'an River in Taiwan. Accurately locating large areas of sediment and exposed ground requires segmentation, which in turn gives rise to the need for methods that combine temporal consistency with weakly supervised learning.
  • Figure 2: The main architecture of this study consists of a TeacherNet and a StudentNet. The teacher transfers temporally consistent knowledge to the student through a Pseudo Label Bank. Key-frame selection is added to enable weakly supervised learning. Key-frame updates are driven by temporal consistency: temporally stable results are added to the key-frame set, ensuring that frame t-1 is included among the key frames in subsequent iterations.
  • Figure 3: This figure illustrates the temporal consistency of four models tested on a set of 158 validation images. The model developed under our architecture shows better and more stable temporal consistency than the others.
  • Figure 4: This figure demonstrates that training on the entire labeled dataset does not necessarily improve segmentation accuracy, possibly due to redundant training. The red box highlights the importance of TC in localization tasks. Our optimized method, by contrast, improves both accuracy and temporal consistency under weakly supervised conditions.
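The key-frame update described in Figure 2 can be sketched as follows. This is a hypothetical reading of the caption, not the authors' implementation: whenever the teacher's predictions for frames t-1 and t agree at or above a threshold `tau`, frame t-1 is treated as temporally stable, added to the key-frame set, and its prediction stored in a pseudo-label bank for the student. Pixel agreement stands in for the TC measure, and `update_key_frames`, `tau`, and the dictionary-based bank are illustrative names.

```python
from typing import Dict, List, Sequence, Set

# A segmentation map as rows of integer class ids (assumed format).
Mask = Sequence[Sequence[int]]


def pixel_agreement(a: Mask, b: Mask) -> float:
    """Fraction of pixels with the same class id (a simple TC stand-in)."""
    total = same = 0
    for ra, rb in zip(a, b):
        for pa, pb in zip(ra, rb):
            total += 1
            same += int(pa == pb)
    return same / total if total else 0.0


def update_key_frames(
    teacher_preds: List[Mask],
    key_frames: Set[int],
    pseudo_label_bank: Dict[int, Mask],
    tau: float = 0.9,
) -> Set[int]:
    """Hypothetical key-frame update step.

    If the teacher's predictions for frames t-1 and t agree at least `tau`,
    frame t-1 is promoted to a key frame and its prediction is stored as a
    pseudo label. Frames that are already key frames keep their labels.
    """
    updated = set(key_frames)  # do not mutate the caller's set
    for t in range(1, len(teacher_preds)):
        stable = pixel_agreement(teacher_preds[t - 1], teacher_preds[t]) >= tau
        if stable and (t - 1) not in updated:
            updated.add(t - 1)
            pseudo_label_bank[t - 1] = teacher_preds[t - 1]
    return updated
```

Returning a new set rather than editing `key_frames` in place keeps each training iteration's starting key-frame set intact, which makes it easier to compare iterations.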