Local-Global Temporal Difference Learning for Satellite Video Super-Resolution

Yi Xiao, Qiangqiang Yuan, Kui Jiang, Xianyu Jin, Jiang He, Liangpei Zhang, Chia-Wen Lin

TL;DR

This paper tackles satellite video super-resolution by exploiting explicit temporal differences to compensate motion, addressing limitations of optical-flow and heavy kernel-based approaches in large-scale remote sensing data. The authors introduce LGTD, a two-branch framework consisting of Short-term Temporal Difference Module (S-TDM) for local motion cues and Long-term Temporal Difference Module (L-TDM) for global motion cues, augmented by a Difference Compensation Unit (DCU) to preserve spatial consistency; reconstruction uses a hybrid attention mechanism. The method demonstrates superior quantitative performance across five satellite datasets and maintains efficiency by using a relatively small temporal window and deformation-based alignment, outperforming flow-based and kernel-based methods while approaching or exceeding recurrent approaches. These results suggest a practical, computation-friendly alternative for temporal compensation in satellite VSR, with broad implications for remote sensing tasks requiring high-resolution video data. Future work includes designing lighter attention blocks to further reduce model size without sacrificing temporal fidelity.

Abstract

Optical-flow-based and kernel-based approaches have been extensively explored for temporal compensation in satellite Video Super-Resolution (VSR). However, these techniques generalize poorly to large-scale or complex scenarios, especially in satellite videos. In this paper, we propose to exploit the well-defined temporal difference for efficient and effective temporal compensation. To fully utilize the local and global temporal information within frames, we systematically model the short-term and long-term temporal discrepancies, since we observed that these discrepancies offer distinct and mutually complementary properties. Specifically, we devise a Short-term Temporal Difference Module (S-TDM) to extract local motion representations from RGB difference maps between adjacent frames, which yields more clues for accurate texture representation. To explore the global dependency in the entire frame sequence, we propose a Long-term Temporal Difference Module (L-TDM), where the differences between forward and backward segments are incorporated and activated to guide the modulation of the temporal feature, leading to a holistic global compensation. Moreover, we propose a Difference Compensation Unit (DCU) to enrich the interaction between the spatial distribution of the target frame and the temporally compensated results, which helps maintain spatial consistency while refining the features to avoid misalignment. Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches. Code will be available at https://github.com/XY-boy/LGTD
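
To make the idea of temporal differences concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a clip stored as a (T, H, W, 3) float array. The function names, the `radius` parameter (which need not correspond to the paper's $N$), and the pixel-space averaging used for the long-term difference are all illustrative assumptions; the paper's modules operate on learned features rather than raw pixel statistics.

```python
import numpy as np


def short_term_differences(frames: np.ndarray, t: int, radius: int = 2) -> np.ndarray:
    """RGB difference maps between adjacent frames around target index t.

    frames: (T, H, W, 3) float array.
    Returns (2 * radius, H, W, 3): frame[i + 1] - frame[i] for every
    adjacent pair inside the window [t - radius, t + radius].
    `radius` is a hypothetical parameter chosen for this sketch.
    """
    num_frames = frames.shape[0]
    if t - radius < 0 or t + radius >= num_frames:
        raise ValueError("window around t does not fit inside the clip")
    window = frames[t - radius : t + radius + 1]  # 2 * radius + 1 frames
    return window[1:] - window[:-1]


def stack_as_channels(diffs: np.ndarray) -> np.ndarray:
    """Stack K difference maps of shape (K, H, W, 3) into one (H, W, 3 * K) map."""
    k, h, w, c = diffs.shape
    return diffs.transpose(1, 2, 0, 3).reshape(h, w, k * c)


def long_term_difference(frames: np.ndarray, t: int) -> np.ndarray:
    """Crude pixel-space stand-in for a forward/backward segment difference.

    Assumption of this sketch: each segment is summarized by its mean frame.
    """
    backward = frames[:t]      # frames before the target
    forward = frames[t + 1 :]  # frames after the target
    if len(backward) == 0 or len(forward) == 0:
        raise ValueError("target frame needs neighbors on both sides")
    return forward.mean(axis=0) - backward.mean(axis=0)


# Toy clip: a static scene in which one pixel lights up in frame 3 only.
clip = np.zeros((5, 4, 4, 3), dtype=np.float32)
clip[3, 1, 1, :] = 1.0

local = short_term_differences(clip, t=2, radius=2)
stacked = stack_as_channels(local)
global_diff = long_term_difference(clip, t=2)
```

In this toy clip the short-term maps are zero everywhere except at the changed pixel, where the change appears with opposite signs in the two difference maps that involve frame 3. The long-term map shows the same location with a smaller magnitude because the change is averaged over the forward segment, which illustrates why the two kinds of difference carry different information.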

Paper Structure (31 sections, 16 equations, 10 figures, 8 tables)

Figures (10)

  • Figure 1: Comparison between RGB temporal differences and optical flow maps. The optical flow maps were generated by PyFlow liu2009beyond. Temporal differences provide more accurate and sharper cues than optical flow. In addition, the local and global temporal differences are not equally informative, as they reflect different levels of difference.
  • Figure 2: The overall structure of our proposed Local-Global Temporal Difference learning network (LGTD). It consists of four modules: (1) Short-term Temporal Difference Module (S-TDM), which performs local temporal compensation; (2) Long-term Temporal Difference Module (L-TDM), which performs global temporal compensation; (3) Difference Compensation Unit (DCU), which integrates the spatial and temporal information to maintain spatial consistency; (4) Reconstruction module, which generates the final HR target frame. The Short-term Attention (SA) block is built on channel attention, and the Long-term Attention (LA) block is realized by multi-head self-attention.
  • Figure 3: Diagram of our proposed Short-term Temporal Difference Module (S-TDM), with $N=2$ taken as an example. S-TDM performs feature extraction on stacked RGB difference maps and supplies the resulting local motion representations to $f_t$ for local compensation.
  • Figure 4: Qualitative comparisons on scene-2 of Jilin-1, scene-8 of Carbonite-2, and scene-11 from UrtheCast. Zoom in for better visualization.
  • Figure 5: Qualitative comparisons on scene-14 and scene-15 of SkySat-1. Zoom in for better visualization.
  • ...and 5 more figures