Masked Angle-Aware Autoencoder for Remote Sensing Images

Zhihao Li; Biao Hou; Siteng Ma; Zitong Wu; Xianpeng Guo; Bo Ren; Licheng Jiao

TL;DR

The Masked Angle-Aware Autoencoder (MA3E) perceives and learns angles during pre-training: by restoring the angle variation introduced on a rotated crop, it learns rotation-invariant representations.

Abstract

To bridge the inherent domain gap between remote sensing (RS) images and natural images, several self-supervised representation learning methods have made promising progress. However, they overlook the diverse angles at which RS objects appear. This paper proposes the Masked Angle-Aware Autoencoder (MA3E) to perceive and learn angles during pre-training. We design a scaling center crop operation that creates a rotated crop with a random orientation on each original image, introducing an explicit angle variation. MA3E takes this composite image as input and reconstructs the original image, learning rotation-invariant representations by restoring the angle variation introduced on the rotated crop. To avoid the biases caused by directly reconstructing the rotated crop, we propose an Optimal Transport (OT) loss that automatically assigns similar original image patches to each rotated crop patch for reconstruction. MA3E achieves more competitive performance than existing pre-training methods on seven RS image datasets across three downstream tasks.
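The OT loss can be illustrated with a small pure-Python sketch. It is built on our own assumptions, not the authors' implementation: the cost $C_{ij}$ is the squared L2 distance between predicted rotated-crop patch $i$ and candidate original-image patch $j$, both marginals are uniform, the plan $\Omega = \mathrm{diag}(u)\,K\,\mathrm{diag}(v)$ with $K = \exp(-C/\varepsilon)$ comes from standard entropic Sinkhorn-Knopp iterations, and the loss is the plan-weighted cost $\sum_{ij} \Omega_{ij} C_{ij}$. The names `sinkhorn_plan`, `ot_loss`, `eps` and `n_iters` are hypothetical.

```python
import math


def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn-Knopp scaling.

    cost[i][j]: cost of reconstructing predicted patch i with target patch j.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on predicted rotated-crop patches (assumption)
    b = [1.0 / m] * m  # uniform mass on candidate original-image patches (assumption)
    # Rescale costs to [0, 1] so eps is scale-free and exp() does not underflow.
    c_max = max(max(row) for row in cost) or 1.0
    K = [[math.exp(-(c / c_max) / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan Omega = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def sq_dist(p, q):
    """Squared L2 distance between two flattened patches."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def ot_loss(pred, target, eps=0.05, n_iters=200):
    """Plan-weighted reconstruction error between predicted and target patches."""
    cost = [[sq_dist(p, t) for t in target] for p in pred]
    plan = sinkhorn_plan(cost, eps, n_iters)
    return sum(plan[i][j] * cost[i][j]
               for i in range(len(pred)) for j in range(len(target)))


# Predictions that are a shuffled copy of the targets give a near-zero loss,
# because the plan pairs each prediction with its most similar target patch.
targets = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
preds = [[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
print(ot_loss(preds, targets))
```

Because the plan is derived from the costs, a predicted patch is not forced to match the target at its own spatial position. In a training framework, one common choice is to treat the plan as a constant (no gradient) and differentiate only the cost term; whether MA3E does this is not stated here.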

Paper Structure

This paper contains 18 sections, 5 equations, 10 figures, and 17 tables.

Introduction
Related Works
Method
Preliminary: MAE
Masked Angle-Aware Autoencoder (MA3E)
Reconstruction for Rotated Crop
Experiments
Experimental Setups
Main Results
Ablation Study
Visualization
Conclusion and Discussion
Full Implementation Details
Experimental Setups
Dataset Preparations
...and 3 more sections

Figures (10)

Figure 1: Detection results of detectors initialized with MA3E and MAE wang2022advancing pre-trained models, for RS objects grouped into different angle ranges. The fine-tuning setup is the same as described in Sec. 4.1. Our MA3E models, pre-trained for 300 and 1600 epochs, notably improve AP$^{50}$ for objects with angles from 10° to 80°, demonstrating the effectiveness of angle perception during pre-training. † denotes our reproduction, since Wang et al. wang2022advancing only release the model pre-trained for 1600 epochs with MAE he2022masked on an RS image dataset.
Figure 2: (a) The pipeline of MA3E. A scaling center crop operation creates the rotated crop within the original image, introducing an explicit angle variation. An angle embedding is added to the rotated crop, and random masking is then applied separately to the rotated crop and to the remaining background. All visible patches are then encoded and decoded in sequence to reconstruct the original image and restore the preset angle variation. (b) MA3E treats the reconstruction of rotated crops as an OT problem. An OT loss is built on the transportation plan $\Omega$, which is solved with the fast iterative Sinkhorn-Knopp algorithm cuturi2013sinkhorn. The OT loss automatically assigns similar image patches to each predicted patch of the rotated crop for reconstruction.
Figure 3: (a) The proposed scaling center crop constructs a rotated crop with a random angle at an arbitrary position in the original image, introducing an explicit angle variation. (b) Left and middle columns: simple random rotation causes i) meaningless zero-valued background; ii) loss of scene content; iii) changes in scene scale. Right column: rotating only by fixed angles (e.g., 90°, 180°, 270°) restricts scene diversity.
Figure 4: Example results on MillionAID training images. For each set, we show the original image (leftmost), the composite image containing the rotated crop (second from left), the masked image (second from right), and the MA3E reconstruction (rightmost). For easier viewing, the rotated crop is highlighted with a red box. Following MAE he2022masked, we also show the model's output on visible patches to give a complete view of MA3E's reconstruction quality.
Figure 5: Example results on MillionAID training images at the different masking ratios listed in the masking-ratio table. MA3E successfully models the basic structure of the scenes in the original images and restores the preset angle variations. Even when only 7 patches of the rotated crop remain visible (80% masking), the model still restores the angle well. This demonstrates that MA3E has learned rotation-invariant representations and can infer complex reconstructions.
...and 5 more figures
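Figure 3(b) notes that naive rotation leaves zero-valued corners. A geometric fact helps explain why a center crop avoids this: a square of side $s$ rotated by $\theta$ contains a centered, axis-aligned square of side $s / (|\cos\theta| + |\sin\theta|)$, and no larger centered one. The sketch below is our own illustration under the reading that the rotated region is center-cropped to this size (and then rescaled); the function names, the uniform angle range, and the 224/96 sizes are hypothetical, and the paper's exact procedure may differ.

```python
import math
import random


def zero_free_crop_side(side: float, angle_deg: float) -> float:
    """Largest centred, axis-aligned square inside a square of `side` rotated by
    `angle_deg`: cropping to this size leaves no zero-filled corners."""
    t = math.radians(angle_deg)
    return side / (abs(math.cos(t)) + abs(math.sin(t)))


def source_side_for_crop(crop_side: float, angle_deg: float) -> float:
    """Inverse: side of the region to rotate so its zero-free crop is `crop_side`."""
    t = math.radians(angle_deg)
    return crop_side * (abs(math.cos(t)) + abs(math.sin(t)))


def inside_rotated_square(x: float, y: float, side: float, angle_deg: float) -> bool:
    """Is point (x, y), measured from the centre, inside the rotated square?"""
    t = math.radians(angle_deg)
    # Undo the rotation, then compare with the axis-aligned half-side.
    xr = x * math.cos(t) + y * math.sin(t)
    yr = -x * math.sin(t) + y * math.cos(t)
    half = side / 2 + 1e-9  # small tolerance: the crop's corners touch the boundary
    return abs(xr) <= half and abs(yr) <= half


def sample_rotated_crop(img_size: int, crop_size: int, rng: random.Random):
    """Hypothetical sampler: a random angle and a random top-left corner that keep
    a crop_size x crop_size rotated crop inside the original image."""
    angle = rng.uniform(0.0, 360.0)
    top = rng.randint(0, img_size - crop_size)
    left = rng.randint(0, img_size - crop_size)
    return angle, top, left


angle, top, left = sample_rotated_crop(224, 96, random.Random(0))
print(f"angle={angle:.1f} deg, top-left=({top}, {left}), "
      f"zero-free side of a rotated 96px region={zero_free_crop_side(96, angle):.1f}")
```

Under this reading, the ratio `96 / zero_free_crop_side(96, angle)` is the factor by which the zero-free center crop would be scaled back up to the crop size.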
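Figure 2 states that random masking is applied separately to the rotated crop and to the remaining background, and Figure 5 reports 7 visible crop patches at 80% masking. A minimal sketch of such per-region masking follows. It assumes a 224×224 image split into a 14×14 grid of 16×16 patches and a grid-aligned 6×6-patch crop (both assumptions, chosen because 20% of 36 patches rounds to 7); `split_patch_ids` and `mask_per_region` are hypothetical helpers, not the authors' code.

```python
import random


def split_patch_ids(grid: int, crop_cells: int, top: int, left: int):
    """Split patch indices of a grid x grid image into crop and background ids.
    Assumes the rotated crop is aligned with the patch grid (hypothetical)."""
    crop = [r * grid + c
            for r in range(top, top + crop_cells)
            for c in range(left, left + crop_cells)]
    crop_set = set(crop)
    background = [i for i in range(grid * grid) if i not in crop_set]
    return crop, background


def mask_per_region(crop_ids, background_ids, mask_ratio: float, rng: random.Random):
    """Independently keep a (1 - mask_ratio) fraction of each region's patches,
    so the crop always keeps its share of visible patches."""
    def keep(ids):
        n_keep = round(len(ids) * (1 - mask_ratio))
        return sorted(rng.sample(ids, n_keep))
    return keep(crop_ids), keep(background_ids)


crop, background = split_patch_ids(grid=14, crop_cells=6, top=3, left=5)
vis_crop, vis_bg = mask_per_region(crop, background, mask_ratio=0.8, rng=random.Random(0))
print(len(crop), len(background), len(vis_crop), len(vis_bg))  # 36 160 7 32
```

Masking each region on its own guarantees the small crop a predictable number of visible patches, whereas masking the whole image uniformly could, by chance, hide almost all of it. This motivation is our reading rather than a statement from the paper.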
