Intriguing Property and Counterfactual Explanation of GAN for Remote Sensing Image Generation

Xingzhe Su, Wenwen Qiang, Jie Hu, Fengge Wu, Changwen Zheng, Fuchun Sun

TL;DR

This work identifies a weakness of GAN-based image generation that is specific to remote sensing (RS): as the training data shrinks, whether in the number of categories or in samples per category, the GAN model retains less feature information and generation quality degrades more sharply than it does for natural images. The authors formalize this observation with a structural causal model (SCM) that interprets generated images as counterfactuals, and prove that image quality is positively correlated with the amount of feature information. To mitigate the issue, they introduce Uniformity Regularization (UR) and Entropy Regularization (ER), two model-agnostic schemes that increase the information learned by the GAN at the distributional and sample levels, respectively. On three RS datasets and two natural datasets, and across several GAN architectures, UR and ER yield consistent improvements in FID/KID and visual fidelity.

Abstract

Generative adversarial networks (GANs) have achieved remarkable progress in the natural image field. However, when applying GANs in the remote sensing (RS) image generation task, an extraordinary phenomenon is observed: the GAN model is more sensitive to the size of training data for RS image generation than for natural image generation. In other words, the generation quality of RS images will change significantly with the number of training categories or samples per category. In this paper, we first analyze this phenomenon from two kinds of toy experiments and conclude that the amount of feature information contained in the GAN model decreases with reduced training data. Then we establish a structural causal model (SCM) of the data generation process and interpret the generated data as the counterfactuals. Based on this SCM, we theoretically prove that the quality of generated images is positively correlated with the amount of feature information. This provides insights for enriching the feature information learned by the GAN model during training. Consequently, we propose two innovative adjustment schemes, namely Uniformity Regularization (UR) and Entropy Regularization (ER), to increase the information learned by the GAN model at the distributional and sample levels, respectively. We theoretically and empirically demonstrate the effectiveness and versatility of our methods. Extensive experiments on three RS datasets and two natural datasets show that our methods outperform the well-established models on RS image generation tasks. The source code is available at https://github.com/rootSue/Causal-RSGAN.

Paper Structure

This paper contains 25 sections, 9 theorems, 32 equations, 19 figures, and 9 tables.

Key Result

Theorem 1

Consider the data generating process described in Fig. fig2-2, and assume further that: (i) $\mathbf{f}:\mathcal{Z} \rightarrow \mathcal{X}$ is smooth and differentiable; (ii) $p_\mathbf{z}$ is a smooth, continuous density on $\mathcal{Z}$ with $p_\mathbf{z}(\mathbf{z}) > 0$ almost everywhere; (iii) for any $l$ ... where $H(\cdot)$ denotes the differential entropy of the random variable $\mathbf{g}(\mathbf{x})$ ...
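
For readers unfamiliar with the quantity named in the theorem, the following is the standard definition of differential entropy, written for the random variable $\mathbf{g}(\mathbf{x})$. The symbol $p_{\mathbf{g}}$ for its density is notation assumed here for illustration and does not come from the excerpt above.

$$H(\mathbf{g}(\mathbf{x})) = -\int p_{\mathbf{g}}(\mathbf{u}) \, \log p_{\mathbf{g}}(\mathbf{u}) \, d\mathbf{u}$$

Under this definition, among all densities supported on a fixed bounded set, the uniform density attains the largest differential entropy, which is one way to read why a more uniform feature distribution carries more information.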

Figures (19)

  • Figure 1: FID scores on RS datasets (NWPU-RESISC45, PN, and RSD46) and natural datasets (ImageNet Carnivores, TinyImageNet, and Places365). StyleGAN2+ADA is trained on each dataset while varying (a) the number of classes and (b) the number of images per class.
  • Figure 2: "real" denotes the features extracted from the real dataset. "full" denotes the features extracted from samples generated by the GAN model trained on the original dataset. "c" denotes the number of classes used to train the GAN model, and "s" denotes the number of samples. (a) The distributions of samples in the feature space. The samples are generated by GAN models trained under different data setups. We show only the cluster centers of the samples for simplicity. For a fair comparison, we use the ten cluster centers under different class settings. (b) The average feature of samples generated by the same model. We plot the feature distribution on the unit hypersphere $S^1$ together with the Gaussian kernel density estimation curves. The two numbers below each chart are the average pairwise $G_2$ potential of the distribution (lower is better) and the information entropy of the features (higher is better).
  • Figure 3: Motivating experiments by different feature extraction networks (a)(b)(c), and on natural images (d).
  • Figure 4: Overview of the image generation process. We partition the latent variable $\mathbf{z}$ into content $\mathbf{c}$ and noise $\boldsymbol{\epsilon}$. We assume that only noise changes between the real image $\mathbf{x}$ and the generated image $\widetilde{\mathbf{x}}$.
  • Figure 5: The overall architecture of our method. $G$ and $D$ are the generator and the discriminator. $f$ denotes the intermediate features. $M$ is the batch size. The real dataset is omitted for clarity as our method is only imposed on generated data.
  • ...and 14 more figures
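
The two numbers reported under each chart in Figure 2(b) can be approximated with a short script. The sketch below is illustrative and is not taken from the authors' code: it assumes that the $G_2$ potential is the Gaussian kernel $\exp(-t\,\lVert u - v\rVert^2)$ with $t = 2$ evaluated on L2-normalized features, and that the information entropy is a Shannon entropy over a histogram of angles on $S^1$. The function names and the bin count are hypothetical.

```python
import numpy as np


def normalize_rows(features):
    """L2-normalize each row so that it lies on the unit hypersphere."""
    features = np.asarray(features, dtype=np.float64)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.clip(norms, 1e-12, None)


def avg_pairwise_gaussian_potential(features, t=2.0):
    """Mean of exp(-t * ||u - v||^2) over all distinct pairs.

    Assumption: this is the "average pairwise G_2 potential" when t = 2.
    Lower values indicate a more uniform spread on the hypersphere.
    """
    u = normalize_rows(features)
    # For unit vectors, ||u - v||^2 = 2 - 2 * <u, v>.
    sq_dists = np.clip(2.0 - 2.0 * (u @ u.T), 0.0, None)
    upper = np.triu_indices(len(u), k=1)  # each unordered pair once
    return float(np.mean(np.exp(-t * sq_dists[upper])))


def angular_histogram_entropy(features_2d, bins=36):
    """Shannon entropy (in nats) of the angle histogram of 2-D features.

    Assumption: a histogram estimate stands in for the entropy in Figure 2(b).
    Higher values indicate that the features cover more of the circle.
    """
    u = normalize_rows(features_2d)
    angles = np.arctan2(u[:, 1], u[:, 0])
    counts, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log(probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for features: well spread versus collapsed.
    spread_angles = rng.uniform(-np.pi, np.pi, size=500)
    collapsed_angles = rng.normal(0.0, 0.1, size=500)
    for name, ang in [("spread", spread_angles), ("collapsed", collapsed_angles)]:
        pts = np.stack([np.cos(ang), np.sin(ang)], axis=1)
        print(name, avg_pairwise_gaussian_potential(pts), angular_histogram_entropy(pts))
```

On synthetic data like this, features that collapse toward one direction should show a higher potential and a lower entropy than features spread around the circle, which matches the "lower is better" and "higher is better" reading in the caption.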

Theorems & Definitions (14)

  • Definition
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Proposition
  • Definition
  • Definition
  • Definition
  • Lemma
  • ...and 4 more