SpecDM: Hyperspectral Dataset Synthesis with Pixel-level Semantic Annotations
Wendi Liu, Pei Yang, Wenhui Hong, Xiaoguang Mei, Jiayi Ma
TL;DR
This work tackles the scarcity of labeled hyperspectral data for dense prediction tasks by introducing SpecDM, a diffusion-based pipeline that learns a joint image-label latent distribution. It employs a two-stream VAE to encode HSIs and pixel-level annotations into latent codes $(z_x, z_y)$ and trains a latent diffusion model to synthesize realistic image-label pairs, enabling both semantic segmentation and change detection datasets. The approach demonstrates improved downstream performance and spectral fidelity on SegMunich and OSCD compared with baselines, validating the usefulness of synthetic labeled HSIs for data-hungry tasks. A key limitation is the challenge of automatically verifying pixel-level alignment between generated images and annotations without ground truth, which the authors acknowledge for future work.
Abstract
In hyperspectral remote sensing field, some downstream dense prediction tasks, such as semantic segmentation (SS) and change detection (CD), rely on supervised learning to improve model performance and require a large amount of manually annotated data for training. However, due to the needs of specific equipment and special application scenarios, the acquisition and annotation of hyperspectral images (HSIs) are often costly and time-consuming. To this end, our work explores the potential of generative diffusion model in synthesizing HSIs with pixel-level annotations. The main idea is to utilize a two-stream VAE to learn the latent representations of images and corresponding masks respectively, learn their joint distribution during the diffusion model training, and finally obtain the image and mask through their respective decoders. To the best of our knowledge, it is the first work to generate high-dimensional HSIs with annotations. Our proposed approach can be applied in various kinds of dataset generation. We select two of the most widely used dense prediction tasks: semantic segmentation and change detection, and generate datasets suitable for these tasks. Experiments demonstrate that our synthetic datasets have a positive impact on the improvement of these downstream tasks.
