Label-Efficient Hyperspectral Image Classification via Spectral FiLM Modulation of Low-Level Pretrained Diffusion Features
Yuzhen Hu, Biplab Banerjee, Saurabh Prasad
TL;DR
This work tackles label-efficient hyperspectral image classification by repurposing a frozen diffusion model trained on natural images to extract spatial features, which are then fused with spectral information through FiLM-based modulation. The proposed GeoDiffNet uses low-level diffusion features to generalize across geospatial domains without finetuning, while GeoDiffNet-F adds spectral-conditioned FiLM to enable dynamic multimodal fusion under sparse supervision. Key findings show that early diffusion timesteps and high-resolution decoder layers yield more transferable spatial features, and that spectral FiLM fusion outperforms baselines on the Augsburg and Berlin datasets. The approach demonstrates strong cross-domain transferability and practical potential for remote sensing tasks with limited labeled data, and the code is publicly available.
Abstract
Hyperspectral imaging (HSI) enables detailed land cover classification, yet low spatial resolution and sparse annotations pose significant challenges. We present a label-efficient framework that leverages spatial features from a frozen diffusion model pretrained on natural images. Our approach extracts low-level representations from high-resolution decoder layers at early denoising timesteps, which transfer effectively to the low-texture structure of HSI. To integrate spectral and spatial information, we introduce a lightweight FiLM-based fusion module that adaptively modulates frozen spatial features using spectral cues, enabling robust multimodal learning under sparse supervision. Experiments on two recent hyperspectral datasets demonstrate that our method, trained using only the provided sparse training labels, outperforms state-of-the-art approaches. Ablation studies further highlight the benefits of diffusion-derived features and spectral-aware fusion. Overall, our results indicate that pretrained diffusion models can support domain-agnostic, label-efficient representation learning for remote sensing and broader scientific imaging tasks.
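The spectral FiLM fusion described above can be illustrated with a minimal sketch of generic feature-wise linear modulation: a spectral vector is mapped to one scale (gamma) and one shift (beta) per spatial feature channel, and each channel is transformed as gamma * x + beta. This is not the authors' implementation. The function names, the single linear layer used to produce gamma and beta, and the toy shapes are all assumptions made for illustration; in practice the spatial features would come from the frozen diffusion decoder and the projection weights would be learned.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Dense layer: one output per row of `weight` (hypothetical helper)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def film_modulate(features: FeatureMap, gamma: Vector, beta: Vector) -> FeatureMap:
    """Channel-wise scale and shift: out[c] = gamma[c] * features[c] + beta[c]."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one (gamma, beta) pair per feature channel")
    return [
        [[g * value + b for value in row] for row in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


def spectral_film(
    features: FeatureMap,
    spectrum: Vector,
    w_gamma: List[Vector],
    b_gamma: Vector,
    w_beta: List[Vector],
    b_beta: Vector,
) -> FeatureMap:
    """Derive gamma and beta from the spectrum, then modulate the spatial features.

    Assumption: a single linear projection per parameter; the paper's fusion
    module may be structured differently.
    """
    gamma = linear(spectrum, w_gamma, b_gamma)
    beta = linear(spectrum, w_beta, b_beta)
    return film_modulate(features, gamma, beta)


# Toy example: 2 spatial channels of size 1x2, conditioned on a 3-band spectrum.
features = [[[1.0, 2.0]], [[3.0, 4.0]]]
spectrum = [1.0, 0.0, 2.0]
fused = spectral_film(
    features,
    spectrum,
    w_gamma=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # gamma = [1.0, 2.0]
    b_gamma=[0.0, 0.0],
    w_beta=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],   # beta = [0.5, -1.0]
    b_beta=[0.5, -1.0],
)
```

Because the spatial features are only rescaled and shifted per channel, the frozen backbone stays untouched and the trainable parameters are limited to the small projections that generate gamma and beta, which is one reason this style of fusion suits sparse supervision.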
