Contrastive Heliophysical Image Pretraining for Solar Dynamics Observatory Records

Shiyu Shen, Zhe Gao, Taifeng Chai, Yang Huang, Bin Pan

TL;DR

SolarCHIP addresses the need for domain-adapted visual backbones for multi-instrument SDO data by incorporating cross-modal, time- and location-aware contrastive objectives. The method trains both CNN- and ViT-based autoencoders with three complementary contrastive losses (global class-level, patch-level, and intra-sample) while reconstructing inputs. It achieves state-of-the-art performance on cross-modal translation between HMI and AIA passbands and on full-disk flare classification, especially in low-resource regimes, and provides pretrained weights and code to the community. The work offers a practical, reusable foundation for diverse solar-imaging tasks, reducing data and compute demands and enabling label-efficient analysis.

Abstract

Deep learning has revolutionized solar image analysis, yet most approaches train task-specific encoders from scratch or rely on natural-image pretraining that ignores the unique characteristics of Solar Dynamics Observatory (SDO) data. We introduce SolarCHIP, a family of contrastively pretrained visual backbones tailored to multi-instrument SDO observations. SolarCHIP addresses three key challenges in solar imaging: multimodal sensing across AIA and HMI instruments, weak inter-class separability due to slow temporal evolution, and strong intra-class variability with sparse activity signals. Our pretraining framework employs a multi-granularity contrastive objective that jointly aligns (1) global class tokens across co-temporal AIA-HMI pairs to enhance temporal discrimination, (2) local patch tokens at fixed spatial indices to enforce position-consistent, modality-invariant features, and (3) intra-sample patches across different spatial locations to preserve fine-grained spatial structure. We train both CNN- and Vision Transformer-based autoencoders and demonstrate their effectiveness on two downstream tasks: cross-modal translation between HMI and AIA passbands via ControlNet, and full-disk flare classification. Experimental results show that SolarCHIP achieves state-of-the-art performance across both tasks, with particularly strong gains in low-resource settings where labeled data is limited. Ablation studies confirm that each contrastive component contributes essential discriminative capacity at different granularities. By publicly releasing pretrained weights and training code, we provide the heliophysics community with a practical, plug-and-play feature extractor that reduces computational requirements, improves label efficiency, and establishes a reusable foundation for diverse solar imaging applications.
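
The three contrastive terms named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It assumes a symmetric InfoNCE loss for every term, a temperature of 0.1, and one particular reading of the pairing rules: the patch-level term fixes a spatial index and contrasts across samples in the batch, while the intra-sample term fixes a sample and contrasts across patch positions. The names `info_nce` and `multi_granularity_terms` are hypothetical, and how the terms are weighted or combined with the reconstruction loss is not covered here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _cross_entropy_on_diagonal(logits):
    """Mean of -log softmax(row)[i] over rows: the target of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(anchors, positives, temperature=0.1):
    """Symmetric InfoNCE: anchors[i] should match positives[i] and no other row."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    # Cosine similarity matrix scaled by the (assumed) temperature.
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in p] for u in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_on_diagonal(logits)
                  + _cross_entropy_on_diagonal(transposed))


def multi_granularity_terms(aia_cls, hmi_cls, aia_patch, hmi_patch, temperature=0.1):
    """Return (global, patch_level, intra_sample) terms for one batch.

    aia_cls, hmi_cls:     [batch][dim] class tokens of co-temporal AIA-HMI pairs.
    aia_patch, hmi_patch: [batch][n_patches][dim] patch tokens.
    """
    batch, n_patches = len(aia_cls), len(aia_patch[0])

    # (1) Global: class tokens of the same timestamp are positives.
    global_term = info_nce(aia_cls, hmi_cls, temperature)

    # (2) Patch-level (assumed reading): fix patch index k, contrast across samples.
    patch_term = sum(
        info_nce([aia_patch[b][k] for b in range(batch)],
                 [hmi_patch[b][k] for b in range(batch)], temperature)
        for k in range(n_patches)
    ) / n_patches

    # (3) Intra-sample (assumed reading): fix sample b, contrast across positions.
    intra_term = sum(
        info_nce(aia_patch[b], hmi_patch[b], temperature) for b in range(batch)
    ) / batch

    return global_term, patch_term, intra_term


# Toy batch: 2 samples, 2 patches each, 4-dimensional tokens (illustrative values).
toy_cls = [[1.0, 0.0], [0.0, 1.0]]
toy_patch = [[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
             [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]]
print(multi_granularity_terms(toy_cls, toy_cls, toy_patch, toy_patch))
```

Keeping the three terms separate in the return value makes it easy to drop one at a time, which mirrors the kind of ablation the abstract mentions.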

Paper Structure

This paper contains 22 sections, 7 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of the SolarCHIP training pipeline. We first preprocess SDO images with calibration and augmentation. The preprocessed images are then fed into modality-specific encoders and decoders to extract tokens and reconstruct inputs. We apply three contrastive learning objectives at different granularities: (a) class-level, (b) patch-level, and (c) intra-sample contrastive learning. The similarity matrices illustrate the alignment enforced by each contrastive objective, where gray diagonal entries indicate stronger similarity between positive pairs.
  • Figure 2: Overview of downstream applications. We train a ControlNet with the pretrained encoder for cross-modal translation between HMI and AIA (top). We append a classification head to the HMI encoder for solar flare recognition (bottom).
  • Figure 3: Visualization of translated images from HMI to AIA. The top row shows the synthesized AIA images conditioned on the same HMI inputs, while the bottom row displays the corresponding ground truth AIA observations.
  • Figure 4: Visualization of translated images from AIA to HMI. The leftmost column shows the ground truth HMI observation, while the two columns to its right display the synthesized HMI images conditioned on the corresponding AIA inputs.
  • Figure 5: Few-shot Adaptation Results. We progressively reduce the number of labeled training samples to simulate low-resource scenarios, using 100%, 50%, 20%, 10%, and 5% of the original training set. Both SolarCHIP-pretrained models and randomly initialized baselines are fine-tuned under these constraints. Performance is assessed using (a) accuracy over all classes, as well as (b) $\geqslant$M and (c) $\geqslant$C binary classification accuracy.
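
The low-resource protocol in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: the subsets are drawn by plain random sampling and nested (each smaller subset is contained in the larger ones), labels are assumed to be GOES letter classes A, B, C, M, X, and the helper names `nested_subsets`, `at_least`, and `binary_accuracy` are hypothetical. Only the five fractions and the $\geqslant$M / $\geqslant$C thresholds come from the caption.

```python
import random

FRACTIONS = (1.0, 0.5, 0.2, 0.1, 0.05)  # training-set fractions from the caption
GOES_ORDER = "ABCMX"  # GOES flare classes, weakest to strongest (assumed label set)


def nested_subsets(n_samples, fractions=FRACTIONS, seed=0):
    """Map each fraction to sorted sample indices.

    All subsets are prefixes of one seeded shuffle, so the 5% subset is
    contained in the 10% subset, and so on (an assumption of this sketch).
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return {f: sorted(order[:max(1, round(f * n_samples))]) for f in fractions}


def at_least(label, threshold):
    """True if `label` is at or above `threshold` in GOES class order."""
    return GOES_ORDER.index(label) >= GOES_ORDER.index(threshold)


def binary_accuracy(y_true, y_pred, threshold):
    """Accuracy after collapsing multi-class labels to '>= threshold' vs. below."""
    hits = sum(at_least(t, threshold) == at_least(p, threshold)
               for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


subsets = nested_subsets(200)
for fraction in FRACTIONS:
    print(f"{fraction:>5.0%} of training set -> {len(subsets[fraction])} samples")
```

Collapsing the multi-class predictions at evaluation time, rather than training separate binary models, is one simple way to obtain both binary metrics from a single classifier; whether the paper does this is not stated in the caption.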