
FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging

Mohammed Talha Alam, Raza Imam, Mohsen Guizani, Fakhri Karray

TL;DR

This work tackles the classification of noisy, low-resolution (LR) astronomical images under distribution shift. It introduces FLARE, a two-stage augmentation framework: the first stage upscales LR images to high resolution (HR) with SwinIR and applies standard augmentations, and the second stage generates diffusion-based synthetic samples with UniDiffuser using class-concatenated prompts. The real and synthetic samples are merged with a weighted-percentile strategy to form SpaceNet, an optimally distributed HR dataset for robust classification. Experiments show gains of up to ~20% on fine-grained tasks and ~15% on average across classifiers, and the diffusion-based samples reduce intra-class variability and sharpen decision boundaries. The approach improves generalization on both in-domain and out-of-domain tasks and offers a practical, scalable way to improve data handling and analysis in astronomical research.
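The weighted-percentile merge is described only at a high level on this page. Below is a minimal sketch under one hypothetical reading: each source pool (for example upscaled raw images and diffusion-synthetic images) contributes a fixed percentage of its samples to every class. The source names, class names, and the `merge_by_percentages` helper are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def merge_by_percentages(pools, percentages, seed=0):
    """Hypothetical merge: take percentages[source] percent of each class pool from each source.

    pools:       {source_name: {class_name: [sample_id, ...]}}
    percentages: {source_name: percentage in [0, 100]}; missing sources contribute nothing.
    """
    rng = random.Random(seed)  # fixed seed so the merged dataset is reproducible
    merged = defaultdict(list)
    for source, per_class in pools.items():
        pct = percentages.get(source, 0.0)
        if not 0.0 <= pct <= 100.0:
            raise ValueError(f"percentage for {source!r} must be in [0, 100], got {pct}")
        for cls, samples in per_class.items():
            k = round(len(samples) * pct / 100.0)
            # Sample without replacement so no item is drawn twice from the same source.
            merged[cls].extend(rng.sample(samples, k))
    return dict(merged)


# Usage with hypothetical pools: keep all upscaled raw images, add half of the synthetic ones.
pools = {
    "raw_hr": {
        "class_a": [f"raw_a{i}" for i in range(10)],
        "class_b": [f"raw_b{i}" for i in range(4)],
    },
    "synthetic": {
        "class_a": [f"syn_a{i}" for i in range(20)],
        "class_b": [f"syn_b{i}" for i in range(20)],
    },
}
merged = merge_by_percentages(pools, {"raw_hr": 100, "synthetic": 50})
```

Choosing the percentages per source is what controls the real-to-synthetic balance in each class; a real pipeline would select them from validation performance rather than fixing them by hand.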

Abstract

The intersection of astronomy and AI faces significant challenges, including noisy backgrounds, low resolution (LR), and the intricate process of filtering and archiving images from advanced telescopes such as the James Webb Space Telescope. Given the dispersion of raw images in feature space, we propose a two-stage augmentation framework named FLARE, based on Feature Learning and Augmented Resolution Enhancement. First, we convert LR images to high resolution (HR) and then apply standard augmentations. Second, we integrate a diffusion approach to synthetically generate samples using class-concatenated prompts. By merging these two stages using weighted percentiles, we realign the feature-space distribution, enabling a classification model to establish a distinct decision boundary and achieve superior generalization on various in-domain and out-of-domain tasks. We conducted experiments on several downstream cosmos datasets and on our optimally distributed SpaceNet dataset across 8-class fine-grained and 4-class macro classification tasks. FLARE attains the highest performance gain of 20.78% on fine-grained tasks compared to similar baselines, and across different classification models it shows a consistent average improvement of +15%. These results underscore the effectiveness of FLARE in improving the precision of image classification, ultimately bolstering the reliability of astronomical research outcomes. Our code and SpaceNet dataset are available at https://github.com/Razaimam45/PlanetX_Dxb.
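To make "class-concatenated prompts" concrete, the sketch below assumes that each class name is simply concatenated into a fixed text template and that a text-to-image function is called once per seed. The template wording, the class names, and the `generate` callable are hypothetical stand-ins; in an actual setup a diffusion model such as UniDiffuser would take the place of the stub used here.

```python
from typing import Any, Callable, Dict, List

# Hypothetical template: the paper's exact prompt wording is not given on this page.
TEMPLATE = "a high-resolution telescope image of a {cls}"


def build_prompts(classes: List[str], template: str = TEMPLATE) -> Dict[str, str]:
    """Concatenate each class name into the text template (one prompt per class)."""
    return {cls: template.format(cls=cls.replace("_", " ")) for cls in classes}


def synthesize(
    classes: List[str],
    n_per_class: int,
    generate: Callable[[str, int], Any],
    template: str = TEMPLATE,
) -> Dict[str, List[Any]]:
    """Call generate(prompt, seed) n_per_class times for every class.

    `generate` is an assumed interface standing in for a text-to-image diffusion model.
    """
    prompts = build_prompts(classes, template)
    out: Dict[str, List[Any]] = {}
    for cls, prompt in prompts.items():
        # A different seed for each sample within a class, so one prompt yields varied images.
        out[cls] = [generate(prompt, seed) for seed in range(n_per_class)]
    return out


def fake_generate(prompt: str, seed: int):
    """Stub generator that just records its inputs (no real diffusion model involved)."""
    return (prompt, seed)


samples = synthesize(["spiral_galaxy", "nebula"], n_per_class=3, generate=fake_generate)
```

Passing the generator in as a function keeps the prompt logic independent of any particular diffusion library.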

Paper Structure (14 sections, 8 equations, 10 figures, 6 tables)

Figures (10)

  • Figure 1: Combining traditional augmentation with synthetic samples generated through diffusion helps harmonize feature representations, yielding higher classification performance.
  • Figure 2: Our proposed methodology, FLARE. We upscale the raw dataset to high resolution using SwinIR and then apply standard augmentation techniques. Next, we create synthetic samples by combining class-concatenated prompts with UniDiffuser. We then select the relevant augmentations and combine them based on weighted percentiles. The resulting optimally distributed dataset is fed into a classifier for improved classification.
  • Figure 3: Transforming the original raw dataset (Raw_Aug) into our combined dataset with the FLARE approach yields a 7.8× increase in the number of samples. "Ours" denotes the proposed SpaceNet dataset.
  • Figure 4: Integrated Gradients for LR inputs, HR inputs, and their difference across different methods, illustrating the visual relationship between the model's predictions and the extracted features. The LR-to-HR module of FLARE helps embed discriminative features in the input space.
  • Figure 5: Visual comparison of upscaled noise-restoration examples across 8 classes.
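Figure 4 relies on Integrated Gradients attributions. The sketch below implements the standard Integrated Gradients formula, IG_i(x) = (x_i - x'_i) times the integral over alpha in [0, 1] of dF(x' + alpha (x - x'))/dx_i, approximated with a midpoint Riemann sum. It runs on a toy logistic-regression "classifier" with an analytic gradient, not on the authors' model or code; the toy weights, the 16-pixel input, and the all-zeros baseline are assumptions chosen so the result can be checked.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def toy_model(x, w, b):
    """Stand-in classifier score: logistic regression on a flattened image."""
    return sigmoid(x @ w + b)


def toy_grad(x, w, b):
    """Analytic gradient of toy_model with respect to the input x."""
    s = sigmoid(x @ w + b)
    return s * (1.0 - s) * w


def integrated_gradients(x, baseline, grad_fn, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of `steps` equal sub-intervals of [0, 1]
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * total / steps


rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.1
x = rng.random(16)          # hypothetical flattened 4x4 "image"
baseline = np.zeros(16)     # all-zeros (black image) baseline
attr = integrated_gradients(x, baseline, lambda z: toy_grad(z, w, b))

# Completeness: attributions sum to F(x) - F(baseline), up to discretization error.
gap = toy_model(x, w, b) - toy_model(baseline, w, b)
```

The completeness property is a convenient sanity check when the toy model is replaced by a real image classifier with autodiff gradients; reshaping `attr` back to the image grid gives the kind of per-pixel map shown in the figure.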