Efficient reduction of stellar contamination and noise in planetary transmission spectra using neural networks

David S. Duque-Castaño; Lauren Flor-Torres; Jorge I. Zuluaga

TL;DR

This work tackles stellar contamination and instrument noise in exoplanet transmission spectra by introducing general denoising autoencoders (G-DAEs) trained on large synthetic datasets of terrestrial and sub-Neptune analogues. The G-DAEs reconstruct clean spectra while preserving molecular features, and they enable retrievals with reduced bias at substantially lower computational cost: a 3–6× speedup compared to conventional approaches. Bayesian uncertainty estimation via Monte Carlo (MC) dropout provides error bars on the reconstructions, and the method generalizes to both TRAPPIST-1e-like planets and K2-18b-like sub-Neptunes, including realistic JWST/NIRSpec noise. The approach offers a scalable, unsupervised pathway to integrate denoising into future atmospheric characterization pipelines for diverse exoplanet classes, with practical demonstrations against POSEIDON-based retrievals and publicly available tools for adoption.
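The MC dropout idea mentioned above can be illustrated with a minimal sketch: dropout stays active at inference time, the same input is passed through the network many times, and the spread of the outputs serves as a per-bin error bar. The toy one-layer "network", its weights, and the function names below are hypothetical and are not the authors' G-DAE; only the sampling procedure is the point.

```python
import random
import statistics


def forward_with_dropout(x, weights, p_drop, rng):
    """One stochastic pass through a toy linear layer with dropout left on."""
    keep = 1.0 - p_drop
    out = []
    for row in weights:
        total = 0.0
        for w, xi in zip(row, x):
            # Inverted dropout: keep each input with probability `keep`
            # and rescale the survivors so the expected output is unchanged.
            if rng.random() < keep:
                total += w * xi / keep
        out.append(total)
    return out


def mc_dropout_predict(x, weights, p_drop=0.1, n_samples=200, seed=0):
    """Return the per-bin mean and sample standard deviation over many passes."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, weights, p_drop, rng)
               for _ in range(n_samples)]
    per_bin = list(zip(*samples))  # regroup: one tuple of samples per output bin
    mean = [statistics.fmean(b) for b in per_bin]
    std = [statistics.stdev(b) for b in per_bin]
    return mean, std


# Hypothetical 2-bin example: the mean plays the role of the reconstructed
# spectrum and the standard deviation the role of its error bar.
toy_weights = [[1.0, 0.0], [0.0, 2.0]]
mean, std = mc_dropout_predict([3.0, 4.0], toy_weights, p_drop=0.2)
print(mean, std)
```

In a real autoencoder the same loop would wrap the full encoder and decoder, with the dropout rate and number of samples chosen during validation.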

Abstract

Context: JWST has enabled transmission spectroscopy at unprecedented precision, but stellar heterogeneities (spots and faculae) remain a dominant contamination source that can bias atmospheric retrievals if uncorrected. Aims: We present a fast, unsupervised methodology to reduce stellar contamination and instrument-specific noise in exoplanet transmission spectra using denoising autoencoders, improving the reliability of retrieved atmospheric parameters. Methods: We design and train denoising autoencoder architectures on large synthetic datasets of terrestrial (TRAPPIST-1e analogues) and sub-Neptune (K2-18b analogues) planets. Reconstruction quality is evaluated with the $\chi^2$ statistic over a wide range of signal-to-noise ratios, and atmospheric retrieval experiments on contaminated spectra are used to compare against standard correction approaches in accuracy and computational cost. Results: The autoencoders reconstruct uncontaminated spectra while preserving key molecular features, even at low S/N. In retrieval tests, pre-processing with denoising autoencoders reduces bias in inferred abundances relative to uncorrected baselines and matches the accuracy of simultaneous stellar-contamination fitting while reducing computational time by a factor of three to six. Conclusions: Denoising autoencoders provide an efficient alternative to conventional correction strategies and are promising components of future atmospheric characterization pipelines for both rocky and gaseous exoplanets.
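As a rough illustration of the $\chi^2$ evaluation mentioned in the Methods, the sketch below compares a reconstructed spectrum with the clean reference, weighting each wavelength bin by its uncertainty. The exact definition and normalization used in the paper are not given here, so this textbook form, the function names, and the toy numbers are all assumptions.

```python
def chi_squared(reconstructed, reference, sigma):
    """Sum of squared, uncertainty-weighted residuals over wavelength bins."""
    if not (len(reconstructed) == len(reference) == len(sigma)):
        raise ValueError("all inputs must have the same number of bins")
    return sum(((r - t) / s) ** 2
               for r, t, s in zip(reconstructed, reference, sigma))


def chi_squared_per_bin(reconstructed, reference, sigma):
    """Chi-squared divided by the number of bins (assumed normalization)."""
    return chi_squared(reconstructed, reference, sigma) / len(reference)


# Hypothetical transit depths in ppm for three wavelength bins.
clean = [5000.0, 5050.0, 5020.0]
rebuilt = [5004.0, 5047.0, 5020.0]
errors = [4.0, 3.0, 5.0]
print(chi_squared(rebuilt, clean, errors))
print(chi_squared_per_bin(rebuilt, clean, errors))
```

A value per bin near or below one would indicate that the reconstruction agrees with the clean spectrum to within the assumed uncertainties.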

Paper Structure (23 sections, 8 equations, 11 figures, 3 tables)

This paper contains 23 sections, 8 equations, 11 figures, 3 tables.

Introduction
Stellar contamination
Autoencoders for denoising transmission spectra
Noise sources
Autoencoders in astronomy
A general stellar contamination DAE (G-DAE)
Model architecture
Model training
Model validation
Quantifying uncertainties
Model evaluation
Denoising with realistic instrumental noise
Quantitative assessment of model performance
Comparative evaluation of retrieval with POSEIDON
Dependence on atmospheric chemistry
...and 8 more sections

Figures (11)

Figure 1: Schematic representation of stellar contamination (the transit light source effect, TLS), with examples of its effect on simulated signals. In the left column, we show the stellar spectrum, both when the photosphere is clean (no heterogeneities, stellar spots, or faculae) and when the star exhibits heterogeneities (second and third rows). The rows show the difference (residual) between the clean and contaminated stellar spectra. The residuals have been amplified relative to the spectra to highlight regions where the effects are more pronounced. In the right column, we show the corresponding transmission spectra: in the upper half, the clean photosphere case, and in the bottom rows, the contaminated case. In all cases, we illustrate the simpler case in which the transit chord does not include any heterogeneity, $c_\mathrm{spot}=c_\mathrm{fac}=0$ (see text).
Figure 2: Architecture of a standard Denoising Autoencoder (DAE). The input signal $X$ is the observed transit spectrum (continuous line), which has been transformed from the original clean spectrum $X_0$ (dashed line) by the effect of stellar contamination. The spectrum is fed into the neural network through an encoder, composed of multiple dense hidden layers. These layers progressively compress the spectral information, ultimately producing a compact, abstract representation $Z$ in the so-called "Latent Space". Subsequently, the network’s decoder uses this latent representation to reconstruct a clean spectrum $X'$. During training, the autoencoder learns to effectively isolate and remove stellar contamination signals during encoding.
Figure 3: Examples of the autoencoder implementation designed to mitigate stellar contamination in three representative synthetic transmission spectra: an airless planet (first row), a planet with a CO$_2$-rich atmosphere (second row), and a planet with potential biosignatures (third row). The input spectra have an S/N of 3 and include stellar contamination levels of $f_{\mathrm{spot}} = 0.08$ and $f_{\mathrm{fac}} = 0.54$. The panels in the first column display the synthetic spectra free from noise and contamination; the second column shows the input spectra affected by noise and contamination; and the third column presents the reconstructed spectra returned by the autoencoder.
Figure 4: Clean transmission spectrum of a TRAPPIST-1e analogue with an arbitrary (non-equilibrium) composition. The contribution of each gas is plotted as a continuous, thick line, while the signal of the fill gases (CO$_2$ and N$_2$) is shown as a thinner line in the background. The combined spectrum is limited by the dashed line, and the shaded region is also shown in the background.
Figure 5: Comparison between noisy and contaminated input spectra (square markers with error bars) and the reconstructions performed by our autoencoder networks (round markers) for a TRAPPIST-1e analogue with different atmospheric compositions and stellar contamination, when observed with JWST NIRSpec using ten transits. Each plot shows the original clean spectra (upper panel), the noisy and reconstructed spectra (middle panel), and the residuals for both the contaminated and reconstructed spectra (bottom panel). The y-axis scales for the transit-depth and residual plots in the noisy (right) and reconstructed (left) signals differ to highlight the disparities between the two spectra and the original spectrum. We consider three planetary scenarios: an airless planet (first row), a planet with a biogenic atmosphere but no stellar contamination (second row), and the same planet with the maximum stellar contamination (third row).
...and 6 more figures
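
The covering fractions $f_\mathrm{spot}$ and $f_\mathrm{fac}$ quoted in the captions above can be related to the contaminated transit depth through a commonly used expression for the transit light source effect when the transit chord crosses no heterogeneity ($c_\mathrm{spot}=c_\mathrm{fac}=0$): the clean depth is multiplied by $\epsilon_\lambda = 1/[1 - f_\mathrm{spot}(1 - F_\mathrm{spot}/F_\mathrm{phot}) - f_\mathrm{fac}(1 - F_\mathrm{fac}/F_\mathrm{phot})]$. Whether the paper's synthetic datasets use exactly this form is an assumption; the function names and toy fluxes below are hypothetical.

```python
def contamination_factor(f_spot, f_fac, flux_phot, flux_spot, flux_fac):
    """Per-wavelength factor for unocculted spots and faculae (assumed form)."""
    return [
        1.0 / (1.0 - f_spot * (1.0 - fs / fp) - f_fac * (1.0 - ff / fp))
        for fp, fs, ff in zip(flux_phot, flux_spot, flux_fac)
    ]


def contaminate(depth_clean, epsilon):
    """Apply the contamination factor to a clean transit-depth spectrum."""
    return [d * e for d, e in zip(depth_clean, epsilon)]


# Hypothetical two-bin example: spots are darker and faculae brighter
# than the quiet photosphere, with fluxes in arbitrary units.
phot = [1.0, 1.0]
spot = [0.5, 0.7]
fac = [1.2, 1.1]
eps = contamination_factor(0.08, 0.54, phot, spot, fac)
print(contaminate([5000.0, 5000.0], eps))
```

Spots push the factor above one and faculae push it below one, so the net effect on each bin depends on both covering fractions and on the wavelength-dependent flux contrasts.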
