Transformers for Stratified Spectropolarimetric Inversion: Proof of Concept

Ryan James Campbell, Mihalis Mathioudakis, Carlos Quintero Noda

TL;DR

Transformers for Stratified Spectropolarimetric Inversion demonstrates a transformer model that maps full-Stokes spectra across 15 lines to atmospheric parameters as a function of optical depth $\log(\tau)$. The method leverages learned spectral positional embeddings and cross-attention to capture non-local spectral dependencies, trained on 3D MHD MANCHA synthetic data and benchmarked against an MLP baseline. Results show higher correlations and more regularized stratifications for magnetic parameters, with robustness to Gaussian noise typical of solar observations, and faster inference enabling large-scale analysis. Cross-attention analyses link the model's focus to physically diagnostic spectral regions, suggesting interpretable behavior and potential utility as an initial guess for inversions, with limitations including synthetic-only training, chromospheric realism, and absence of instrumental effects.

Abstract

Solar spectropolarimetric inversion -- inferring atmospheric conditions from the Stokes vector -- is a key diagnostic tool for understanding solar magnetism, but traditional inversion methods are computationally expensive and sensitive to local minima. Advances in artificial intelligence (AI) offer faster solutions, but are often restricted to shallow models or a few spectral lines. We present a proof-of-concept study using a transformer machine learning (ML) model for multi-line, full-Stokes inversion, to infer stratified parameters from synthetic spectra produced from 3D magnetohydrodynamic simulations. We synthesise a large set of Stokes vectors using forward modelling across 15 spectral lines spanning the deep photosphere towards the chromosphere. The model maps full-Stokes input to temperature, magnetic field strength, inclination, azimuth (encoded as $\sin 2\phi$, $\cos 2\phi$), and line-of-sight velocity as a function of optical depth. The transformer incorporates an attention mechanism that allows the model to focus on the most informative regions of the spectrum for each inferred parameter, and uses positional embedding to encode wavelength and depth order. We benchmark it against a multilayer perceptron (MLP), test robustness to noise, and assess generalisation. The transformer outperforms the MLP, especially in the higher layers and for magnetic parameters, yielding higher correlations and more regularised stratifications. The model retains strong performance across a range of noise levels typical for real observations, with magnetic parameter inference degrading predictably while temperature and velocity remain stable. We explore attention maps, linking the transformer's learned behaviour to line-formation physics.
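
The azimuth encoding mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that azimuth is represented by the sine and cosine of twice the angle; the function names `encode_azimuth` and `decode_azimuth`, the choice of radians, and the decoding convention (wrapping into $[0, \pi)$) are assumptions made here for illustration, not the authors' implementation. A plausible motivation is that the pair is continuous and identical for $\phi$ and $\phi + 180^\circ$, which matches the well-known 180-degree azimuth ambiguity of linear polarisation.

```python
import math

def encode_azimuth(phi):
    """Map an azimuth phi (radians) to the pair (sin 2phi, cos 2phi)."""
    return math.sin(2.0 * phi), math.cos(2.0 * phi)

def decode_azimuth(s, c):
    """Recover an azimuth in [0, pi) from (sin 2phi, cos 2phi)."""
    # atan2 returns 2*phi in (-pi, pi]; halve it and wrap into [0, pi).
    return (0.5 * math.atan2(s, c)) % math.pi

phi = math.radians(170.0)                        # illustrative value only
s, c = encode_azimuth(phi)
s_flip, c_flip = encode_azimuth(phi + math.pi)   # 180-degree-ambiguous partner
recovered = decode_azimuth(s, c)
```

Here `recovered` agrees with `phi` to floating-point precision, and `(s_flip, c_flip)` matches `(s, c)`, so the two ambiguous azimuths share a single regression target.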

Paper Structure

This paper contains 19 sections, 4 equations, 18 figures, 1 table.

Figures (18)

  • Figure 1: Mean Stokes $I$ profile across the full field of view, averaged over all spatial pixels in the MANCHA simulation snapshot used for training. The vertical dashed lines and labels indicate the most important diagnostic spectral lines. These lines span a broad range of atmospheric heights, from the deep photosphere to the mid-chromosphere, providing constraints for inversion at multiple depths.
  • Figure 2: Flow diagram of the transformer-based inversion model. The encoder processes sequences of Stokes vectors across wavelength positions, where each token corresponds to a single wavelength and contains the full 4-dimensional Stokes vector. The decoder uses learned queries for each depth point to extract atmospheric parameters via cross-attention.
  • Figure 3: Schematic diagram of a single transformer encoder and decoder layer. Each encoder layer consists of multi-head self-attention, a position-wise feed-forward network (MLP), and residual connections. Each decoder layer additionally includes cross-attention to the encoder output. $d$ is the model (embedding) dimension, $h$ is the number of heads, and $d_\mathrm{ff}$ is the MLP hidden dimension. The Add and LayerNorm step adds the input of each transformer block (e.g. attention or feed-forward) to its output and then applies normalisation to stabilise training; this is a standard architectural feature of modern transformer networks.
  • Figure 4: Flow diagram of a 2-hidden-layer MLP inversion model. The entire input spectrum is first flattened and then passed through three fully connected layers, producing the full set of atmospheric parameters at all depths as output.
  • Figure 5: Training (solid lines) and validation (dashed lines) loss curves for the transformer (blue) and MLP (red) models as a function of epoch. The transformer reaches its lowest validation loss, 0.112, at epoch 34. The MLP reaches its lowest validation loss, 0.1417, at epoch 34.
  • ...and 13 more figures
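
The data flow described in the captions of Figures 2 and 3 (one token per wavelength holding the four Stokes components, and one learned query per depth point reading the encoder output through cross-attention) can be sketched in NumPy as follows. This is a rough illustration under stated assumptions, not the authors' model: all sizes are hypothetical, the weights are random stand-ins for learned parameters, the encoder is reduced to an embedding plus positional embedding with no self-attention or feed-forward layers, and the cross-attention uses a single head without separate key and value projections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
n_wave, n_depth, d_model, n_params = 64, 20, 16, 6

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# One token per wavelength, each holding the 4 Stokes components (I, Q, U, V).
stokes = rng.normal(size=(n_wave, 4))

# Random stand-ins for learned weights.
w_embed = rng.normal(size=(4, d_model)) / np.sqrt(4)
pos_embed = rng.normal(size=(n_wave, d_model)) * 0.1   # spectral positional embedding
depth_queries = rng.normal(size=(n_depth, d_model))    # one query per depth point
w_out = rng.normal(size=(d_model, n_params)) / np.sqrt(d_model)

# Encoder stand-in: embed each token and add its positional embedding.
# A full encoder would apply self-attention and feed-forward layers here.
memory = stokes @ w_embed + pos_embed                  # (n_wave, d_model)

# Single-head cross-attention: depth queries attend over wavelength tokens.
scores = depth_queries @ memory.T / np.sqrt(d_model)   # (n_depth, n_wave)
attn = softmax(scores, axis=-1)                        # each row sums to 1
context = attn @ memory                                # (n_depth, d_model)

# Six outputs per depth: T, B, inclination, sin 2phi, cos 2phi, v_LOS.
params = context @ w_out                               # (n_depth, n_params)
```

Each row of `attn` is a distribution over wavelength positions for one depth point, which is the kind of quantity an attention-map analysis would inspect; with random weights the values here carry no physical meaning.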