Towards Photonic Band Diagram Generation with Transformer-Latent Diffusion Models

Valentin Delchevalerie, Nicolas Roy, Arnaud Bougaham, Alexandre Mayer, Benoît Frénay, Michaël Lobet

Abstract

Photonic crystals enable fine control over light propagation at the nanoscale, and thus play a central role in the development of photonic and quantum technologies. Photonic band diagrams (BDs) are a key tool to investigate light propagation in such inhomogeneous structured materials. However, computing BDs requires solving Maxwell's equations across many configurations, which makes it numerically expensive, especially when embedded in optimization loops such as those used for inverse design. To address this challenge, we introduce the first approach for BD generation based on diffusion models, with the capacity to later generalize and scale to arbitrary three-dimensional structures. Our method couples a transformer encoder, which extracts contextual embeddings from the input structure, with a latent diffusion model to generate the corresponding BD. In addition, we provide insights into why transformers and diffusion models are well suited to capture the complex interference and scattering phenomena inherent to photonics, paving the way for new surrogate modeling strategies in this domain.

Paper Structure

This paper contains 19 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Examples of BDs obtained from rigorous coupled-wave analysis (RCWA) simulations. The x-axis represents the in-plane component $k_x$ of the wavevector of light, and the y-axis is the corresponding frequency $\omega$. A dark pixel indicates the existence of a mode for the pair $(\omega, k_x)$.
  • Figure 2: 3D structures are represented as ordered sequences of pairs $\{ \left(\epsilon_1, d_1\right), \dots, \left(\epsilon_k, d_k\right) \}$, where $\epsilon_i\left(x,y\right)$ is the 2D dielectric map of layer $i$ and $d_i$ is its thickness.
  • Figure 3: M2C encoder architecture. A sequence of $k$ dielectric layers $\{ \epsilon_1\left(x,y\right), \dots, \epsilon_k\left(x,y\right) \}$ is encoded by a shared convolutional encoder, and concatenated with a context token. This yields a sequence of latent vectors $\{ \mathbf{z}_0, \mathbf{z}_1, \dots, \mathbf{z}_k \}$, each in $\mathbb{R}^n$. Cumulative depth positional encodings (PE) are added to preserve both the ordering and the thickness of each layer. The sequence is then processed by $N$ stacked transformer encoder blocks. Finally, an MLP head projects the hidden dimension $n$ to the output dimension $m$, producing the contextual representations $\{ \tilde{\mathbf{z}}_0, \tilde{\mathbf{z}}_1, \dots, \tilde{\mathbf{z}}_k \}$, where $\tilde{\mathbf{z}}_0$ serves as the global contextual embedding.
  • Figure 4: BD encoder/decoder architecture. A BD of size $H \times W$ is divided into $p=p_hp_w$ patches (optionally with random masking during training), each flattened and linearly projected. A context token is concatenated to the sequence, and positional encodings (PE) are added to preserve spatial relationships between patches. The resulting sequence $\{ \mathbf{z}_0, \mathbf{z}_1, \dots, \mathbf{z}_p \}$, with $\mathbf{z}_i \in \mathbb{R}^n$, is processed by $N$ stacked transformer encoder blocks (as in the M2C encoder architecture). The context token is then mapped by an MLP head to a global representation $\tilde{\mathbf{z}}_0 \in \mathbb{R}^m$, while the patch embeddings are projected to local representations $\{ \tilde{\mathbf{z}}_1, \dots, \tilde{\mathbf{z}}_p \}$ with $\tilde{\mathbf{z}}_i \in \mathbb{R}^l$. These local embeddings can then be reshaped into a single latent representation of shape $(l, p_h, p_w)$, which is later forwarded to a convolutional decoder trained to reconstruct the BD in pixel space.
  • Figure 5: The local latent representations are first extracted from the target BDs using the BD encoder, and reshaped to a latent representation $\tilde{\mathbf{z}}$ of shape $\left(l, p_h, p_w\right)$. A forward diffusion process iteratively perturbs these latents with Gaussian noise, progressively reducing the signal-to-noise ratio (SNR). A denoising U-Net, conditioned via cross-attention on the material representation from the M2C encoder, is trained to predict the noise at each time step $t$. After reversing the diffusion process, the recovered latent $\tilde{\zeta}_0$ is forwarded to the BD decoder to reconstruct the BD. The steps highlighted with the black dotted box are only performed during training. During inference, the process starts directly from pure random noise. Figure inspired by Rombach et al. (2022).
  • ...and 2 more figures
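
The Figure 3 caption mentions cumulative depth positional encodings without giving their exact form. The sketch below is one plausible reading, not the authors' implementation: it assumes a standard sinusoidal encoding evaluated at the cumulative depth of each layer, and assumes the context token is placed at depth 0. The function names and the `base` constant are hypothetical.

```python
import math

def cumulative_depths(thicknesses):
    """Return the running sum d_1, d_1 + d_2, ... of the layer thicknesses."""
    depths, total = [], 0.0
    for d in thicknesses:
        total += d
        depths.append(total)
    return depths

def sinusoidal_encoding(position, n, base=10000.0):
    """Sinusoidal encoding of a real-valued position into R^n (n assumed even)."""
    enc = []
    for i in range(0, n, 2):
        freq = 1.0 / (base ** (i / n))
        enc.append(math.sin(position * freq))
        enc.append(math.cos(position * freq))
    return enc

def depth_positional_encodings(thicknesses, n):
    """One encoding per token: the context token (assumed at depth 0), then each layer."""
    positions = [0.0] + cumulative_depths(thicknesses)
    return [sinusoidal_encoding(p, n) for p in positions]

# Example: three layers with thicknesses d_1, d_2, d_3 and hidden dimension n = 8.
pe = depth_positional_encodings([0.5, 0.25, 1.0], n=8)
```

Because the position is a physical depth rather than an integer index, two stacks with the same layer order but different thicknesses receive different encodings, which is the property the caption asks for.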
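
The patching and reshaping steps described in the Figure 4 caption can be illustrated with plain nested lists. This is a minimal sketch under stated assumptions: patches are taken in row-major order, the BD dimensions are divisible by the patch size, and the names `patchify` and `to_latent_grid` are hypothetical. The linear projection and transformer blocks are omitted.

```python
def patchify(image, patch_h, patch_w):
    """Split an H x W image (list of rows) into p = p_h * p_w flattened patches."""
    H, W = len(image), len(image[0])
    assert H % patch_h == 0 and W % patch_w == 0, "image must tile evenly"
    p_h, p_w = H // patch_h, W // patch_w
    patches = []
    for i in range(p_h):          # patch row index
        for j in range(p_w):      # patch column index
            patch = [
                image[i * patch_h + r][j * patch_w + c]
                for r in range(patch_h)
                for c in range(patch_w)
            ]
            patches.append(patch)
    return patches, (p_h, p_w)

def to_latent_grid(local_embeddings, p_h, p_w):
    """Rearrange p vectors of size l into a nested list of shape (l, p_h, p_w)."""
    l = len(local_embeddings[0])
    return [
        [[local_embeddings[i * p_w + j][c] for j in range(p_w)] for i in range(p_h)]
        for c in range(l)
    ]

# Example: a 4 x 6 image split into 2 x 3 patches gives p_h = p_w = 2, so p = 4.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
patches, (p_h, p_w) = patchify(image, patch_h=2, patch_w=3)
```

The second function mirrors the caption's final step: once each patch has a local embedding in $\mathbb{R}^l$, stacking them on the patch grid gives a channel-first latent that a convolutional decoder can consume.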
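
The forward diffusion process in the Figure 5 caption perturbs the latent with Gaussian noise while the signal-to-noise ratio decreases. The caption does not specify the noise schedule, so the sketch below assumes the common DDPM closed form $\mathbf{z}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{z}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$ with a linear $\beta$ schedule. The schedule endpoints and all function names are illustrative assumptions, not values taken from the paper.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def forward_diffuse(z0, alpha_bar, rng):
    """Sample z_t from z_0 in one step; also return the noise the U-Net would predict."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [
        math.sqrt(alpha_bar) * z + math.sqrt(1.0 - alpha_bar) * e
        for z, e in zip(z0, eps)
    ]
    return zt, eps

def snr(alpha_bar):
    """Signal-to-noise ratio of z_t under this parameterization."""
    return alpha_bar / (1.0 - alpha_bar)

# Example: noise a flattened toy latent at an intermediate time step.
alpha_bars = alpha_bar_schedule(1000)
rng = random.Random(0)
z0 = [0.5, -1.0, 2.0]
zt, eps = forward_diffuse(z0, alpha_bars[500], rng)
```

During training, a denoiser would receive `zt`, the time step, and the conditioning embedding, and would be penalized for the difference between its output and `eps`. At inference only the reverse process runs, starting from pure noise.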