EllipBench: A Large-scale Benchmark for Machine-learning based Ellipsometry Modeling

Yiming Ma; Xinjie Li; Xin Sun; Zhiyong Wang; Lionel Z. Wang

TL;DR

This work tackles the inverse ellipsometry problem, where extracting film optical constants $(n_2,k_2)$ and thickness $d$ from measured parameters $(\Delta,\Psi)$ is ill-posed and labor-intensive. It introduces EllipBench, a large-scale benchmark with 8.3 million data points across 98 films and 4 substrates, and proposes a deep neural network with residual connections and self-attention, augmented by a reconstruction loss that handles one-to-many thickness solutions in an end-to-end framework. The approach significantly outperforms traditional ML baselines on the dataset and demonstrates strong generalization, with detailed ablations confirming the value of depth, attention, and the reconstruction loss. Overall, EllipBench provides a robust resource and method to accelerate non-destructive thin-film characterization and ellipsometry modeling, with dataset and code to be released upon acceptance.

Abstract

Ellipsometry is used to indirectly measure the optical properties and thickness of thin films. However, solving the inverse problem of ellipsometry is time-consuming because it requires human expertise to apply data-fitting techniques. Many studies use traditional machine-learning methods to model this complex mathematical fitting process. In our work, we approach the problem from a deep learning perspective. First, we introduce a large-scale benchmark dataset to facilitate deep learning methods. The proposed dataset encompasses 98 types of thin-film materials and 4 types of substrate materials, including metals, alloys, compounds, and polymers, among others. Additionally, we propose a deep learning framework that leverages residual connections and self-attention mechanisms to learn from the large volume of data points. We also introduce a reconstruction loss to address the common challenge of multiple solutions in thin-film thickness prediction. Compared to traditional machine-learning methods, our framework achieves state-of-the-art (SOTA) performance on our proposed dataset. The dataset and code will be available upon acceptance.

Paper Structure

This paper contains 19 sections, 13 equations, 4 figures, and 3 tables.

Introduction
Related Work
Mathematical-Inversion Method
Machine-Learning Method
EllipBench
Data Source
Data Statistics
Comparison with Existing Datasets
Method
Problem Statement
Network Architecture
Loss
Experiments
Implementation Details
Evaluation Metrics
...and 4 more sections

Figures (4)

Figure 1: Schematic of light refraction in thin films and substrates. Light is incident onto a thin film with unknown optical constants and thickness ($n_2$, $k_2$, $d$) on a substrate with known optical constants ($n_3$, $k_3$). Ellipsometry measures the parameters $\Psi$ and $\Delta$, which are determined by the forward mapping from ($n_2$, $k_2$, $d$) to ($\Psi$, $\Delta$). The inverse mapping, however, has no exact analytical formula and relies on data-fitting techniques.
Figure 2: Data Distribution of EllipBench
Figure 3: Framework overview. The proposed framework is built upon a deep neural network, with the encoder $\varTheta_e$ consisting of 150 layers. The mapper $\varTheta_m$ processes the input data, mapping it to a high-dimensional space to generate a feature map. The self-attention block then extracts important features from this feature map, $F_e^a$. Following the self-attention mechanism, the output values are obtained from the feature map by three separate projectors, each comprising a single fully connected layer. Both the reconstruction loss and the fitting loss collaboratively guide the mapper, the encoder, the self-attention block, and the projectors in updating their parameters.
Figure 4: Generalization on Unknown Materials
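
The forward mapping described in the Figure 1 caption can be sketched with the standard single-film model (Fresnel coefficients combined by the Airy formula, $\rho = r_p / r_s = \tan\Psi \, e^{i\Delta}$). This is a minimal illustration, not the paper's code: the function names, the $N = n - ik$ sign convention, the air ambient ($n_1 = 1$), and the example values (a SiO2-like film on a Si-like substrate at 632.8 nm and 70° incidence) are all assumptions made here. The `reconstruction_error` helper is one plausible reading of a reconstruction-style check (push the predicted parameters back through the forward model and compare with the measurement); the paper's exact loss definition is not given in this excerpt.

```python
import cmath
import math


def rho_forward(n2, k2, d_nm, n3, k3, wavelength_nm, angle_deg, n1=1.0):
    """Complex reflectance ratio rho = r_p / r_s for ambient / film / substrate.

    Assumes the N = n - ik convention and a non-absorbing ambient (n1 real).
    """
    N1 = complex(n1, 0.0)
    N2 = complex(n2, -k2)
    N3 = complex(n3, -k3)
    theta1 = math.radians(angle_deg)
    s2 = (N1 * math.sin(theta1)) ** 2  # squared Snell invariant

    # c_j = N_j * cos(theta_j) in each medium
    c1 = N1 * math.cos(theta1)
    c2 = cmath.sqrt(N2 * N2 - s2)
    c3 = cmath.sqrt(N3 * N3 - s2)

    def r_s(ca, cb):
        return (ca - cb) / (ca + cb)

    def r_p(Na, ca, Nb, cb):
        return (Nb * Nb * ca - Na * Na * cb) / (Nb * Nb * ca + Na * Na * cb)

    # Round-trip phase factor exp(-2i*beta) accumulated inside the film
    beta = 2.0 * math.pi * d_nm / wavelength_nm * c2
    phase = cmath.exp(-2j * beta)

    def airy(r12, r23):
        return (r12 + r23 * phase) / (1.0 + r12 * r23 * phase)

    rp = airy(r_p(N1, c1, N2, c2), r_p(N2, c2, N3, c3))
    rs = airy(r_s(c1, c2), r_s(c2, c3))
    return rp / rs


def psi_delta(rho):
    """Convert rho into (Psi, Delta) in degrees, with Delta wrapped to [0, 360)."""
    psi = math.degrees(math.atan(abs(rho)))
    delta = math.degrees(cmath.phase(rho)) % 360.0
    return psi, delta


def thickness_period(n2, wavelength_nm, angle_deg, n1=1.0):
    """Thickness period of rho for a transparent film (k2 = 0)."""
    s = n1 * math.sin(math.radians(angle_deg))
    return wavelength_nm / (2.0 * math.sqrt(n2 * n2 - s * s))


def reconstruction_error(pred, measured_rho, n3, k3, wavelength_nm, angle_deg):
    """Hypothetical check: distance between the forward model at pred and the measurement."""
    n2, k2, d_nm = pred
    return abs(rho_forward(n2, k2, d_nm, n3, k3, wavelength_nm, angle_deg) - measured_rho)
```

For a transparent film, thicknesses `d` and `d + thickness_period(...)` produce the same `rho`, which illustrates the one-to-many thickness ambiguity mentioned in the abstract: a loss that compares predicted and labeled thickness directly penalizes either of these, while a forward-model comparison such as `reconstruction_error` treats them as equivalent.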
