An autoencoder for compressing angle-resolved photoemission spectroscopy data

Steinn Ymir Agustsson; Mohammad Ahsanul Haque; Thi Tam Truong; Marco Bianchi; Nikita Klyuchnikov; Davide Mottin; Panagiotis Karras; Philip Hofmann

TL;DR

This work introduces ARPESNet, a versatile autoencoder network that efficiently summarises and compresses ARPES datasets. It then compares $k$-means clustering quality for data compressed by ARPESNet, data compressed by discrete cosine transform, and raw data, at different noise levels.

Abstract

Angle-resolved photoemission spectroscopy (ARPES) is a powerful experimental technique to determine the electronic structure of solids. Advances in light sources for ARPES experiments are currently leading to a vast increase in data acquisition rates and data quantity. On the other hand, access time to the most advanced ARPES instruments remains strictly limited, calling for fast, effective, and on-the-fly data analysis tools to exploit this time. In response to this need, we introduce ARPESNet, a versatile autoencoder network that efficiently summarises and compresses ARPES datasets. We train ARPESNet on a large and varied dataset of 2-dimensional ARPES data extracted by cutting standard 3-dimensional ARPES datasets along random directions in $\mathbf{k}$. To test the data representation capacity of ARPESNet, we compare $k$-means clustering quality between data compressed by ARPESNet, data compressed by discrete cosine transform, and raw data, at different noise levels. ARPESNet data excels in clustering quality despite its high compression ratio.
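The discrete cosine transform baseline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which coefficients are retained, so the choice below (keeping the `keep` × `keep` low-frequency block of an orthonormal 2D DCT-II) is an assumption, and the function names `dct_compress` and `dct_decompress` are hypothetical. The code uses only the Python standard library and is meant for small square images.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix D of size n x n, so that D @ D.T is the identity."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct_compress(image, keep):
    """Return the keep x keep low-frequency coefficients of a square image.

    Assumption: low-frequency truncation stands in for whatever coefficient
    selection the paper actually uses.
    """
    d = dct_matrix(len(image))
    coeffs = matmul(matmul(d, image), transpose(d))  # C = D X D^T
    return [row[:keep] for row in coeffs[:keep]]


def dct_decompress(block, n):
    """Zero-pad the kept coefficients to n x n and invert the transform."""
    keep = len(block)
    coeffs = [[block[i][j] if i < keep and j < keep else 0.0
               for j in range(n)] for i in range(n)]
    d = dct_matrix(n)
    return matmul(matmul(transpose(d), coeffs), d)  # X = D^T C D
```

With an `n` × `n` image and a `keep` × `keep` block, the compression ratio in this sketch is `n**2 / keep**2`; setting `keep = n` reproduces the image up to floating-point error.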

Paper Structure (9 sections, 8 figures, 1 table)

Introduction
Autoencoder Network
ARPESNet Architecture
Training Data
Training and Testing
Results
Conclusion
Acknowledgement
Code and data availability

Figures (8)

Figure 1: Structure of the ARPESNet autoencoder. The input and output are ARPES images of photoemission intensity as a function of crystal momentum $k$ and energy $E$. The labels on top of the compression and decompression blocks are structured as "$c \times a \times b$k$N_k$", where $c$ is the number of channels, $a$ and $b$ are the lateral sizes, and $N_k$ is the kernel size. In the encoder (decoder) block, blue (light blue) bars indicate (transposed) convolutional layers, while red bars indicate parametric rectified linear unit (PReLU) activation functions [Kaiming:2018aa].
Figure 2: Extraction of training images from a 3D ARPES dataset. (a) A 3D dataset for NdTe$_3$ [Chikina:2023aa]; solid and dashed red lines mark random cuts in $k_x, k_y$ defining 2D ARPES spectra. (b) Photoemission intensity images along these two paths.
Figure 3: Testing the performance of ARPESNet after 4,000 epochs of training. Top row: test images; middle row: reconstruction; bottom row: normalised residual. All images show the photoemission intensity as a function of energy (vertical) and crystal momentum (horizontal). The arrow in the original image of the first column marks a weak but sharp state that is not well captured in the reconstruction.
Figure 4: Shortcomings of reconstructing sharp features in the test data of Fig. 3 are resolved by choosing a different scaling. (a) Data for Bi(111) from the top row, first column of Fig. 3. The dashed lines indicate the paths for the photoemission intensity cuts in panels e and f. (b) Reconstruction of the data in panel a. (c) Zoomed-in version of the data. (d) Corresponding reconstruction. (e) Photoemission intensity along the vertical dashed line in panel a for the original data, the reconstruction from the full-scale image, and the reconstruction from the zoomed-in image. (f) Corresponding intensity curves for the horizontal dashed line in panel a.
Figure 5: Model for testing the suitability of the compressed data for clustering. (a)-(e) Five slightly different ARPES spectra from Bi$_2$Se$_3$ [Bianchi:2010ab]. The spectra are taken from the reference dataset and have a high $S/N$. (f)-(i) Noisy spectra derived from the original data in panel c assuming a different total number of counts $n_I$ per image; these are displayed in greyscale, which makes it easier to see structures for low $n_I$. (j) Ground truth for clustering; each coloured stripe has 500 pixels that correspond to 500 different noisy spectra, all derived from one of the five reference spectra. A successful $k$-means clustering must produce a permutation of these coloured stripes.
...and 3 more figures
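
The clustering test described in the Figure 5 caption has two ingredients that can be sketched in a few lines: generating a noisy spectrum with a fixed total number of counts $n_I$, and scoring a clustering result that is only defined up to a permutation of the cluster labels. The caption does not state the noise model, so drawing the $n_I$ counts as a multinomial sample with probabilities proportional to the reference intensity is an assumption here, and the names `noisy_spectrum` and `permutation_accuracy` are hypothetical. The scoring function simply tries every relabeling, which is cheap for the five clusters used in the figure.

```python
import itertools
import random


def noisy_spectrum(reference, n_counts, rng):
    """Distribute n_counts detection events over the pixels of a reference image.

    Assumption: each event lands on a pixel with probability proportional to
    the (non-negative) reference intensity, i.e. a multinomial draw.
    """
    flat = [value for row in reference for value in row]
    hits = rng.choices(range(len(flat)), weights=flat, k=n_counts)
    counts = [0] * len(flat)
    for index in hits:
        counts[index] += 1
    width = len(reference[0])
    return [counts[r * width:(r + 1) * width] for r in range(len(reference))]


def permutation_accuracy(truth, predicted, n_clusters):
    """Best fraction of agreeing labels over all relabelings of the clusters."""
    best = 0
    for perm in itertools.permutations(range(n_clusters)):
        matches = sum(1 for t, p in zip(truth, predicted) if perm[p] == t)
        best = max(best, matches)
    return best / len(truth)
```

A clustering that reproduces the coloured stripes in a different order scores 1.0 under `permutation_accuracy`; a seeded `random.Random` instance passed as `rng` keeps the noisy spectra reproducible.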
