
Differentiable Earth Mover's Distance for Data Compression at the High-Luminosity LHC

Rohan Shenoy, Javier Duarte, Christian Herwig, James Hirschauer, Daniel Noonan, Maurizio Pierini, Nhan Tran, Cristina Mantilla Suarez

TL;DR

A convolutional neural network is trained to learn a fast, differentiable approximation of the Earth mover's distance (EMD), and is shown to be a viable substitute for computing-intensive EMD implementations.

Abstract

The Earth mover's distance (EMD) is a useful metric for image recognition and classification, but its usual implementations are either not differentiable or too slow to be used as a loss function for training other algorithms via gradient descent. In this paper, we train a convolutional neural network (CNN) to learn a differentiable, fast approximation of the EMD and demonstrate that it can be used as a substitute for computing-intensive EMD implementations. We apply this differentiable approximation in the training of an autoencoder-inspired neural network (encoder NN) for data compression at the high-luminosity LHC at CERN. The goal of this encoder NN is to compress the data while preserving the information related to the distribution of energy deposits in particle detectors. We demonstrate that our encoder NN performs better when trained with the differentiable EMD CNN than when trained with loss functions based on mean squared error.
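
To illustrate why an EMD-based loss can preserve the spatial distribution of energy deposits better than mean squared error, the following minimal sketch compares the two on a toy one-dimensional histogram. It is not the paper's method: it assumes unit-spaced bins and equal total mass, in which case the EMD reduces to the L1 distance between cumulative sums. The names `emd_1d`, `mse`, `target`, `near`, and `far` are illustrative and do not come from the paper.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two 1D histograms with equal total mass on unit-spaced bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    # In 1D, the optimal transport cost equals the L1 distance between the CDFs.
    return sum(abs(cp - cq) for cp, cq in zip(accumulate(p), accumulate(q)))


def mse(p, q):
    """Mean squared error between two histograms of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


# Toy data (assumed for illustration): all charge in a single bin.
target = [1.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0]  # charge displaced by one bin
far = [0.0, 0.0, 0.0, 1.0]   # charge displaced by three bins

print("MSE:", mse(target, near), mse(target, far))
print("EMD:", emd_1d(target, near), emd_1d(target, far))
```

In this toy case the MSE is identical for both reconstructions, while the EMD grows with the distance the charge was moved. The two-dimensional detector case has no such closed form, which is what motivates a learned, differentiable approximation.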

Paper Structure

This paper contains 12 sections, 5 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Conceptual overview of the HGCAL trigger path for the autoencoder. We read a 7-bit input for each of the 48 trigger cells and compress it to a latent representation of 16$\times$3 bits. The decoder mirrors the encoder to reconstruct the original 48-cell input image.
  • Figure 2: Illustration of the telescope loss. The red lines correspond to one example of a 2$\times$2 trigger cell grouping in the $L_{2\times2}$ loss term, while the blue lines correspond to the three 4$\times$4 trigger cell groupings in the $L_{4\times4}$ loss term for telescope loss.
  • Figure 3: Remappings of the 48 trigger cell charge fractions into encoding (A): an 8$\times$8$\times$1 tensor with 16 empty cells, and encoding (B): a 4$\times$4$\times$3 tensor.
  • Figure 4: Architecture of our EMD CNN. We remap hexagonal HGCAL wafers to a more regular 4$\times$4$\times$3 tensor, and take in pairs of these remapped wafers as our input. The neural network has multiple 2D convolutional layers, each followed by a batch normalization layer and ReLU activation. The network then feeds into a single fully-connected layer, followed by a batch normalization layer and ReLU activation. We then average over both input orders to enforce the symmetry in the EMD metric.
  • Figure 5: Performance of the EMD CNN with optimized hyperparameters. Left: distribution of the relative difference between the EMD CNN prediction and the true EMD. Right: predicted EMD as a function of true EMD. The optimized hyperparameters correspond to 32 convolutional filters, a kernel size of 5, four 2D convolutional layers, one fully-connected layer with 256 neurons, and an MSE loss function.
  • ...and 2 more figures
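
The Figure 4 caption mentions averaging over both input orders to enforce the symmetry of the EMD. One way this could be expressed is sketched below. It is an assumption-laden illustration, not the authors' implementation: `symmetrize` is a hypothetical helper, and `toy_model` is a deliberately asymmetric stand-in for a trained network.

```python
def symmetrize(pair_model):
    """Wrap a two-input model so its output does not depend on input order."""
    def symmetric_model(a, b):
        # Average the predictions for (a, b) and (b, a).
        return 0.5 * (pair_model(a, b) + pair_model(b, a))
    return symmetric_model


def toy_model(a, b):
    """Hypothetical stand-in for a trained network; asymmetric on purpose."""
    return sum(abs(x - 2.0 * y) for x, y in zip(a, b))


sym_model = symmetrize(toy_model)

a = [1.0, 0.0]
b = [0.0, 0.5]
print(toy_model(a, b), toy_model(b, a))  # the raw model depends on input order
print(sym_model(a, b), sym_model(b, a))  # the wrapped model does not
```

Averaging guarantees that the prediction for a pair is the same in either order, a property the true EMD has by definition but that a generic network taking an ordered pair would not otherwise satisfy.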