The Latent Information Geometry of Jet Classification

Rebecca Maria Kuntz, Tilman Plehn, Björn Malte Schäfer, Benedikt Schosser, Sophia Vent

TL;DR

The paper introduces the main concepts needed to analyze learned latent geometries, specifically curvature and nonmetricities, and shows how they can be used for decoder and classifier geometries.

Abstract

Latent representations are an important theme in modern machine learning. Any network training with the notion of locality introduces a latent geometry which we can analyze with the help of differential geometry, specifically information geometry. We introduce the main concepts needed to analyze learned latent geometries, specifically curvature and nonmetricities, and show how they can be used for decoder and classifier geometries. We then apply our new methods to understand the physics behind binary quark-gluon classification and three-fold fat jet tagging.

Paper Structure

This paper contains 10 sections, 56 equations, 20 figures, 4 tables.

Figures (20)

  • Figure 1: Left: Riemann curvature changes the orientation of a vector when it is parallel-transported in a loop. Right: the Amari-Chentsov tensor causes length and angular defects of vectors under parallel transport.
  • Figure 2: Illustration of parallel transport with the $(\pm 1)$-connections and the Levi-Civita (LC) connection, respectively. Adapted from Ref. Nielsen:2022mfg.
  • Figure 3: Conceptual overview of our network setup. The decoder and classifier are both attached to the latent space $z$ and each induce a geometry on it.
  • Figure 4: Upper: test data and Frobenius norm of the Fisher information for a binary classifier (left) and a three-label classification (right). Lower: test data and Frobenius norm of the Fisher information for four labels (left) and Fisher ellipses for the three-label case (right).
  • Figure 5: Left: test data and Frobenius norm for the simplified classification of 1 vs. 7. Right: feature based on the size of the directional derivative of the decoder in the direction of the dominant eigenvector of the classifier metric.
  • ...and 15 more figures
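
The Figure 4 caption refers to the Frobenius norm of the Fisher information of a classifier evaluated over the latent space. As a minimal sketch of what such a quantity can look like, the block below computes the standard Fisher information metric $g_{ij}(z) = \sum_c p(c|z)\, \partial_i \log p(c|z)\, \partial_j \log p(c|z)$ for a toy linear softmax classifier acting on a two-dimensional latent point. The linear classifier, the weights `W` and `b`, and the function names are assumptions made for illustration; they are not taken from the paper, whose networks and exact definitions are not shown on this page.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1D array of logits."""
    shifted = logits - logits.max()
    e = np.exp(shifted)
    return e / e.sum()


def fisher_metric(z, W, b):
    """Fisher information metric at latent point z for p(c|z) = softmax(W z + b).

    Hypothetical toy classifier: W has shape (n_classes, latent_dim).
    """
    p = softmax(W @ z + b)
    # Gradient of log p(c|z) with respect to z: W_c - sum_k p_k W_k
    score = W - p @ W  # shape (n_classes, latent_dim)
    # g_ij = sum_c p_c * score_ci * score_cj
    return (score * p[:, None]).T @ score


def frobenius_norm(g):
    """Frobenius norm of a matrix: square root of the sum of squared entries."""
    return np.sqrt((g ** 2).sum())


# Illustrative values only: three labels, two latent dimensions.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
b = rng.normal(size=3)
z = np.array([0.5, -1.0])

g = fisher_metric(z, W, b)
print("Fisher metric:\n", g)
print("Frobenius norm:", frobenius_norm(g))
```

Evaluating `frobenius_norm(fisher_metric(z, W, b))` on a grid of latent points would give a scalar map of the kind described in the caption. For a trained network the gradients would typically come from automatic differentiation rather than the closed form used here.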