MACK: Mismodeling Addressed with Contrastive Knowledge

Liam Rankin Sheldon, Dylan Sheldon Rankin, Philip Harris

TL;DR

MACK addresses mismodeling between simulated and real data in high-energy physics with a contrastive learning framework. It trains a siamese network, consisting of a featurizer and a projector, with the VICReg loss on pairs of nominal (simulation) and alternate (data-like) samples; the positive pairs are formed using the Energy Mover's Distance (EMD). A downstream classifier is then trained on the featurizer outputs for the nominal samples. Across two jet-tagging tasks (a realistic Z′→qq̄ vs. QCD dataset and JetNet), the approach reduces the sensitivity of model performance to differences between datasets, at some cost in peak performance on the nominal dataset that can be mitigated through controlled fine-tuning. The results suggest that MACK yields more stable models suited to robust analyses, with potential applications beyond jet tagging such as anomaly detection and new physics searches.
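
To make the two ingredients named above concrete, here is a minimal NumPy sketch of (a) selecting pairs under a distance cutoff and (b) a VICReg-style loss on the two projector outputs. It is an illustration under stated assumptions, not the authors' implementation. The EMD matrix is assumed to be precomputed, the nearest-neighbour selection rule in `pair_by_cutoff` is hypothetical, and the loss weights (25, 25, 1), `gamma`, and `eps` are the defaults from the original VICReg formulation rather than values reported for MACK.

```python
import numpy as np


def pair_by_cutoff(dist, cutoff):
    """Hypothetical pairing rule (an assumption, not taken from the paper).

    dist[i, j] is a precomputed distance (e.g. EMD) between nominal sample i
    and alternate sample j. Each nominal sample is matched to its nearest
    alternate sample, and the pair is kept only if the distance is < cutoff.
    Returns (nominal_indices, alternate_indices).
    """
    nearest = np.argmin(dist, axis=1)
    keep = dist[np.arange(dist.shape[0]), nearest] < cutoff
    return np.flatnonzero(keep), nearest[keep]


def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0,
                gamma=1.0, eps=1e-4):
    """VICReg-style loss on two batches of projector outputs, shape (n, d).

    Row k of z_a and row k of z_b are assumed to belong to the same pair.
    """
    n, d = z_a.shape

    # Invariance: mean squared Euclidean distance between paired embeddings.
    invariance = np.mean(np.sum((z_a - z_b) ** 2, axis=1))

    def variance_term(z):
        # Hinge that pushes the per-dimension standard deviation up to gamma.
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, gamma - std))

    def covariance_term(z):
        # Penalise off-diagonal covariance to decorrelate the dimensions.
        centered = z - z.mean(axis=0)
        cov = centered.T @ centered / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    return (sim_w * invariance
            + var_w * (variance_term(z_a) + variance_term(z_b))
            + cov_w * (covariance_term(z_a) + covariance_term(z_b)))
```

In this sketch, the variance and covariance terms are what prevent the trivial solution in which both branches output the same constant vector: such a collapsed output has zero invariance cost but pays the full variance penalty.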

Abstract

The use of machine learning methods in high energy physics typically relies on large volumes of precise simulation for training. As machine learning models become more complex they can become increasingly sensitive to differences between this simulation and the real data collected by experiments. We present a generic methodology based on contrastive learning which is able to greatly mitigate this negative effect. Crucially, the method does not require prior knowledge of the specifics of the mismodeling. While we demonstrate the efficacy of this technique using the task of jet-tagging at the Large Hadron Collider, it is applicable to a wide array of different tasks both in and out of the field of high energy physics.

Paper Structure

This paper contains 12 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Sketch of the MACK method. Samples from the nominal and alternate datasets are paired such that the EMD between each pair is less than some cutoff $C$. These pairs are used to train a featurizer and a projector network without labels using the VICReg loss ($\mathscr{L}_\textrm{VICReg}$) applied to the outputs of the projectors ($P$ and $P'$). The outputs of the featurizer when applied to the nominal dataset ($L$) are then used, along with labels, to train a desired supervised network.
  • Figure 2: ROC curves for the supervised model and MACK with different levels of fine-tuning and augmentations, evaluated on the realistic datasets ($Z^\prime\rightarrow q\bar{q}$ vs. QCD). Left: Comparison of different contrastive models using EMD-based pairings and/or particle augmentations. A clear improvement is seen with EMD-based pairing (blue) over augmentations alone (orange or green), with a small additional improvement from the combination of EMD-based pairing and augmentations (red). Right: Comparison of the supervised model (blue) and MACK with different levels of fine-tuning (orange, green, red), evaluated on the realistic datasets. Even small amounts of fine-tuning are capable of producing MACK models with performance similar to or better than a supervised model.
  • Figure 3: Comparisons of the performance for the supervised model (blue) and MACK with different levels of fine-tuning (orange, green, red), evaluated on the realistic datasets. MACK with no fine-tuning shows a fractional change half that of a supervised model. Left: Fractional change in QCD efficiency ($\Delta\epsilon_{QCD}/\epsilon_{QCD}$) at a fixed $Z'$ efficiency. Right: Fractional change in $Z'$ efficiency ($\Delta\epsilon_{Z'}/\epsilon_{Z'}$) at a fixed QCD efficiency.
  • Figure 4: Comparisons of the performance for the supervised model and MACK with different levels of fine-tuning, evaluated on the JetNet dataset. Left: $q(g)$ efficiency as a function of $W(Z)$ efficiency evaluated on the nominal (solid) and alternate (dashed) datasets. The best performance on the alternate dataset is achieved with MACK and on the nominal dataset with MACK after full fine-tuning. Right: Fractional change in $Z'$ efficiency ($\Delta\epsilon_{Z'}/\epsilon_{Z'}$) at a fixed QCD efficiency. Each fine-tuning model is a distinct training starting from the base MACK model. The general trend from fine-tuning for 1 epoch to full fine-tuning is as expected with full fine-tuning achieving the best performance on the nominal dataset and the worst performance on the alternate dataset.
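
The fractional efficiency changes plotted in Figures 3 and 4 can be illustrated with a short sketch. The exact procedure is not given here, so the following assumes that the score threshold is fixed on the nominal signal sample at the chosen signal efficiency and then applied unchanged to the nominal and alternate background samples. The function names and the default working point are hypothetical.

```python
import numpy as np


def efficiency(scores, threshold):
    """Fraction of events whose classifier score exceeds the threshold."""
    return float(np.mean(np.asarray(scores) > threshold))


def fractional_bkg_change(sig_nom, bkg_nom, bkg_alt, sig_eff=0.5):
    """(eff_alt - eff_nom) / eff_nom for background at fixed signal efficiency.

    Assumption: the threshold is set so that a fraction sig_eff of the
    nominal signal passes, and the same threshold is used for both
    background samples.
    """
    threshold = np.quantile(sig_nom, 1.0 - sig_eff)
    eff_nom = efficiency(bkg_nom, threshold)
    eff_alt = efficiency(bkg_alt, threshold)
    if eff_nom == 0.0:
        raise ValueError("nominal background efficiency is zero at this working point")
    return (eff_alt - eff_nom) / eff_nom
```

A value near zero means the background efficiency at that working point is the same on the nominal and alternate datasets, which is the stability the figures compare between the supervised model and MACK. The signal-side quantity (the change in signal efficiency at fixed background efficiency) follows by swapping the roles of the signal and background samples.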