Increasing Information Extraction in Low-Signal Regimes via Multiple Instance Learning

Atakan Azakli, Bernd Stelzer

TL;DR

Single-instance ML often underperforms in low-signal hypothesis testing for SMEFT. We propose MIL as an information-theoretic framework that aggregates events into bags to amplify the discriminative signal, and we derive how bag-level information increases the effective Fisher information (FI). The paper presents theory, a practical calibration for Bartlett identity violations, and comprehensive experiments (binary, multi-class, and parameterized networks) demonstrating MIL's resilience and FI gains. Limitations include simplified data and i.i.d. assumptions; future work covers modeling σ_ε(N_B) and designing MIL architectures that maximize set-level sufficiency.
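The TL;DR mentions a calibration for Bartlett identity violations. The paper's own calibration is not described here, so what follows is only a generic illustration of what such a violation looks like. The second Bartlett identity states that, for a correctly specified likelihood, the variance of the score equals the expected negative curvature: $\mathrm{Var}[\partial_\theta \log p] = -\mathbb{E}[\partial^2_\theta \log p]$. The Python sketch below uses a hypothetical Gaussian toy with a deliberately wrong model width. It measures the ratio of the two sides and uses that ratio to rescale the naive curvature-based variance. This rescaling is the standard sandwich (Godambe) correction $H^{-1} J H^{-1}$ written in one dimension. The helper name `bartlett_ratio` and all numerical values are assumptions chosen for illustration.

```python
import numpy as np


def bartlett_ratio(x, mu, sigma_model):
    """Return Var(score) / E[-d^2 log p / d mu^2] for a Gaussian model in mu.

    Hypothetical helper: under the second Bartlett identity this ratio is 1
    for a correctly specified model; a value != 1 signals a violation.
    """
    score = (x - mu) / sigma_model**2      # per-event d log p / d mu
    neg_hessian = 1.0 / sigma_model**2     # per-event -d^2 log p / d mu^2 (constant)
    return np.var(score) / neg_hessian


rng = np.random.default_rng(0)
n = 200_000
true_mu, true_sigma = 0.0, 2.0             # assumed toy values, not from the paper
x = rng.normal(true_mu, true_sigma, size=n)

# Correctly specified model (width 2): the ratio should be close to 1.
ratio_ok = bartlett_ratio(x, true_mu, sigma_model=2.0)

# Misspecified model (width 1): the identity fails and the ratio is close to 4.
ratio_bad = bartlett_ratio(x, true_mu, sigma_model=1.0)

# Naive MLE variance uses curvature only: 1 / (n * H), with H = 1 for the width-1 model.
naive_var = 1.0 / (n * 1.0)
# Sandwich H^-1 J H^-1 / n equals ratio * naive_var in one dimension.
sandwich_var = ratio_bad * naive_var
true_var = true_sigma**2 / n               # actual variance of the sample-mean MLE
```

In the misspecified case the naive variance understates the true spread by a factor of about 4. The sandwich-corrected value recovers the true spread.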

Abstract

In this work, we introduce a new information-theoretic perspective on Multiple Instance Learning (MIL) for parameter estimation with i.i.d. data, and show that MIL can outperform single-instance learners in low-signal regimes. Prior work [Nachman and Thaler, 2021] argued that single-instance methods are often sufficient, but this conclusion presumes enough single-instance signal to train near-optimal classifiers. We demonstrate that even state-of-the-art single-instance models can fail to reach optimal classifier performance in challenging low-signal regimes, whereas MIL can mitigate this sub-optimality. As a concrete application, we constrain Wilson coefficients of the Standard Model Effective Field Theory (SMEFT) using kinematic information from subatomic particle collision events at the Large Hadron Collider (LHC). In experiments, we observe that under specific modeling and weak-signal conditions, pooling instances can increase the effective Fisher information compared to single-instance approaches.
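To make the bagging idea concrete, here is a minimal Python sketch built on loudly stated assumptions. Two unit-width Gaussians with a small mean shift `delta` stand in for SM and SMEFT; these are hypothetical densities, not the paper's kinematic data. The per-event log-likelihood ratios (LLRs) are also assumed to be known exactly. For i.i.d. events the bag likelihood ratio is the product of the per-event ratios, so summing per-event LLRs (sum pooling) is the optimal bag-level statistic. The sketch shows the bag-level ROC AUC growing with bag size $N_B$, in the spirit of the ROC curves in Figure 1. Because the per-event LLR is exact here, the total information in a bag is simply $N_B$ times the per-event value. The gains the paper reports concern learned classifiers that cannot reach this optimum from weak single-instance signal. The function names are illustrative.

```python
import numpy as np


def event_llr(x, delta):
    """Exact per-event log p1(x) / p0(x) for the assumed toy densities
    p0 = N(0, 1) (stand-in for SM) and p1 = N(delta, 1) (stand-in for SMEFT)."""
    return delta * x - 0.5 * delta**2


def bag_scores(x, bag_size, delta):
    """Group i.i.d. events into bags of `bag_size` and sum per-event LLRs."""
    n_bags = len(x) // bag_size
    bags = x[: n_bags * bag_size].reshape(n_bags, bag_size)
    return event_llr(bags, delta).sum(axis=1)


def auc(scores_pos, scores_neg):
    """ROC AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    all_scores = np.concatenate([scores_pos, scores_neg])
    ranks = all_scores.argsort().argsort() + 1          # 1-based ranks
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    u = ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


rng = np.random.default_rng(1)
delta = 0.05                        # hypothetical weak per-event signal
n_events = 500_000
x_sm = rng.normal(0.0, 1.0, n_events)
x_eft = rng.normal(delta, 1.0, n_events)

aucs = {}
for bag_size in (1, 10, 100, 1000):
    aucs[bag_size] = auc(bag_scores(x_eft, bag_size, delta),
                         bag_scores(x_sm, bag_size, delta))
    print(bag_size, round(aucs[bag_size], 3))
```

Single events are barely separable (AUC only slightly above 0.5). Bag-level separability rises steadily with bag size.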

Paper Structure

This paper contains 37 sections, 37 equations, 24 figures, and 3 tables.

Figures (24)

  • Figure 1: Receiver Operating Characteristic (ROC) curves for binary classification of SMEFT ($c_{HW}=0.1$) vs. SM at different levels of background contamination relative to the number of signal events in the bag. Additional contamination levels are shown in Figure \ref{fig:MIL_vs_MLP_big_version}.
  • Figure 2: Increase in effective Fisher information with bag size. Because different 1000-event chunks contain different amounts of Fisher information, the bars show the $1\sigma$ variation of the information contained in different bags.
  • Figure 3: Inconsistent and unphysical predictions of Parameterized Neural Networks.
  • Figure 4: Distributions of the ensemble classifier output for event-by-event ($N_B=1$, left) and set-based ($N_B=250$, right) classification. Larger versions of these plots are shown in Figures \ref{fig:hist_analysis_big_bag1} and \ref{fig:hist_analysis_big_bag250}.
  • Figure 5: Multi-class classifier: LLR values and their parabolic fits for the same 1000-event pseudo-experiment.
  • ...and 19 more figures
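Figures 2 and 5 extract Fisher information from parabolic fits to LLR curves. The sketch below shows the general mechanism on a hypothetical Gaussian toy and is not the paper's analysis code. In the toy, each event satisfies $x \sim \mathcal{N}(k\,c, 1)$, where $c$ is a Wilson-coefficient-like parameter and $k$ is an assumed sensitivity. Near its minimum, $q(c) = -2[\ln L(c) - \ln L_{\max}] \approx I_N (c - \hat c)^2$. The quadratic coefficient of the fit therefore estimates the total Fisher information $I_N$ of the pseudo-experiment, and $I_N / N$ gives a per-event (effective) value. The crossing $q = 1$ marks the $1\sigma$ interval $\hat c \pm 1/\sqrt{I_N}$. The names `log_likelihood` and `fisher_from_parabola` are illustrative.

```python
import numpy as np


def log_likelihood(c, x, k):
    """Hypothetical toy: each event x ~ N(k * c, 1); returns the summed log-density."""
    return np.sum(-0.5 * (x - k * c) ** 2 - 0.5 * np.log(2.0 * np.pi))


def fisher_from_parabola(x, k, c_grid):
    """Fit q(c) = -2 [log L(c) - max log L] with a parabola in c.

    Near its minimum q(c) ~= I_N (c - c_hat)^2, so the quadratic coefficient
    estimates the total Fisher information I_N of the pseudo-experiment.
    """
    ll = np.array([log_likelihood(c, x, k) for c in c_grid])
    q = -2.0 * (ll - ll.max())          # shift so the grid minimum is q = 0
    a, b, _ = np.polyfit(c_grid, q, deg=2)
    c_hat = -b / (2.0 * a)              # vertex of the fitted parabola
    sigma_1 = 1.0 / np.sqrt(a)          # q rises by 1 at c_hat +/- sigma_1
    return a, c_hat, sigma_1


rng = np.random.default_rng(2)
n_events, k, c_true = 1000, 0.5, 0.1    # assumed toy values; 1000-event pseudo-experiment
x = rng.normal(k * c_true, 1.0, size=n_events)
c_grid = np.linspace(-0.5, 0.5, 41)

I_N, c_hat, sigma_1 = fisher_from_parabola(x, k, c_grid)
print(I_N / n_events)   # per-event Fisher information; equals k**2 = 0.25 up to float error
```

For this Gaussian toy $q(c)$ is exactly quadratic, so the fit recovers $I_N/N = k^2$. With a learned LLR the curve is only approximately parabolic, and its curvature is an effective quantity. That curvature can misstate the variance when the Bartlett identity is violated.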