Improving the performance of weak supervision searches using data augmentation

Zong-En Chen; Cheng-Wei Chiang; Feng-Yang Hsieh

TL;DR

This work addresses the data-efficiency challenge in weakly supervised collider searches by introducing physics-inspired data augmentation to the CWoLa framework. By applying $p_{\text{T}}$ smearing, jet rotation, and their combination to jet images, the authors reduce the learning threshold from about $6\sigma$ to about $3\sigma$, so the network can discriminate signal from background with substantially fewer signal events. The study uses a Hidden Valley (HV) benchmark with a $Z'$ mediator to demonstrate that EN normalization effectively mitigates sculpting and that the combined augmentation provides the strongest gains, even under moderate systematic uncertainties. Overall, physics-informed data augmentation emerges as a practical, data-efficient tool for weakly supervised learning in collider searches, with potential applicability beyond the specific HV scenario.

Abstract

Weak supervision combines the advantages of training on real data with the ability to exploit signal properties. However, training a neural network using weak supervision often requires an excessive amount of signal data, which severely limits its practical applicability. In this study, we propose addressing this limitation through data augmentation, increasing the training data's size and diversity. Specifically, we focus on physics-inspired data augmentation methods, such as $p_{\text{T}}$ smearing and jet rotation. Our results demonstrate that data augmentation can significantly enhance the performance of weak supervision, enabling neural networks to learn efficiently from substantially less data.
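
As a rough illustration of the two augmentations named above, the following minimal sketch applies them to a jet represented as a list of `(pT, eta, phi)` constituents. It is not the authors' implementation. The Gaussian resolution model, the `rel_sigma` value, the uniform rotation angle, the rotation about the $p_{\text{T}}$-weighted centroid, and all function names are assumptions made for illustration. The sketch also ignores $\phi$ wrap-around and the pixelation into jet images.

```python
import math
import random


def smear_pt(constituents, rel_sigma=0.1, rng=random):
    """Scale each constituent pT by a Gaussian factor.

    Assumption: a flat relative resolution `rel_sigma`; a real detector
    model would normally depend on pT.
    """
    smeared = []
    for pt, eta, phi in constituents:
        factor = max(0.0, rng.gauss(1.0, rel_sigma))  # keep pT non-negative
        smeared.append((pt * factor, eta, phi))
    return smeared


def rotate_jet(constituents, angle):
    """Rotate constituents in the (eta, phi) plane about the pT-weighted centroid.

    Assumption: phi values are close together, so no wrap-around handling.
    """
    total_pt = sum(pt for pt, _, _ in constituents)
    if total_pt == 0:
        return list(constituents)
    eta_c = sum(pt * eta for pt, eta, _ in constituents) / total_pt
    phi_c = sum(pt * phi for pt, _, phi in constituents) / total_pt
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = []
    for pt, eta, phi in constituents:
        d_eta, d_phi = eta - eta_c, phi - phi_c
        rotated.append((
            pt,
            eta_c + cos_a * d_eta - sin_a * d_phi,
            phi_c + sin_a * d_eta + cos_a * d_phi,
        ))
    return rotated


def augment(constituents, n_copies=5, rel_sigma=0.1, seed=0):
    """Return the original jet plus `n_copies` smeared-and-rotated copies."""
    rng = random.Random(seed)
    samples = [list(constituents)]
    for _ in range(n_copies):
        angle = rng.uniform(-math.pi, math.pi)  # hypothetical angle range
        samples.append(rotate_jet(smear_pt(constituents, rel_sigma, rng), angle))
    return samples
```

Each augmented copy keeps the label of its parent event, so in this sketch the training set grows by a factor of `n_copies + 1` without requiring additional signal events.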
