TAPAS: Datasets for Learning the Learning with Errors Problem
Eshika Saxena, Alberto Alfarano, François Charton, Emily Wenger, Kristin Lauter
TL;DR
The paper presents TAPAS, a collection of five large, preprocessed LWE datasets that AI researchers can use off-the-shelf for cryptanalysis. It details a data-generation pipeline that combines subsampling and lattice-reduction techniques to produce millions of reduced LWE samples across diverse parameter settings, and it establishes baseline performance using the SALSA and Cool & Cruel attacks. By providing extensive data, hardware- and software-agnostic preprocessing, and explicit cost metrics, TAPAS aims to accelerate AI-driven exploration of LWE security and pave the way for scaling laws and novel cryptanalytic methods. The work highlights both the potential of AI in cryptanalysis and the practical limits imposed by lattice-reduction quality and computational requirements, and it offers clear directions for future research and dataset expansion.
Abstract
AI-powered attacks on Learning with Errors (LWE), an important hard math problem in post-quantum cryptography, rival or outperform "classical" attacks on LWE under certain parameter settings. Despite the promise of this approach, a dearth of accessible data limits AI practitioners' ability to study and improve these attacks. Creating LWE data for AI model training is time- and compute-intensive and requires significant domain expertise. To fill this gap and accelerate AI research on LWE attacks, we propose the TAPAS datasets, a Toolkit for Analysis of Post-quantum cryptography using AI Systems. These datasets cover several LWE settings and can be used off-the-shelf by AI practitioners to prototype new approaches to cracking LWE. This work documents TAPAS dataset creation, establishes attack performance baselines, and lays out directions for future work.
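For readers new to the problem, the following minimal sketch shows what a raw LWE sample looks like: each sample pairs a uniformly random vector a with b = a·s + e mod q, where s is the secret and e is small noise. It illustrates only the standard LWE definition, not the TAPAS pipeline (no subsampling or lattice reduction is performed). The function name `lwe_samples`, the binary secret, the rounded-Gaussian error, and the values of `n`, `m`, `q`, and `sigma` are illustrative assumptions, not TAPAS settings.

```python
import random


def lwe_samples(n, m, q, secret, sigma, rng):
    """Generate m LWE samples of dimension n modulo q.

    Returns (A, b, e) where each b[i] = <A[i], secret> + e[i] mod q.
    The error e is returned only so the relation can be checked;
    an attacker would see A and b alone.
    """
    # Uniformly random rows a_i in Z_q^n.
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    # Small error terms: a rounded Gaussian (an assumed, common choice).
    e = [round(rng.gauss(0, sigma)) for _ in range(m)]
    # b_i = <a_i, s> + e_i mod q.
    b = [
        (sum(a_ij * s_j for a_ij, s_j in zip(row, secret)) + err) % q
        for row, err in zip(A, e)
    ]
    return A, b, e


rng = random.Random(0)       # fixed seed for reproducibility
n, m, q, sigma = 8, 16, 3329, 3.0   # toy parameters, far too small for real security
secret = [rng.randrange(2) for _ in range(n)]  # hypothetical binary secret
A, b, e = lwe_samples(n, m, q, secret, sigma, rng)
```

Recovering `secret` from `A` and `b` alone is the LWE problem; the preprocessing described in this work transforms such raw samples before model training.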
