Through the telecom lens: Are all training samples important?
Shruti Bothe, Illyyne Saffar, Aurelie Boisbunon, Hasan Farooq, Julien Forgeat, Md Moin Uddin Chowdhury
TL;DR
This work asks whether all training samples are equally valuable in telecom model training, where data are noisy and high-dimensional and energy use is a critical concern. It introduces a gradient-norm-based sample-importance framework that computes per-sample gradients $g_{e,s}$ across epochs $e$ and aggregates them into an importance score $\mathcal{I}(s)$ for each sample $s$. Experiments on three telecom datasets show that training on only the top $p\%$ most important samples can match full-data baselines while using substantially less data and compute, with notable reductions in energy use and associated emissions. The approach is lightweight and model-agnostic, offers a practical path toward sustainable, efficient AI in telecom, and points to dynamic curricula and broader benchmarking as directions for future work.
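The scoring and selection steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The model is a logistic regression trained with full-batch gradient descent. $g_{e,s}$ is taken as the gradient of sample $s$'s loss with respect to the weights at the start of epoch $e$. $\mathcal{I}(s)$ is assumed to be the mean of $\|g_{e,s}\|_2$ over epochs. The function names `gradient_norm_importance` and `top_p_indices` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gradient_norm_importance(X, y, epochs=5, lr=0.1):
    """Hypothetical sketch: I(s) = mean over epochs of ||g_{e,s}||_2.

    Assumptions (not from the paper): logistic regression, binary
    cross-entropy loss, full-batch gradient descent, mean aggregation.
    """
    n, d = X.shape
    w = np.zeros(d)
    norms = np.zeros((epochs, n))                  # norms[e, s] = ||g_{e,s}||_2
    for e in range(epochs):
        residual = sigmoid(X @ w) - y              # dLoss/dlogit per sample, shape (n,)
        per_sample_grads = residual[:, None] * X   # row s is g_{e,s}, shape (n, d)
        norms[e] = np.linalg.norm(per_sample_grads, axis=1)
        w -= lr * per_sample_grads.mean(axis=0)    # one full-batch update per epoch
    return norms.mean(axis=0)


def top_p_indices(scores, p):
    """Indices of the top p% highest-scoring samples (p given in percent)."""
    k = max(1, int(round(len(scores) * p / 100)))
    return np.argsort(scores)[::-1][:k]
```

For example, `top_p_indices(gradient_norm_importance(X, y), 20)` returns the indices of the 20% highest-scoring samples, which would then form the reduced training set. Averaging over epochs is only one simple aggregation; a maximum or a late-epoch mean would fit this sketch just as well.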
Abstract
The rise of AI in telecommunications, from optimizing Radio Access Networks to managing user experience, has sharply increased data volumes and training demands. Telecom data is often noisy, high-dimensional, and costly to store, process, and label. Despite AI's critical role, standard workflows still assume that all training samples contribute equally. At the same time, next-generation systems require AI models that are accurate, efficient, and sustainable. This paper questions the assumption of equal importance by analyzing the role of individual samples in telecom model training and assessing whether the proposed approach optimizes computation and energy use. We perform sample-level gradient analysis across epochs to identify patterns of influence and redundancy in model learning. Based on this analysis, we propose a sample importance framework that selectively prioritizes impactful data and reduces computation without compromising accuracy. Experiments on three real-world telecom datasets show that our method preserves performance while reducing data needs and computational overhead, advancing the goals of sustainable AI in telecommunications.
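One hypothetical way to read per-epoch gradient norms for patterns of redundancy and influence is sketched below. The threshold `tau`, the `settle_epoch` cut-off, the function name `classify_trajectories`, and the three labels are illustrative assumptions, not the paper's procedure. The input is an array `norms` with `norms[e, s]` $= \|g_{e,s}\|_2$.

```python
import numpy as np


def classify_trajectories(norms, tau, settle_epoch):
    """Hypothetical heuristic: group samples by gradient-norm behaviour
    in the epochs from `settle_epoch` onward.

    Labels (assumed for illustration):
      "redundant"  - norm stays below tau: learned early, little further signal
      "persistent" - norm stays at or above tau: hard, influential, or possibly noisy
      "other"      - mixed behaviour across the late epochs
    """
    late = np.asarray(norms)[settle_epoch:]   # rows: late epochs, columns: samples
    below = (late < tau).all(axis=0)
    above = (late >= tau).all(axis=0)
    other = ~(below | above)
    return {
        "redundant": np.flatnonzero(below),
        "persistent": np.flatnonzero(above),
        "other": np.flatnonzero(other),
    }
```

Under this heuristic, "redundant" samples would be the first candidates to drop. "Persistent" samples may be genuinely informative or simply noisy, so they would need separate inspection before any pruning.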
