Through the telecom lens: Are all training samples important?

Shruti Bothe, Illyyne Saffar, Aurelie Boisbunon, Hasan Farooq, Julien Forgeat, Md Moin Uddin Chowdhury

TL;DR

This work tackles whether all training samples are equally valuable in telecom model training, where data are noisy, high-dimensional, and energy considerations are critical. It introduces a gradient-norm–based sample-importance framework that computes per-sample gradients $g_{e,s}$ across epochs and aggregates them into an importance score $\mathcal{I}(s)$. Empirical results on three telecom datasets show that training on the top $p\%$ of important samples can match full-data baselines while using substantially less data and compute, yielding notable energy-emission reductions. The approach is lightweight and model-agnostic, offering practical pathways to sustainable, efficient AI in telecom and suggesting avenues for dynamic curricula and broader benchmarking in future work.
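The scoring idea summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear model with squared loss (so the per-sample gradient norm has a closed form), full-batch gradient descent, and a plain mean over epochs as the aggregation into $\mathcal{I}(s)$. The source does not specify the aggregation rule, and the names `per_sample_grad_norms`, `importance_scores`, and `top_p_indices` are hypothetical.

```python
import math
import random


def per_sample_grad_norms(w, X, y):
    """L2 norm of each sample's gradient for the loss 0.5 * (w.x - y)**2."""
    norms = []
    for x_s, y_s in zip(X, y):
        residual = sum(w_j * x_j for w_j, x_j in zip(w, x_s)) - y_s
        # The per-sample gradient is residual * x_s, so its norm factorizes.
        norms.append(abs(residual) * math.sqrt(sum(x_j * x_j for x_j in x_s)))
    return norms


def importance_scores(X, y, epochs=20, lr=0.05):
    """Average per-sample gradient norms over epochs (assumed aggregation)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    totals = [0.0] * n
    for _ in range(epochs):
        # Record ||g_{e,s}|| for every sample s before this epoch's update.
        for s, g in enumerate(per_sample_grad_norms(w, X, y)):
            totals[s] += g
        # One full-batch gradient descent step on the mean squared loss.
        grad = [0.0] * d
        for x_s, y_s in zip(X, y):
            residual = sum(w_j * x_j for w_j, x_j in zip(w, x_s)) - y_s
            for j in range(d):
                grad[j] += residual * x_s[j] / n
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad)]
    return [t / epochs for t in totals]


def top_p_indices(scores, p):
    """Indices of the top p percent of samples by score (at least one)."""
    k = max(1, math.ceil(len(scores) * p / 100))
    ranked = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
    return ranked[:k]


# Synthetic data standing in for a telecom dataset (illustrative only).
random.seed(0)
true_w = [random.gauss(0, 1) for _ in range(5)]
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [sum(a * b for a, b in zip(true_w, x_s)) + 0.1 * random.gauss(0, 1) for x_s in X]

scores = importance_scores(X, y)
keep = top_p_indices(scores, p=30)
X_sub = [X[s] for s in keep]
y_sub = [y[s] for s in keep]
```

A model would then be retrained on `X_sub` and `y_sub` and compared against the full-data baseline. For neural networks the closed-form norm no longer applies, and per-sample gradients would need to come from the training framework's own tooling.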

Abstract

The rise of AI in telecommunications, from optimizing Radio Access Networks to managing user experience, has sharply increased data volumes and training demands. Telecom data is often noisy, high-dimensional, and costly to store, process, and label. Despite AI's critical role, standard workflows still assume that all training samples contribute equally. At the same time, next-generation systems require AI models that are accurate, efficient, and sustainable. The paper questions the assumption of equal importance by analyzing the roles of individual samples in telecom training and assessing whether the proposed model optimizes computation and energy use. We perform sample-level gradient analysis across epochs to identify patterns of influence and redundancy in model learning. Based on this, we propose a sample importance framework that selectively prioritizes impactful data and reduces computation without compromising accuracy. Experiments on three real-world telecom datasets show that our method preserves performance while reducing data needs and computational overhead, advancing the goals of sustainable AI in telecommunications.

Paper Structure

This paper contains 9 sections, 3 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Snapshot of Internet Activity and Energy Consumption of a Base-station
  • Figure 2: Test predictions comparing the baseline model and the sample-importance model
  • Figure 3: Model performance of sample importance framework as compared to baseline models.