Maintaining Performance with Less Data
Dominic Sanderson, Tatiana Kalgonova
TL;DR
The paper addresses the high computational cost and environmental impact of training image classifiers by introducing three methods that dynamically reduce the amount of input data used during training. It evaluates these methods (Data Step, Data Increment, and Data Cut) on MNIST, CIFAR-10, and smallNORB using a CNN-Capsule architecture and reports runtime reductions of up to 50%, with effects on accuracy that depend on the dataset. The findings show that targeted data reductions can maintain or even improve accuracy in some cases (notably MNIST and certain smallNORB runs), whereas random or excessive reductions can degrade accuracy, especially on CIFAR-10. The work highlights the potential for substantial efficiency gains in AI training and points to future research on principled data selection to optimize the tradeoff between performance and cost.
Abstract
We propose a novel method for training a neural network for image classification that dynamically reduces the input data, in order to reduce the cost of training. As Deep Learning tasks become more popular, their computational complexity increases, leading to more intricate algorithms and models that have longer runtimes and require more input data. The result is a greater cost in time, hardware, and environmental resources. By using data reduction techniques, we reduce the amount of work performed, and therefore the environmental impact of AI techniques. With dynamic data reduction, we show that accuracy may be maintained while reducing runtime by up to 50% and reducing carbon emissions proportionally.
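To make the idea of dynamic data reduction concrete, the following is a minimal Python sketch of a training schedule that uses a shrinking random subset of the data in each epoch and counts how many samples are processed overall. It is an illustration under stated assumptions, not the paper's method: the function names (`epoch_fraction`, `select_subset`, `samples_processed`), the linear schedule, and the uniform random selection are all hypothetical choices made for this example and do not reproduce Data Step, Data Increment, or Data Cut.

```python
import random


def epoch_fraction(epoch, total_epochs, start=1.0, end=0.5):
    """Fraction of the dataset to use in a given epoch.

    Hypothetical linear schedule from `start` (first epoch) to `end`
    (last epoch); the paper's actual schedules are not specified here.
    """
    if total_epochs <= 1:
        return start
    t = epoch / (total_epochs - 1)
    return start + (end - start) * t


def select_subset(num_samples, fraction, rng):
    """Pick a uniform random subset of sample indices (an assumption)."""
    k = max(1, round(num_samples * fraction))
    return sorted(rng.sample(range(num_samples), k))


def samples_processed(num_samples, total_epochs, start=1.0, end=0.5, seed=0):
    """Count samples seen over all epochs under the shrinking schedule."""
    rng = random.Random(seed)
    total = 0
    for epoch in range(total_epochs):
        fraction = epoch_fraction(epoch, total_epochs, start, end)
        indices = select_subset(num_samples, fraction, rng)
        # A real training loop would run one epoch on dataset[indices] here.
        total += len(indices)
    return total


if __name__ == "__main__":
    full = 1000 * 5  # every sample in every epoch
    reduced = samples_processed(1000, 5)
    print(f"processed {reduced} of {full} samples ({reduced / full:.0%})")
```

If runtime scales roughly with the number of samples processed, the saving in this sketch comes directly from the schedule: the smaller the per-epoch fraction, the less work is done. The uniform random selection is only the simplest possible baseline; how the retained samples are chosen can matter for accuracy.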
