Toward Storage-Aware Learning with Compressed Data: An Empirical Exploratory Study on JPEG
Kichang Lee, Songkuk Kim, JaeYeon Park, JeongGil Ko
TL;DR
This work addresses storage-constrained on-device learning by formalizing and empirically investigating the joint trade-off between data quantity and data quality under a fixed storage budget, using JPEG compression on CIFAR-10. It shows that naive uniform strategies are suboptimal and that individual samples differ in their sensitivity to compression, which motivates a sample-wise adaptive compression approach. The authors provide an actionable framework and discuss lightweight proxies for choosing per-sample fidelity under a budget, laying groundwork for storage-aware learning systems. The findings bear on deploying robust, personalized on-device models under real-world storage constraints and suggest avenues for integration with continual, federated, and active learning.
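One way to write down the trade-off summarized above is as a budgeted selection problem. The notation here is an illustrative assumption, not necessarily the paper's own: D is the collected dataset, S the subset that is kept, q_i the JPEG quality chosen for sample i, c(x_i, q_i) the compressed image, s(x_i, q_i) its stored size in bytes, θ(·) the model trained on the stored data, and B the storage budget.

```latex
\begin{aligned}
\max_{S \subseteq D,\; \{q_i\}_{i \in S}} \quad
  & \mathrm{Acc}\!\left(\theta\!\left(\{(c(x_i, q_i),\, y_i)\}_{i \in S}\right)\right) \\
\text{subject to} \quad
  & \sum_{i \in S} s(x_i, q_i) \le B
\end{aligned}
```

In this view, uniform data dropping fixes every q_i at the highest quality and shrinks S until the budget is met, while one-size-fits-all compression keeps S = D and picks a single shared q. Sample-wise adaptive compression lets each q_i vary.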
Abstract
On-device machine learning is often constrained by limited storage, particularly in continuous data collection scenarios. This paper presents an empirical study on storage-aware learning, focusing on the trade-off between data quantity and quality via compression. We demonstrate that naive strategies, such as uniform data dropping or one-size-fits-all compression, are suboptimal. Our findings further reveal that data samples exhibit varying sensitivities to compression, supporting the feasibility of a sample-wise adaptive compression strategy. These insights provide a foundation for developing a new class of storage-aware learning systems. The primary contribution of this work is a systematic characterization of this under-explored challenge.
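To make the idea of sample-wise adaptive compression concrete, the following is a minimal sketch of a greedy allocator under a byte budget. It is not the authors' method: the `Option` type, the `allocate` function, the two quality levels, and the size and utility numbers are all hypothetical, and the per-sample "utility" stands in for whatever lightweight sensitivity proxy is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    quality: int    # JPEG quality setting (hypothetical levels)
    size: int       # stored size in bytes at this quality (made-up numbers)
    utility: float  # assumed proxy for training usefulness at this quality


def allocate(samples, budget):
    """Greedy sketch: every sample starts dropped (0 bytes, 0 utility).

    Repeatedly apply the upgrade with the best utility gain per extra byte
    that still fits in the budget. This is a heuristic for a
    multiple-choice knapsack and is not guaranteed to be optimal.
    """
    choice = {name: None for name in samples}
    used = 0
    while True:
        best = None  # (gain per byte, sample name, option)
        for name, options in samples.items():
            cur = choice[name]
            cur_size = cur.size if cur else 0
            cur_util = cur.utility if cur else 0.0
            for opt in options:
                extra = opt.size - cur_size
                gain = opt.utility - cur_util
                if extra <= 0 or gain <= 0 or used + extra > budget:
                    continue
                ratio = gain / extra
                if best is None or ratio > best[0]:
                    best = (ratio, name, opt)
        if best is None:
            return choice, used
        _, name, opt = best
        used += opt.size - (choice[name].size if choice[name] else 0)
        choice[name] = opt


# Two hypothetical samples: one tolerates heavy compression, one does not.
samples = {
    "robust": [Option(10, 100, 0.9), Option(90, 400, 1.0)],
    "fragile": [Option(10, 100, 0.2), Option(90, 400, 1.0)],
}
choice, used = allocate(samples, budget=500)
total_utility = sum(opt.utility for opt in choice.values() if opt)
```

With these made-up numbers the allocator stores the robust sample at quality 10 and the fragile one at quality 90, using the full 500 bytes for a total utility of 1.9. Compressing both uniformly to quality 10 would give 1.1, and keeping only one sample at quality 90 would give 1.0, which mirrors the claim that uniform strategies can be suboptimal.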
