Challenges and Solutions in Selecting Optimal Lossless Data Compression Algorithms
Md. Atiqur Rahman, MM Fazle Rabbi
TL;DR
This work tackles the problem of selecting lossless data compression algorithms when multiple performance metrics matter. It introduces a mathematical framework that combines compression ratio $r$, encoding time $e$, and decoding time $d$ into a unified score $p$ via normalized averages $a_{(x,j)}$, normalized scores $c_{(x,j)}$, and balancing constants $k_r$, $k_e$, and $k_d$, culminating in the grand total performance $g_j^{r,e,d}$. The approach also provides an algorithm to identify the optimal compressor under any two- or three-metric combination. Empirical results on image and text datasets show that while learning-based codecs often achieve higher $r$, classical codecs can be superior when speed is prioritized, and the framework reliably guides algorithm selection across diverse priority settings. Overall, the framework offers a robust, adaptable decision-support tool that bridges theoretical multi-criteria optimization with practical compression selection.
Abstract
The rapid growth of digital data has heightened the demand for efficient lossless compression methods. However, existing algorithms exhibit trade-offs: some achieve high compression ratios, others excel in encoding or decoding speed, and none consistently perform best across all dimensions. This mismatch complicates algorithm selection for applications where multiple performance metrics are simultaneously critical, such as medical imaging, which requires both compact storage and fast retrieval. To address this challenge, we present a mathematical framework that integrates compression ratio, encoding time, and decoding time into a unified performance score. The model normalizes and balances these metrics through a principled weighting scheme, enabling objective and fair comparisons among diverse algorithms. Extensive experiments on image and text datasets validate the approach, showing that it reliably identifies the most suitable compressor for different priority settings. Results also reveal that while modern learning-based codecs often provide superior compression ratios, classical algorithms remain advantageous when speed is paramount. The proposed framework offers a robust and adaptable decision-support tool for selecting optimal lossless data compression techniques, bridging theoretical measures with practical application needs.
