Doctor: Optimizing Container Rebuild Efficiency by Instruction Re-Orchestration
Zhiling Zhu, Tieming Chen, Chengwei Liu, Han Liu, Qijie Song, Zhengzi Xu, Yang Liu
TL;DR
Doctor tackles the persistent problem of slow Dockerfile rebuilds by reordering instructions in a dependency-aware fashion that accounts for future modifications. The method combines dependency extraction, modification-frequency prediction from historical data, precise per-instruction build-time measurements, and a weighted topological sort to produce optimized Dockerfile sequences. Across 2,000 popular repositories, Doctor achieves an average rebuild-time reduction of 26.5%, with 12.82% of files exceeding a 50% improvement, while preserving functional similarity in the vast majority of cases. This work delivers actionable patterns for Dockerfile management and provides open data and tools to support long-term maintenance and optimization of container build pipelines.
Abstract
Containerization has revolutionized software deployment, with Docker leading the way due to its ease of use and consistent runtime environment. As Docker usage grows, optimizing Dockerfile performance, particularly by reducing rebuild time, has become essential for maintaining efficient CI/CD pipelines. However, existing optimization approaches primarily address single builds without considering the recurring rebuild costs associated with modifications and evolution, limiting long-term efficiency gains. To bridge this gap, we present Doctor, a method for improving Dockerfile build efficiency through instruction re-ordering that addresses four key challenges: identifying instruction dependencies, predicting future modifications, ensuring behavioral equivalence, and managing the computational complexity of the optimization. We developed a comprehensive dependency taxonomy based on Dockerfile syntax and a historical modification analysis to prioritize frequently modified instructions. Using a weighted topological sorting algorithm, Doctor optimizes instruction order to minimize future rebuild time while maintaining functionality. Experiments on 2,000 GitHub repositories show that Doctor improves 92.75% of Dockerfiles, reducing rebuild time by an average of 26.5%, with 12.82% of files achieving over a 50% reduction. Notably, 86.2% of cases preserve functional similarity. These findings highlight best practices for Dockerfile management, enabling developers to enhance Docker efficiency through informed optimization strategies.
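To make the idea of a weighted topological sort concrete, the following is a minimal sketch and not Doctor's actual algorithm. It assumes each instruction has an estimated modification frequency and a measured build time, and that dependencies are given as (before, after) index pairs. It uses a simple greedy rule (among the instructions whose dependencies are satisfied, emit the least frequently modified one first), which is a heuristic and is not guaranteed to be optimal. The function names, the example Dockerfile, and all numbers are hypothetical. The cost model relies on Docker's layer caching behavior: a changed instruction invalidates its own layer and every layer after it.

```python
import heapq


def weighted_topological_order(n, deps, change_freq):
    """Order n instructions so that dependencies are respected and,
    among ready instructions, rarely modified ones come first.

    deps: iterable of (before, after) index pairs (assumed acyclic).
    change_freq: change_freq[i] is an assumed modification frequency.
    """
    successors = [[] for _ in range(n)]
    indegree = [0] * n
    for before, after in deps:
        successors[before].append(after)
        indegree[after] += 1

    # Min-heap keyed by (frequency, original index); the index breaks ties
    # so that the original order is kept where weights are equal.
    ready = [(change_freq[i], i) for i in range(n) if indegree[i] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, i = heapq.heappop(ready)
        order.append(i)
        for j in successors[i]:
            indegree[j] -= 1
            if indegree[j] == 0:
                heapq.heappush(ready, (change_freq[j], j))

    if len(order) != n:
        raise ValueError("dependency graph contains a cycle")
    return order


def expected_rebuild_time(order, change_freq, build_time):
    """Sum over instructions of (modification frequency) x (time to rebuild
    that instruction and everything after it in the given order)."""
    remaining = sum(build_time[i] for i in order)
    total = 0.0
    for i in order:
        total += change_freq[i] * remaining
        remaining -= build_time[i]
    return total


# Hypothetical Dockerfile; frequencies and build times are made up.
instructions = [
    "FROM python:3.12-slim",
    "COPY . /app",
    "RUN apt-get update && apt-get install -y gcc",
    "RUN pip install /app",
]
deps = [(0, 1), (0, 2), (1, 3), (2, 3)]  # assumed dependency edges
change_freq = [0.01, 0.9, 0.05, 0.1]
build_time = [2.0, 1.0, 40.0, 20.0]  # seconds

original_order = list(range(len(instructions)))
new_order = weighted_topological_order(len(instructions), deps, change_freq)
original_cost = expected_rebuild_time(original_order, change_freq, build_time)
new_cost = expected_rebuild_time(new_order, change_freq, build_time)

for i in new_order:
    print(instructions[i])
```

In this toy case the slow, rarely changed package installation moves ahead of the frequently changed COPY, so a source edit no longer invalidates the cached package layer. A real tool would also need the dependency extraction, frequency prediction, and equivalence checks described in the abstract, none of which this sketch attempts.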
