DyCE: Dynamically Configurable Exiting for Deep Learning Compression and Real-time Scaling
Qingyuan Wang, Barry Cardiff, Antoine Frappé, Benoit Larras, Deepu John
TL;DR
Most deployed deep learning models are static and cannot adapt to varying real-time demands. DyCE addresses this with dynamically configurable exiting: lightweight exit networks are attached at intermediate points of the backbone, so the trade-off between accuracy and compute can be changed at runtime without redeploying the model. The approach includes a decoupled exit design, training procedures with soft distillation, and search algorithms (iterative and single-pass) that generate sets of configurations meeting target performance constraints. On ImageNet, DyCE reduces MACs by 23.5% for ResNet152 and 25.9% for ConvNextv2-tiny with accuracy losses below 0.5%. The same mechanism supports real-time scaling, and the authors see potential for use beyond image classification.
Abstract
Conventional deep learning (DL) model compression and scaling methods focus on altering the model's components, which affects the results uniformly across all samples. However, since samples vary in difficulty, a dynamic model that adapts computation based on sample complexity offers a novel perspective for compression and scaling. Despite this potential, existing dynamic models are typically monolithic and model-specific, limiting their generalizability as broad compression and scaling methods. Additionally, most deployed DL systems are fixed: they cannot adjust their scale once deployed and therefore cannot adapt to varying real-time demands. This paper introduces DyCE, a dynamically configurable system that can adjust the performance-complexity trade-off of a DL model at runtime without requiring re-initialization or redeployment on inference hardware. DyCE achieves this by adding small exit networks to intermediate layers of the original model, allowing computation to terminate early if acceptable results are obtained. DyCE also decouples the design of an efficient dynamic model, facilitating easy adaptation to new base models and potential general use in compression and scaling. We also propose methods for generating optimized configurations and determining the types and positions of exit networks to achieve desired performance and complexity trade-offs. By enabling simple configuration switching, DyCE provides fine-grained performance tuning in real time. We demonstrate the effectiveness of DyCE through image classification tasks using deep convolutional neural networks (CNNs). DyCE reduces computational complexity by 23.5% for ResNet152 and 25.9% for ConvNextv2-tiny on ImageNet, with accuracy reductions of less than 0.5%.
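The early-exit mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Stage`, `run_with_exits`, and `CONFIGS`, the use of a confidence threshold as the exit criterion, and all MAC counts are assumptions chosen for illustration. Each backbone segment is followed by a small exit head, and a configuration is simply a list of per-exit thresholds that can be swapped at runtime without touching the model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Stage:
    """One backbone segment plus the exit head attached after it (hypothetical)."""
    forward: Callable[[float], float]                # backbone segment
    exit_head: Callable[[float], Tuple[int, float]]  # returns (label, confidence)
    backbone_macs: int                               # cost of the segment
    exit_macs: int                                   # cost of the exit head


def run_with_exits(
    x: float,
    stages: Sequence[Stage],
    final_head: Callable[[float], Tuple[int, float]],
    final_macs: int,
    thresholds: Sequence[Optional[float]],
) -> Tuple[int, int]:
    """Run the backbone and stop at the first exit that is confident enough.

    thresholds[i] is the confidence required to stop at exit i;
    None disables that exit, so its head is never evaluated.
    Returns (predicted label, MACs spent on this sample).
    """
    macs = 0
    h = x
    for stage, thr in zip(stages, thresholds):
        h = stage.forward(h)
        macs += stage.backbone_macs
        if thr is None:
            continue  # exit disabled in this configuration
        label, conf = stage.exit_head(h)
        macs += stage.exit_macs
        if conf >= thr:
            return label, macs  # terminate early
    label, _ = final_head(h)
    return label, macs + final_macs


# Toy model: two exits with fixed confidences (assumed values).
STAGES: List[Stage] = [
    Stage(lambda h: h + 1.0, lambda h: (0, 0.6), backbone_macs=100, exit_macs=5),
    Stage(lambda h: h + 1.0, lambda h: (1, 0.9), backbone_macs=100, exit_macs=5),
]
FINAL_HEAD = lambda h: (2, 1.0)
FINAL_MACS = 100

# A configuration set: switching entries changes the trade-off at runtime.
CONFIGS = {
    "accurate": [None, None],  # all exits off: behaves like the original model
    "balanced": [0.95, 0.8],
    "fast": [0.5, 0.5],
}

for name, thresholds in CONFIGS.items():
    label, macs = run_with_exits(0.0, STAGES, FINAL_HEAD, FINAL_MACS, thresholds)
    print(name, label, macs)
```

In this toy setup, moving from "accurate" to "fast" only changes the threshold list passed in, which mirrors the idea of configuration switching without re-initialization. A real system would replace the lambdas with actual network segments and derive the thresholds from a search over validation data.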
