
DNN Modularization via Activation-Driven Training

Tuan Ngo, Abid Hassan, Saad Shafiq, Nenad Medvidovic

TL;DR

MODA tackles the challenge of retraining costs and entanglement in monolithic DNNs by inducing inherent modularity through activation-driven training. It simultaneously optimizes for intra-class affinity, inter-class dispersion, and compactness, followed by a structured, frequency-based decomposition to extract per-class modules. Empirically, MODA achieves accuracy comparable to standard training while producing substantially smaller, less overlapping modules and reducing training time relative to competing methods; its module replacement and reuse capabilities enable efficient adaptation to new tasks without full retraining. This activation-centric, mask-free approach offers practical benefits for model reuse, modular updates, and scalable deployment across diverse CNN architectures and datasets.
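To make the three objectives concrete, here is a minimal sketch of how they could be scored on one layer's activations for a mini-batch. It is not MODA's actual formulation: the choice of cosine similarity for affinity and dispersion, the mean L1 norm for compactness, and the hypothetical weights `alpha`, `beta`, `gamma` in `total_loss` are all assumptions made for illustration. In practice these terms would be computed on framework tensors so they can be backpropagated.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def modular_loss_terms(
    acts: List[List[float]], labels: List[int]
) -> Tuple[float, float, float]:
    """Score one layer's activations (one row per sample).

    affinity:    mean cosine similarity over same-class pairs (to maximize)
    dispersion:  mean cosine similarity over different-class pairs (to minimize)
    compactness: mean L1 norm of the activation rows (to minimize)
    """
    same: List[float] = []
    diff: List[float] = []
    n = len(acts)
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(acts[i], acts[j])
            (same if labels[i] == labels[j] else diff).append(s)
    affinity = sum(same) / len(same) if same else 0.0
    dispersion = sum(diff) / len(diff) if diff else 0.0
    compactness = sum(sum(abs(a) for a in row) for row in acts) / n
    return affinity, dispersion, compactness


def total_loss(task_loss: float, acts, labels,
               alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.01) -> float:
    """Hypothetical weighted objective: reward affinity, penalize the rest."""
    aff, disp, comp = modular_loss_terms(acts, labels)
    return task_loss - alpha * aff + beta * disp + gamma * comp


if __name__ == "__main__":
    acts = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]  # two class-0 rows, one class-1 row
    print(modular_loss_terms(acts, [0, 0, 1]))
```

In this toy batch the two class-0 rows point the same way (affinity 1.0) and are orthogonal to the class-1 row (dispersion 0.0), which is the activation pattern the three objectives push toward.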

Abstract

Deep Neural Networks (DNNs) tend to accrue technical debt and suffer from significant retraining costs when adapting to evolving requirements. Modularizing DNNs offers the promise of improving their reusability. Previous work has proposed techniques to decompose DNN models into modules both during and after training. However, these strategies suffer from several shortcomings, including significant weight overlap and accuracy loss across modules, a restricted focus on convolutional layers, and added complexity and training time caused by auxiliary masks that control modularity. In this work, we propose MODA, an activation-driven modular training approach. MODA promotes inherent modularity within a DNN model by directly regulating the activation outputs of its layers according to three modular objectives: intra-class affinity, inter-class dispersion, and compactness. MODA is evaluated on three well-known DNN models and five datasets of varying sizes. The evaluation indicates that, compared to the existing state of the art, MODA (1) accomplishes modularization with 22% less training time; (2) generates modules comprising up to 24x fewer weights and 37x less weight overlap while (3) preserving the original model's accuracy without additional fine-tuning; and (4) in module replacement scenarios, improves the accuracy of a target class by 12% on average with minimal impact on the accuracy of other classes.
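The module replacement scenario can be pictured with a toy composed classifier. The sketch below assumes that each per-class module emits a single score for its class and that the composed model predicts the class with the highest score. `compose` and `replace_module` are hypothetical helpers, not MODA's actual replacement strategy (Figure 2), which may differ.

```python
from typing import Callable, Dict, Sequence

# Hypothetical: a "module" maps an input to a score for its own class.
Module = Callable[[Sequence[float]], float]


def compose(modules: Dict[int, Module]) -> Callable[[Sequence[float]], int]:
    """Build a classifier that predicts the class whose module scores highest."""
    def classify(x: Sequence[float]) -> int:
        return max(sorted(modules), key=lambda c: modules[c](x))
    return classify


def replace_module(modules: Dict[int, Module], target: int,
                   new_module: Module) -> Dict[int, Module]:
    """Return a new module table in which only `target`'s module is swapped."""
    updated = dict(modules)
    updated[target] = new_module
    return updated


if __name__ == "__main__":
    # Class 2's original module is weak: it under-scores its own class.
    modules = {0: lambda x: x[0], 1: lambda x: x[1], 2: lambda x: 0.1 * x[2]}
    before = compose(modules)
    after = compose(replace_module(modules, 2, lambda x: x[2]))
    sample = [0.2, 0.3, 1.0]  # should belong to class 2
    print(before(sample), after(sample))
```

Because the swap touches only the target class's scorer, inputs that other modules already dominate keep their predictions; in a real network, interactions through shared hidden units can still shift other classes slightly.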

Paper Structure

This paper contains 14 sections, 10 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: High-level overview of MODA. As depicted in the right-hand segment (2), the number of the model's hidden units shared across modules decreases from earlier to later layers.
  • Figure 2: MODA's module replacement strategy
  • Figure 3: (RQ1) Measurements of module reuse across different sub-task types for the VGG16 model on CIFAR datasets
  • Figure 4: (RQ3) Impact of threshold $\tau$ on modularization
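One way to read the frequency-based decomposition and the threshold $\tau$ is the sketch below. It assumes a hidden unit is kept in a class's module when it is active (positive output) on at least a fraction $\tau$ of that class's samples, and it measures overlap between two modules with the Jaccard index. Both the keep rule and the overlap metric are illustrative assumptions, and the helper names are hypothetical. For a CNN, a "unit" would typically be a channel whose activation is first aggregated over spatial positions.

```python
from typing import List, Sequence, Set


def activation_frequency(acts: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of samples on which each unit has a positive activation."""
    if not acts:
        raise ValueError("need at least one sample")
    n = len(acts)
    width = len(acts[0])
    return [sum(1 for row in acts if row[u] > 0) / n for u in range(width)]


def extract_module(acts_for_class: Sequence[Sequence[float]], tau: float) -> Set[int]:
    """Keep the units whose activation frequency for this class is >= tau."""
    freq = activation_frequency(acts_for_class)
    return {u for u, f in enumerate(freq) if f >= tau}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Shared units divided by all units used by either module."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # One layer with 4 units; rows are samples of each class.
    class0 = [[1, 0, 0.5, 0], [2, 0, 0, 0], [1, 0, 0.3, 0.1]]
    class1 = [[0, 1, 0.2, 0], [0, 3, 0, 0], [0.1, 1, 0.4, 0]]
    for tau in (0.3, 0.5, 0.9):
        m0, m1 = extract_module(class0, tau), extract_module(class1, tau)
        print(tau, sorted(m0), sorted(m1), jaccard(m0, m1))
```

In this toy layer, raising $\tau$ from 0.3 to 0.9 shrinks each module and drives the overlap from 0.5 to 0.0, illustrating the size-versus-overlap trade-off that a threshold sweep such as Figure 4 examines.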