
Knowledge Translation: A New Pathway for Model Compression

Wujie Sun, Defang Chen, Jiawei Chen, Yan Feng, Chun Chen, Can Wang

TL;DR

Knowledge Translation proposes translating large-block parameters into small-block parameters without retraining, enabling flexible, architecture-agnostic compression. The approach builds translation datasets via data generation, enhances them with random masking and noise augmentation, and trains a translator model (favoring MLP-Mixer) to minimize a translation loss $L$ between predicted and target parameters. In MNIST experiments, KT substantially outperforms baselines like random initialization or greedy replacement, improves with longer training and augmentation, and shows potential across architectures and datasets. This framework advances Green AI by reducing resource overhead while preserving functionality, and outlines concrete directions for scalable, dataset-efficient translation architectures and broader applications.

Abstract

Deep learning has advanced significantly in recent years, at the cost of growing training, inference, and model storage overhead. Existing model compression methods aim to reduce the number of model parameters while maintaining high accuracy, but they inevitably require retraining the compressed model or impose architectural constraints. To overcome these limitations, this paper presents a novel framework, termed Knowledge Translation (KT), in which a "translation" model is trained to receive the parameters of a larger model and generate compressed parameters. KT draws inspiration from language translation, where neural networks convert text between languages while preserving its meaning. Accordingly, we explore the potential of neural networks to convert between models of different sizes while preserving their functionality. We propose a comprehensive framework for KT, introduce data augmentation strategies to enhance model performance despite limited training data, and demonstrate the feasibility of KT on the MNIST dataset. Code is available at https://github.com/zju-SWJ/KT.
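
To make the ingredients named above more concrete, the following is a minimal Python sketch of random masking, noise augmentation, and a translation loss on flattened parameter vectors. It is not taken from the authors' repository. The function names, the default `mask_ratio` and `noise_std` values, the choice to zero out masked entries, and the use of mean squared error as the translation loss are all assumptions made for illustration.

```python
import random


def random_mask(params, mask_ratio, rng):
    """Zero each entry with probability `mask_ratio` (assumed masking scheme)."""
    return [0.0 if rng.random() < mask_ratio else p for p in params]


def add_noise(params, noise_std, rng):
    """Add zero-mean Gaussian noise to each entry (assumed noise model)."""
    return [p + rng.gauss(0.0, noise_std) for p in params]


def augment(large_params, rng, mask_ratio=0.1, noise_std=0.01):
    """Apply masking, then noise, to the large-block parameters.

    The order and the default values are hypothetical, not from the paper.
    """
    return add_noise(random_mask(large_params, mask_ratio, rng), noise_std, rng)


def translation_loss(predicted, target):
    """Mean squared error between predicted and target small-block parameters.

    MSE is an assumption; the summary only says a translation loss is
    minimized between predicted and target parameters.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


if __name__ == "__main__":
    rng = random.Random(0)
    large_block = [0.5, -1.2, 0.3, 0.8]   # toy stand-in for flattened weights
    augmented = augment(large_block, rng)  # would be fed to the translator
    print(len(augmented), translation_loss([1.0, 2.0], [1.0, 4.0]))
```

In a full pipeline the augmented large-block parameters would be the translator's input and the small-block parameters its regression target; the translator itself (for example an MLP-Mixer) is omitted here.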

Paper Structure

This paper contains 40 sections, 1 equation, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Schematic comparison of language and knowledge translation.
  • Figure 2: Overview of knowledge translation. Better viewed in color.
  • Figure 3: Example block architectures for knowledge translation.
  • Figure 4: Fitting ability evaluation for different architectures.
  • Figure 5: Visualization of features obtained using various methods. Base model trained with 300 epochs is used for knowledge translation. Better viewed in color.
  • ...and 3 more figures