Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition

Zixing Zhang, Zhongren Dong, Weixiang Xu, Jing Han

TL;DR

Experimental results show that the proposed Transformer Re-parameterization approach consistently improves the performance of lightweight Transformers, even making them comparable to large models, when deployed on resource-constrained IoT devices.

Abstract

Although machine learning models are increasingly implemented on edge or Internet-of-Things (IoT) devices, deploying advanced models on such resource-constrained devices remains challenging. Transformer models, currently a dominant neural architecture, have achieved great success in broad domains, but their complexity hinders their deployment on IoT devices with limited computation capability and storage size. Although many model compression approaches have been explored, they often suffer from well-known performance degradation. To address this issue, we introduce a new method, namely Transformer Re-parameterization, to boost the performance of lightweight Transformer models. It consists of two processes: the High-Rank Factorization (HRF) process in the training stage and the deHigh-Rank Factorization (deHRF) process in the inference stage. In the former process, we insert an additional linear layer before the Feed-Forward Network (FFN) of the lightweight Transformer. The inserted HRF layers are expected to enhance the model's learning capability. In the latter process, the auxiliary HRF layer is merged with the following FFN layer into a single linear layer, which recovers the original structure of the lightweight model. To examine the effectiveness of the proposed method, we evaluate it on three widely used Transformer variants, i.e., ConvTransformer, Conformer, and SpeechFormer networks, in the application of speech emotion recognition on the IEMOCAP, M3ED and DAIC-WOZ datasets. Experimental results show that our proposed method consistently improves the performance of lightweight Transformers, even making them comparable to large models. The proposed re-parameterization approach enables advanced Transformer models to be deployed on resource-constrained IoT devices.
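The merging step described in the abstract rests on a simple identity: two consecutive linear layers with no nonlinearity between them collapse into one, since W_ffn (W_hrf x + b_hrf) + b_ffn = (W_ffn W_hrf) x + (W_ffn b_hrf + b_ffn). The NumPy sketch below illustrates that identity only. It is not the authors' implementation: the function name `merge_linear`, the variable names, and the layer sizes are all hypothetical, and it assumes that no activation sits between the inserted layer and the first FFN linear layer.

```python
import numpy as np


def merge_linear(w_hrf, b_hrf, w_ffn, b_ffn):
    """Fold y = W_ffn (W_hrf x + b_hrf) + b_ffn into a single y = W x + b.

    Hypothetical helper: valid only when no nonlinearity sits between
    the two layers (an assumption of this sketch).
    """
    w = w_ffn @ w_hrf            # merged weight, shape (d_ffn, d_model)
    b = w_ffn @ b_hrf + b_ffn    # merged bias, shape (d_ffn,)
    return w, b


rng = np.random.default_rng(0)
d_model, d_expand, d_ffn = 8, 32, 16   # illustrative sizes, not from the paper

# Training-time structure: auxiliary (HRF-style) layer, then the FFN's first layer.
w_hrf = rng.standard_normal((d_expand, d_model))
b_hrf = rng.standard_normal(d_expand)
w_ffn = rng.standard_normal((d_ffn, d_expand))
b_ffn = rng.standard_normal(d_ffn)

x = rng.standard_normal((4, d_model))  # a batch of 4 feature vectors

# Two-layer forward pass (training-time structure).
y_train = (x @ w_hrf.T + b_hrf) @ w_ffn.T + b_ffn

# Single-layer forward pass after merging (inference-time structure).
w, b = merge_linear(w_hrf, b_hrf, w_ffn, b_ffn)
y_infer = x @ w.T + b

# The two outputs agree up to floating-point rounding.
outputs_match = bool(np.allclose(y_train, y_infer))
print(w.shape, outputs_match)
```

The merged weight has the shape of a single layer mapping `d_model` to `d_ffn`, so the inference-time parameter count does not depend on the expansion width `d_expand` used during training.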

Paper Structure

This paper contains 18 sections, 6 equations, 12 figures, 8 tables, 2 algorithms.

Figures (12)

  • Figure 1: Four categories of model compression methods. (a) Pruning: Removing some weights or connections from the network. (b) Knowledge Distillation: Transferring knowledge from a large and accurate teacher model to a smaller student model. (c) Matrix Decomposition: Decomposing a large weight matrix into several smaller matrices. (d) Structural Re-Parameterization: Shrinking a multi-branch structure into a single-branch structure, typically employed in convolutional neural networks.
  • Figure 2: Framework of Transformer re-parameterization in different modules with the examples of ConvTransformer, SpeechFormer, and Conformer.
  • Figure 3: Detailed illustration of the re-parameterization process in the Transformer Feed-Forward Network (FFN).
  • Figure 4: Results of WF1 when applying different expansion ratios of HRF to different modules of ConvTransformer (a), Conformer (b), or SpeechFormer (c) on the IEMOCAP dataset.
  • Figure 5: Results of WF1 when applying different expansion ratios of HRF to different modules of ConvTransformer (a), Conformer (b), or SpeechFormer (c) on the M$^{3}$ED dataset.
  • ...and 7 more figures