
Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration

Wei Ji, Li Li, Zheqi Lv, Wenqiao Zhang, Mengze Li, Zhen Wan, Wenqiang Lei, Roger Zimmermann

TL;DR

We introduce a universal on-device Multi-modal Model Adaptation framework that strikes a balance between efficiency and effectiveness in on-device model adaptation, offering a pioneering solution for on-Device Multi-modal Model Adaptation (DMMA).

Abstract

In our increasingly interconnected world, intelligent devices continually amass copious personalized multi-modal data, creating a pressing need for high-quality, personalized, device-aware services. However, this endeavor poses a multifaceted challenge to prevailing artificial intelligence (AI) systems, which are primarily rooted in the cloud. As these systems grapple with shifting data distributions between the cloud and devices, the traditional approach of fine-tuning-based adaptation (FTA) suffers from two issues: it requires costly and time-consuming data annotation, and it carries a looming risk of model overfitting. To surmount these challenges, we introduce a Universal On-Device Multi-modal Model Adaptation Framework, revolutionizing on-device model adaptation by striking a balance between efficiency and effectiveness. The framework features the Fast Domain Adaptor (FDA), hosted in the cloud, which provides tailored parameters for the Lightweight Multi-modal Model on devices. To enhance adaptability across multi-modal tasks, the framework further employs the AnchorFrame Distribution Reasoner (ADR), which minimizes communication costs. Our contributions, encapsulated in the Cloud-Device Collaboration Multi-modal Parameter Generation (CDC-MMPG) framework, represent a pioneering solution for on-Device Multi-modal Model Adaptation (DMMA). Extensive experiments validate the efficiency and effectiveness of our method, particularly on video question answering and retrieval tasks, driving forward the integration of intelligent devices into our daily lives.

Paper Structure

This paper contains 15 sections, 16 equations, 2 figures, 6 tables, and 1 algorithm.

Figures (2)

  • Figure 1: (a) Multi-modal data on the cloud and on different devices follow different distributions because of users' personalized preferences. (b) Compared with conventional methods that deploy models on different devices, our proposed FDA achieves a balance between efficiency and effectiveness.
  • Figure 2: Illustration of the overall pipeline of our method, CDC-MMPG. (a) and (b) represent the cloud model, which reconstructs the video features uploaded from the device and infers the personalized parameters of the device model from the reconstructed video features. (c) represents the lightweight multi-modal device-side model, which extracts multi-modal features and uploads the video features to the cloud model for personalized parameter prediction. After being updated with these personalized parameters, the device-side model further analyzes the multi-modal features and makes the final prediction.
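The Figure 2 pipeline describes a forward-only adaptation loop: the device uploads a compact set of video features, the cloud infers device-specific parameters, and the device swaps them in without running backpropagation. The following is a minimal, hypothetical Python sketch of that data flow, not the authors' implementation. Every name (select_anchor_frames, reconstruct_distribution, ParameterGenerator, device_predict), the evenly spaced anchor selection, the mean-feature summary, and the random linear generator are illustrative assumptions standing in for the learned ADR and FDA, whose internals are not described here.

```python
import math
import random

# Hypothetical sizes chosen for illustration only.
FEAT_DIM = 4     # dimension of one per-frame video feature
NUM_CLASSES = 3  # outputs of the device-side prediction head


def select_anchor_frames(frame_feats, num_anchors):
    """Device side (assumed strategy): keep evenly spaced frames to shrink the upload."""
    step = max(1, len(frame_feats) // num_anchors)
    return frame_feats[::step][:num_anchors]


def reconstruct_distribution(anchors):
    """Cloud side (stand-in for ADR): summarize anchors by their per-dimension mean."""
    if not anchors:
        raise ValueError("at least one anchor frame is required")
    n = len(anchors)
    return [sum(f[d] for f in anchors) / n for d in range(FEAT_DIM)]


class ParameterGenerator:
    """Cloud side (stand-in for FDA): a linear hypernetwork mapping a summary to head weights.

    The weights here are random and fixed; a real generator would be trained in the cloud.
    """

    def __init__(self, seed=0):
        rng = random.Random(seed)
        out_dim = NUM_CLASSES * FEAT_DIM
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(FEAT_DIM)] for _ in range(out_dim)]

    def generate(self, summary):
        # One forward pass: no gradients are computed anywhere in this loop.
        flat = [sum(wi * si for wi, si in zip(row, summary)) for row in self.w]
        # Reshape the flat vector into a NUM_CLASSES x FEAT_DIM weight matrix.
        return [flat[c * FEAT_DIM:(c + 1) * FEAT_DIM] for c in range(NUM_CLASSES)]


def device_predict(head, feat):
    """Device side: apply the generated head, then a numerically stable softmax."""
    logits = [sum(w * x for w, x in zip(row, feat)) for row in head]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


# Example run on synthetic frame features (16 frames).
frames = [[math.sin(t + d) for d in range(FEAT_DIM)] for t in range(16)]
anchors = select_anchor_frames(frames, num_anchors=4)   # upload 4 of 16 frames
summary = reconstruct_distribution(anchors)             # cloud-side summary
head = ParameterGenerator(seed=42).generate(summary)    # personalized head weights
probs = device_predict(head, frames[0])                 # on-device prediction
print(len(anchors), [round(p, 3) for p in probs])
```

The design choice the sketch highlights is that adaptation cost on the device reduces to receiving a weight matrix and doing inference, while the heavier parameter inference stays in the cloud; uploading 4 of 16 frames is a crude proxy for the communication savings attributed to ADR.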