XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge

Yu Zhang, Xi Zhang, Hualin Zhou, Xinyuan Chen, Shang Gao, Hong Jia, Jianfei Yang, Yuankai Qi, Tao Gu

TL;DR

This work addresses cross-modality, few-shot model transfer for human sensing on edge devices, where data scarcity and resource constraints hinder deployment. It introduces XTransfer, a modality-agnostic framework that combines a Splice–Repair–Removal (SRR) pipeline with a Layer-Wise Search (LWS) mechanism to repair and restructure pre-trained models using only a few sensor samples. The core ideas are anchor-based latent space alignment in a reduced PCA space, an anchor-based repair loss that minimizes layer-wise distribution shifts, and an efficient, NAS-inspired layer-recombining strategy that operates under resource budgets. Experiments show state-of-the-art accuracy together with substantial reductions in data requirements, training time, and edge-deployment costs across multiple sensing modalities and datasets. Overall, XTransfer offers a scalable, practical path to reusing public pre-trained models for diverse edge sensing tasks with limited labeled data.
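To make the anchor idea above concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes that an anchor is the per-class mean of source features in a PCA-reduced space and that the repair loss is the mean squared distance from each target sample to its class anchor; both definitions, the synthetic data, and all function names (`fit_pca`, `class_anchors`, `anchor_repair_loss`) are hypothetical choices made for illustration.

```python
import numpy as np

def fit_pca(features, n_components):
    """Fit PCA via SVD; return the feature mean and the top principal directions."""
    mean = features.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(features, mean, components):
    """Map features into the reduced PCA space."""
    return (features - mean) @ components.T

def class_anchors(latent, labels):
    """Assumed anchor definition: one mean latent vector per class."""
    return {int(c): latent[labels == c].mean(axis=0) for c in np.unique(labels)}

def anchor_repair_loss(latent, labels, anchors):
    """Assumed loss: mean squared distance from each sample to its class anchor."""
    targets = np.stack([anchors[int(y)] for y in labels])
    return float(np.mean(np.sum((latent - targets) ** 2, axis=1)))

# Synthetic stand-ins for layer features (not real sensor data).
rng = np.random.default_rng(0)
n_classes, per_class, dim = 3, 20, 16
class_means = rng.normal(scale=4.0, size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), per_class)

source = class_means[labels] + rng.normal(size=(labels.size, dim))
mean, components = fit_pca(source, n_components=4)
anchors = class_anchors(project(source, mean, components), labels)

# Toy "modality shift": a constant offset along the first principal direction.
target_clean = class_means[labels] + rng.normal(size=(labels.size, dim))
target_shifted = target_clean + 5.0 * components[0]

loss_aligned = anchor_repair_loss(project(target_clean, mean, components), labels, anchors)
loss_shifted = anchor_repair_loss(project(target_shifted, mean, components), labels, anchors)
print(f"aligned: {loss_aligned:.2f}  shifted: {loss_shifted:.2f}")
```

In this toy setup the shifted features incur a larger loss than the aligned ones, which is the signal a repair step would try to reduce when adapting a layer. Working in the reduced space keeps the distance computation cheap, which plausibly matters under edge resource limits.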

Abstract

Deep learning for human sensing on edge systems presents significant potential for smart applications. However, its training and development are hindered by the limited availability of sensor data and the resource constraints of edge systems. While transferring pre-trained models to different sensing applications is promising, existing methods often require extensive sensor data and computational resources, resulting in high costs and limited transferability. In this paper, we propose XTransfer, a first-of-its-kind method enabling modality-agnostic, few-shot model transfer with a resource-efficient design. XTransfer flexibly uses pre-trained models and transfers knowledge across different modalities through (i) model repairing, which safely mitigates modality shift by adapting pre-trained layers with only a few sensor samples, and (ii) layer recombining, which efficiently searches and recombines layers of interest from source models in a layer-wise manner to restructure models. We benchmark various baselines across diverse human sensing datasets spanning different modalities. The results show that XTransfer achieves state-of-the-art performance while significantly reducing the costs of sensor data collection, model training, and edge deployment.

Paper Structure

This paper contains 42 sections, 2 equations, 7 figures, 15 tables.

Figures (7)

  • Figure 1: Preliminary study under cross-modality few-shot learning (FSL) settings. (a) reveals the baseline performance gap. (b) shows the average similarity and FSL difficulty between all target sensing datasets in Table \ref{tab:data} and the source modalities (e.g., Image, Text, Sensing), using default reshaping (Appendix \ref{app:shape}) and benchmarking Jaehoon2022 (see Appendix \ref{app:fslsetup} for details). Two distinct areas represent similarity levels (A: hard, B: normal). Key findings: 1) compared to CUB, similarity levels across modalities are notably low, e.g., Text and Sensing fall into Area A, indicating a significant modality shift; 2) whereas Image-Unpair (i.e., no class pairing) falls into Area A, Image surprisingly falls into Area B, indicating that pairing classes may enhance cross-modality similarity; 3) Image exhibits more stable standard deviations and lower FSL difficulty, suggesting better potential for model transfer. (c) shows the significant loss of model accuracy caused by pruning.
  • Figure 2: Design insights. (a) Layer-wise accuracy convergence using baselines is disrupted by modality shift. (b) In Area A, accuracy rises while the MMC shift stays low, indicating a small latent feature gap. In Area B, the MMC shift increases notably with layer index, and excessive latent feature deviation begins to reduce accuracy. (c) After repairing, the S-score improves, but stagnation occurs at certain layers.
  • Figure 3: Overview. XTransfer transfers source models across modalities with few sensor data through model repairing (SRR pipeline) and layer recombining (LWS control). LWS control first segments source models into layers and performs a layer-wise search across pools. At each pool, the pre-search check decides which layers need repairing; the SRR pipeline then repairs them, and LWS control selects the layers of interest. These layers are incrementally recombined during the search, restructuring models to enable human sensing at the edge. Subfigures (a)–(c) illustrate the feature space evolution before and after repairing.
  • Figure 4: Ablation study evaluating the performance of model repairing and layer recombining.
  • Figure 5: (a) Embedded mmWave radar testbed setup; (b)–(e) human sensing applications built in different real-world settings.
  • ...and 2 more figures
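
The layer-wise search described in the Figure 3 caption can be sketched as a greedy pass over pools under a resource budget. This is only an illustration under stated assumptions: the greedy rule is a stand-in for the paper's LWS control, the pre-search check and SRR repair are omitted, and the `Candidate` fields (`score`, `cost`), layer names, and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str     # hypothetical layer identifier, e.g. "B.l1"
    score: float  # hypothetical post-repair quality score (higher is better)
    cost: float   # hypothetical resource cost, e.g. parameters in millions

def layer_wise_search(pools, budget):
    """Greedy stand-in: at each pool keep the best-scoring layer that still fits the budget."""
    chosen, spent = [], 0.0
    for pool in pools:
        affordable = [c for c in pool if spent + c.cost <= budget]
        if not affordable:
            break  # nothing fits any more, so stop growing the model
        best = max(affordable, key=lambda c: c.score)
        chosen.append(best)
        spent += best.cost
    return chosen, spent

# Three pools of candidate layers drawn from two hypothetical source models, A and B.
pools = [
    [Candidate("A.l1", 0.60, 1.0), Candidate("B.l1", 0.70, 2.0)],
    [Candidate("A.l2", 0.80, 3.0), Candidate("B.l2", 0.75, 1.5)],
    [Candidate("A.l3", 0.90, 4.0)],
]
chosen, spent = layer_wise_search(pools, budget=4.0)
print([c.name for c in chosen], spent)
```

With a budget of 4.0, the sketch keeps `B.l1` and `B.l2` and stops before the third pool, because `A.l2` and `A.l3` would exceed the budget. A real search would score each candidate after repair on the few available target samples rather than use fixed numbers.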