Streamlining Multimodal Data Fusion in Wireless Communication and Sensor Networks

Mohammud J. Bocus, Xiaoyang Wang, Robert J. Piechocki

TL;DR

The paper tackles efficient multimodal data fusion and compression in wireless sensing and communications. It proposes a multimodal Vector-Quantized Variational Autoencoder (VQVAE) that learns a shared discrete latent space across diverse modalities (images, WiFi spectrograms, CSI data), and extends this framework to end-to-end CSI feedback in a 5G-like system, reducing uplink overhead while preserving channel estimate quality. Key findings are strong reconstruction on paired MNIST-SVHN and WiFi spectrogram data, discriminative latent spaces suitable for human activity recognition (HAR), and better CSI feedback performance than state-of-the-art baselines at comparable compression. The approach offers a simple yet effective route to bandwidth-efficient, edge-friendly multimodal perception and communication in next-generation networks.

Abstract

This paper presents a novel approach for multimodal data fusion based on the Vector-Quantized Variational Autoencoder (VQVAE) architecture. The proposed method is simple yet effective, achieving excellent reconstruction performance on paired MNIST-SVHN data and WiFi spectrogram data. Additionally, the multimodal VQVAE model is extended to the 5G communication scenario, where an end-to-end Channel State Information (CSI) feedback system is implemented to compress the data transmitted between the base station (gNodeB) and the User Equipment (UE) without significant loss of performance. The proposed model learns a discriminative compressed feature space for various types of input data (CSI, spectrograms, natural images, etc.), making it a suitable solution for applications with limited computational resources.
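
The compression idea behind a vector-quantized latent space can be illustrated with a small sketch. It is not the authors' implementation: the function names `quantize` and `bits_per_index`, the random codebook, and the synthetic "encoder outputs" are all assumptions made for illustration, and the multimodal fusion step itself is not shown. The sketch only demonstrates the generic VQ step, in which each d-dimensional latent vector is replaced by the index of its nearest entry in a codebook of size k, so that only about log2(k) bits per latent vector need to be stored or transmitted.

```python
import math
import random


def quantize(vectors, codebook):
    """Return, for each vector, the index of its nearest codebook entry
    under squared Euclidean distance (the generic VQ lookup step)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, e)) for e in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def bits_per_index(k):
    """Bits needed to send one codebook index when the codebook has k entries."""
    return math.ceil(math.log2(k))


# Assumed toy setup: a random codebook with k entries of dimension d.
rng = random.Random(0)
k, d = 64, 128
codebook = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]

# Hypothetical encoder outputs: slightly noisy copies of entries 3, 10 and 3.
latents = [[x + rng.gauss(0, 0.01) for x in codebook[i]] for i in (3, 10, 3)]

indices = quantize(latents, codebook)          # what would be transmitted
reconstructed = [codebook[i] for i in indices]  # what the receiver looks up

print(indices, bits_per_index(k))
```

Under these assumptions, a codebook of size k = 64 costs 6 bits per transmitted index and a codebook of size k = 512 costs 9 bits, regardless of the embedding dimension d, because the receiver holds the same codebook and only the indices cross the link.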

Paper Structure (16 sections, 4 equations, 12 figures, 3 tables)

Figures (12)

  • Figure 1: Multimodal VQVAE model.
  • Figure 2: Illustration of communication between a gNodeB (base station) and User Equipment (UE) in a conventional 5G radio network.
  • Figure 3: CSI feedback pre-processing steps.
  • Figure 4: End-to-end CSI feedback multimodal VQVAE model.
  • Figure 5: Examples of paired MNIST and SVHN images reconstructed using the multimodal VQVAE model: (a) $(k,d)=(512,128)$, mean reconstruction error across test data = 0.0033; (b) $(k,d)=(64,128)$, mean reconstruction error across test data = 0.0056.
  • ...and 7 more figures