SPAR: Self-supervised Placement-Aware Representation Learning for Distributed Sensing

Yizhuo Chen, Tianchen Wang, You Lyu, Yanlan Hu, Jinyang Li, Tomoyoshi Kimura, Hongjue Zhao, Yigong Hu, Denizhan Kara, Tarek Abdelzaher

TL;DR

SPAR addresses the placement-sensitivity gap in self-supervised learning for distributed sensing by coupling explicit spatial and structural embeddings with dual reconstruction objectives. Through a unified transformer-based architecture, it learns context-aware representations that reflect the duality between where sensors are placed and what they observe. The work provides information-theoretic and occlusion-invariant analyses to justify the design and demonstrates superior generalization across vehicle monitoring, human activity recognition (HAR), and seismic localization tasks, including unseen sensor layouts and constrained communication. The results indicate SPAR's potential to enable robust, data-efficient sensing in real-world multimodal, multi-node deployments.

Abstract

We present SPAR, a framework for self-supervised placement-aware representation learning in distributed sensing. Distributed sensing spans applications where multiple spatially distributed and multimodal sensors jointly observe an environment, from vehicle monitoring to human activity recognition and earthquake localization. A central challenge shared by this wide spectrum of applications is that observed signals are inseparably shaped by sensor placements, including their spatial locations and structural characteristics. However, existing pretraining methods remain largely placement-agnostic. SPAR addresses this gap through a unifying principle: the duality between signals and positions. Guided by this principle, SPAR introduces spatial and structural positional embeddings together with dual reconstruction objectives, explicitly modeling how observing positions and observed signals shape each other. Placement is thus treated not as auxiliary metadata but as intrinsic to representation learning. SPAR is theoretically supported by analyses from information theory and occlusion-invariant learning. Extensive experiments on three real-world datasets show that SPAR achieves superior robustness and generalization across various modalities, placements, and downstream tasks.
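As a rough illustration of how signal, spatial, and structural information can be combined into one token per node, the pure-Python sketch below projects each node's signal features and 2-D coordinates into a shared embedding space and adds a per-node learnable structural vector. The dimensions, the random initialization, the summation-based fusion, and all function names (`init_matrix`, `project`, `embed_node`) are hypothetical assumptions for illustration; SPAR's actual transformer encoder and fusion may differ.

```python
import random

def init_matrix(rows, cols, rng):
    """Hypothetical random init for a projection matrix (rows x cols)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def project(matrix, vector):
    """Matrix-vector product: maps a vector into the shared embedding space."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def embed_node(signal_feat, spatial_pos, structural_vec, w_signal, w_spatial):
    """Assumed fusion: sum of signal, spatial, and structural embeddings."""
    sig = project(w_signal, signal_feat)
    spa = project(w_spatial, spatial_pos)
    return [a + b + c for a, b, c in zip(sig, spa, structural_vec)]

rng = random.Random(0)
D_MODEL, D_SIGNAL = 8, 16                  # hypothetical sizes
w_signal = init_matrix(D_MODEL, D_SIGNAL, rng)
w_spatial = init_matrix(D_MODEL, 2, rng)   # 2-D (x, y) coordinates

# One structural vector per node; in training these would be learnable.
num_nodes = 3
structural = [[rng.gauss(0.0, 0.1) for _ in range(D_MODEL)]
              for _ in range(num_nodes)]

signals = [[rng.random() for _ in range(D_SIGNAL)] for _ in range(num_nodes)]
positions = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]  # hypothetical layout (m)

tokens = [
    embed_node(signals[k], positions[k], structural[k], w_signal, w_spatial)
    for k in range(num_nodes)
]
print(len(tokens), len(tokens[0]))  # 3 8
```

In this sketch the per-node `tokens` would be passed to a transformer encoder; placement enters only through the added spatial and structural terms. Summation is one common fusion choice; concatenation followed by a projection is another.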

Paper Structure

This paper contains 23 sections, 2 theorems, 35 equations, 7 figures, 8 tables.

Key Result

Proposition 3.1

Let $\mathsf{X}^{(k)}, \widetilde{\mathsf{Z}}^{(k)}, \mathsf{S}^{(k)}, \mathsf{R}^{(k)}$ denote the random variables corresponding to the signals, the post-fusion latent embeddings, the spatial positions, and the structural positions, respectively, for $k \in \{1, \dots, K\}$. Let $\mathbb{E}[L']$ … where $I(\cdot;\cdot)$ denotes mutual information. In contrast, for SPAR, we can have …
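Proposition 3.1 is phrased in terms of mutual information $I(\cdot;\cdot)$. As a generic illustration of that quantity (not of the proposition itself, whose full statement is not reproduced here), the sketch below computes a plug-in estimate of $I(A;B)$ in bits from paired discrete samples. The toy variables are hypothetical: an embedding that copies a node's position index shares all of its information, while one that ignores position shares none.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) )
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical toy data: position index S of four equally likely nodes.
s = [0, 1, 2, 3] * 25
z_aware = list(s)           # embedding that encodes position exactly
z_agnostic = [0] * len(s)   # embedding that ignores position entirely

print(round(mutual_information(z_aware, s), 6))     # 2.0
print(round(mutual_information(z_agnostic, s), 6))  # 0.0
```

With four equally likely positions, a position-preserving embedding attains $I = H(\mathsf{S}) = \log_2 4 = 2$ bits, while a constant embedding attains 0 bits.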

Figures (7)

  • Figure 1: An overview of the SPAR workflow applied to a multi-modal multi-node distributed sensing application. Each node from each modality collects its own signal and is associated with a spatial position, as well as unique characteristics that influence its signal patterns. During pretraining, SPAR encodes information from all these aspects to generate latent embeddings, optimized via self-supervised objectives on unlabeled data. In the fine-tuning stage, the encoder is frozen and used to extract representations, which are then fed into task-specific heads trained with labeled data for downstream tasks.
  • Figure 2: Architectural overview of SPAR. Each node is assigned a continuous learnable structural position to capture its unique characteristics. The signals, spatial positions, and structural positions of all nodes are projected into a shared embedding space, combined, and encoded into latent embeddings. The latent embeddings are optimized with dual reconstruction objectives, encouraging the model to effectively utilize and retain both signal and positional information in a self-supervised and context-aware manner.
  • Figure 3: Ablation studies on spatial information preservation and robustness
  • Figure 4: Visualization of localization results. Blue dots denote ground truth locations (vehicle or earthquake epicenter), red dots are predictions by SPAR, and yellow triangles represent the spatial positions of deployed sensor nodes/stations. SPAR produces predictions that closely align with ground truth locations, demonstrating its robust spatial reasoning capability.
  • Figure 5: Confusion matrix of SPAR for the M3N-VC single-vehicle classification task (left) and the RealWorld-HAR activity recognition task (right). SPAR separates most classes well, and the remaining confusion patterns generally align with the conceptual closeness of the classes.
  • ...and 2 more figures
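The Figure 2 caption states that latent embeddings are optimized with dual reconstruction objectives so that both signal and positional information are retained. One plausible reading, sketched below under stated assumptions, is a masked objective that sums a signal-reconstruction error and a position-reconstruction error over masked nodes. The masking scheme, the use of mean squared error, the `pos_weight` parameter, and the function names are hypothetical and may not match SPAR's actual losses.

```python
def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def dual_reconstruction_loss(masked_ids, pred_signals, true_signals,
                             pred_positions, true_positions, pos_weight=1.0):
    """Hypothetical dual objective: signal reconstruction plus weighted
    position reconstruction, averaged over the masked nodes only."""
    if not masked_ids:
        return 0.0
    signal_term = sum(mse(pred_signals[k], true_signals[k])
                      for k in masked_ids)
    position_term = sum(mse(pred_positions[k], true_positions[k])
                        for k in masked_ids)
    return (signal_term + pos_weight * position_term) / len(masked_ids)

# Toy example with 3 nodes; nodes 0 and 2 are masked (hypothetical values).
true_signals = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]]
pred_signals = [[1.0, 1.0], [9.9, 9.9], [3.0, 3.0]]  # node 1 is unmasked
true_positions = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
pred_positions = [(0.0, 1.0), (0.0, 0.0), (0.0, 5.0)]

loss = dual_reconstruction_loss([0, 2], pred_signals, true_signals,
                                pred_positions, true_positions)
print(loss)  # 1.5
```

Only masked nodes contribute, so the large error on unmasked node 1 is ignored. Per the Figure 1 caption, after pretraining the encoder is frozen and only task-specific heads are trained on labeled data.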

Theorems & Definitions (4)

  • Proposition 3.1
  • Proposition 3.2
  • proof
  • proof