Radio Frequency Signal based Human Silhouette Segmentation: A Sequential Diffusion Approach

Penghui Wen; Kun Hu; Dong Yuan; Zhiyuan Ning; Changyang Li; Zhiyong Wang

TL;DR

RF-based human silhouette segmentation (HSS) is advanced via a two-stage sequential diffusion model (SDM) that progressively generates silhouette maps from orthogonal RF heatmaps. Key innovations include cross-view transformation blocks (CTBs) for multi-scale conditioning and spatio-temporal blocks (STBs) for sequence-level motion coherence, which together provide frame-level detail and temporal consistency. On the HIBER benchmark, SDM achieves state-of-the-art performance with an IoU of $0.732$, demonstrating improvements over one-shot baselines as sequence length increases. By integrating motion dynamics and cross-view cues, this diffusion-based approach advances privacy-preserving RF-based HSS and offers a practical framework for robust segmentation in challenging environments.
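The IoU reported above is the standard intersection-over-union between a predicted and a ground-truth silhouette mask. The following is a minimal sketch, assuming binary masks stored as NumPy arrays of equal shape. The helper name `silhouette_iou` and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not taken from the paper or the HIBER evaluation code.

```python
import numpy as np


def silhouette_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection-over-union between two binary silhouette masks.

    Hypothetical helper: both masks must share a shape, and any nonzero
    value is treated as foreground.
    """
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks empty: treated as a perfect match (an assumed convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)


if __name__ == "__main__":
    pred = np.zeros((4, 4), dtype=np.uint8)
    target = np.zeros((4, 4), dtype=np.uint8)
    pred[0:2, 0:2] = 1    # 4 foreground pixels
    target[0:2, 0:3] = 1  # 6 foreground pixels, 4 of them shared with pred
    print(silhouette_iou(pred, target))  # 4 / 6, about 0.667
```

A benchmark-level score may average this per-frame value or accumulate intersections and unions over the whole test set. The averaging scheme behind the reported figure is not stated in this summary.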

Abstract

Radio frequency (RF) signals have proven flexible for human silhouette segmentation (HSS) in complex environments. Existing studies mainly follow a one-shot approach, which lacks the ability to coherently project information from the RF domain. In addition, the spatio-temporal patterns of human motion dynamics have not been fully explored for HSS. We therefore propose a two-stage Sequential Diffusion Model (SDM) that progressively synthesizes high-quality segmentation maps while jointly accounting for motion dynamics. Cross-view transformation blocks guide the diffusion model at multiple scales to comprehensively characterize human-related patterns within an individual frame, such as the directional projection from signal planes. Moreover, spatio-temporal blocks are used to fine-tune the frame-level model so that it incorporates spatio-temporal context and motion dynamics, improving the consistency of the segmentation maps. Comprehensive experiments on the public HIBER benchmark demonstrate the state-of-the-art performance of our method, with an IoU of 0.732. Our code is available at https://github.com/ph-w2000/SDM.
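The progressive synthesis described in the abstract follows the general denoising-diffusion recipe. A clean silhouette map is gradually corrupted with Gaussian noise, and a network learns to reverse this corruption while conditioned on the RF heatmaps. The following is a minimal sketch of the standard DDPM forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ and the usual noise-prediction loss. It makes several assumptions:

- a linear $\beta$ schedule with 1000 steps,
- a silhouette map rescaled to $[-1, 1]$,
- a hypothetical `denoiser` callable.

None of these are the paper's stated settings, and the paper's discrete segmentation embedding is not reproduced here.

```python
import numpy as np

# Assumed DDPM hyperparameters (linear beta schedule); not the paper's values.
T_STEPS = 1000
betas = np.linspace(1e-4, 0.02, T_STEPS)
alphas_cumprod = np.cumprod(1.0 - betas)  # \bar{alpha}_t for t = 0..T_STEPS-1


def q_sample(x0: np.ndarray, t: int, noise: np.ndarray) -> np.ndarray:
    """Draw x_t from q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I)."""
    abar = alphas_cumprod[t]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise


def noise_prediction_loss(denoiser, x0, cond, rng) -> float:
    """Epsilon-prediction MSE for one training example.

    `denoiser(x_t, t, cond)` is a hypothetical network that predicts the
    added noise from the noisy silhouette, the timestep, and RF conditioning.
    """
    t = int(rng.integers(0, T_STEPS))
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, noise)
    pred = denoiser(x_t, t, cond)
    return float(np.mean((pred - noise) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = (rng.random((32, 32)) > 0.5).astype(np.float64)
    x0 = mask * 2.0 - 1.0                    # map {0, 1} to {-1, 1}
    cond = rng.standard_normal((2, 32, 32))  # stand-in for the horizontal/vertical heatmaps
    zero_denoiser = lambda x_t, t, c: np.zeros_like(x_t)  # trivial baseline
    print(noise_prediction_loss(zero_denoiser, x0, cond, rng))
```

A trivial zero predictor yields a loss near the noise variance (about 1). A trained conditional denoiser should drive the loss well below that value.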

Paper Structure

This paper contains 18 sections, 7 equations, 5 figures, and 2 tables.

Introduction
Related Work
Optical Cameras based HSS
Wireless Sensors based HSS
Diffusion Methods
Methodology
Preliminary
Frame-Level Silhouette Diffusion
Ortho-Cross Heatmap Encoder
Discrete Segmentation Embedding
Silhouette Diffusion Network
Frame-Level Optimization
Sequence-Level Silhouette Fine-Tuning
Experiments
Dataset & Implementation Details
...and 3 more sections

Figures (5)

Figure 1: Human silhouette segmentation based on RF signals. The processed (a) horizontal and (b) vertical heatmaps are combined to generate (c) the silhouette map; (d) a 3D view depicts the overall procedure of the RF-based segmentation task.
Figure 2: Illustration of the proposed architecture, the sequential diffusion model (SDM). SDM is a UNet-based network composed of an ortho-cross encoder $H_{\text{OCH}}$ and a silhouette diffusion network $H_{\text{SDN}}$ with multiple cross-view transformation blocks (CTBs) and spatio-temporal blocks (STBs). The ortho-cross encoder $H_{\text{OCH}}$ uses residual blocks to encode human motion patterns from the given pair of horizontal and vertical heatmaps. To transfer the rich multi-scale features extracted by $H_{\text{OCH}}$ to $H_{\text{SDN}}$, we use cross-attention based CTBs embedded in different layers of $H_{\text{SDN}}$. During the Sequence-Level Silhouette Fine-Tuning stage, we freeze the weights of the frame-level components and insert an STB after every CTB for fine-tuning, to model spatio-temporal coherence.
Figure 3: Qualitative comparison with RFMask.
Figure 4: Ablation studies on CTB. The first example is from the WALK set, while the second is from the MULTI set.
Figure 5: Ablation studies on STB. The first example is from the WALK set, while the second is from the MULTI set.
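The Figure 2 caption describes two conditioning mechanisms:

- cross-attention based CTBs, which inject ortho-cross encoder features into the silhouette diffusion network;
- STBs, inserted after each CTB, which add temporal context across frames.

The following is a minimal single-head NumPy sketch of both ideas. It assumes that:

- features are flattened into tokens;
- the diffusion features supply the CTB queries;
- the concatenated horizontal- and vertical-view encoder features supply the keys and values;
- the STB is self-attention along the time axis at each spatial position.

The function names, random projection weights, residual connections, and tensor shapes are all hypothetical. This is an illustration of the technique, not the paper's implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention over the token (second-to-last) axis."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def cross_view_block(x, h_feat, v_feat, w_q, w_k, w_v):
    """Hypothetical CTB: diffusion tokens x (N, C) attend to RF tokens.

    h_feat, v_feat: (M, C) tokens from the horizontal and vertical heatmaps,
    concatenated so that every query can draw on both orthogonal views.
    """
    rf = np.concatenate([h_feat, v_feat], axis=0)  # (2M, C)
    out = attention(x @ w_q, rf @ w_k, rf @ w_v)   # (N, C)
    return x + out                                 # residual connection


def spatio_temporal_block(seq, w_q, w_k, w_v):
    """Hypothetical STB: self-attention over time at each spatial token.

    seq: (T, N, C) features for T frames; attention mixes the T frames
    independently for each of the N spatial positions.
    """
    per_pos = np.swapaxes(seq, 0, 1)               # (N, T, C)
    out = attention(per_pos @ w_q, per_pos @ w_k, per_pos @ w_v)
    return seq + np.swapaxes(out, 0, 1)            # back to (T, N, C), plus residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N, M, C = 4, 16, 8, 32  # frames, diffusion tokens, RF tokens per view, channels
    ctb_w = [rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3)]
    stb_w = [rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3)]

    frames = []
    for _ in range(T):
        x = rng.standard_normal((N, C))  # diffusion-network tokens for one frame
        h = rng.standard_normal((M, C))  # horizontal-heatmap tokens
        v = rng.standard_normal((M, C))  # vertical-heatmap tokens
        frames.append(cross_view_block(x, h, v, *ctb_w))  # frame-level stage
    seq = np.stack(frames)                      # (T, N, C)
    seq = spatio_temporal_block(seq, *stb_w)    # STB applied after the CTB
    print(seq.shape)                            # (4, 16, 32)
```

In the fine-tuning stage described in the caption, the CTB weights (`ctb_w` here) would stay frozen and only the STB weights (`stb_w`) would be trained. A full model would also use multi-head projections, normalization, and a UNet operating at several resolutions, all of which are omitted from this sketch.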
