Spectral Scalpel: Amplifying Adjacent Action Discrepancy via Frequency-Selective Filtering for Skeleton-Based Action Segmentation

Haoyu Ji, Bowen Chen, Zhihao Yang, Wenze Huang, Yu Gao, Xueting Liu, Weihong Ren, Zhiyong Wang, Honghai Liu

Abstract

Skeleton-based Temporal Action Segmentation (STAS) seeks to densely segment and classify diverse actions within long, untrimmed skeletal motion sequences. However, existing STAS methodologies face challenges of limited inter-class discriminability and blurred segmentation boundaries, primarily due to insufficient distinction of spatio-temporal patterns between adjacent actions. To address these limitations, we propose Spectral Scalpel, a frequency-selective filtering framework aimed at suppressing shared frequency components between adjacent distinct actions while amplifying their action-specific frequencies, thereby enhancing inter-action discrepancies and sharpening transition boundaries. Specifically, Spectral Scalpel employs adaptive multi-scale spectral filters as scalpels to edit frequency spectra, coupled with a discrepancy loss between adjacent actions serving as the surgical objective. This design amplifies representational disparities between neighboring actions, effectively mitigating boundary localization ambiguities and inter-class confusion. Furthermore, complementing long-term temporal modeling, we introduce a frequency-aware channel mixer to strengthen channel evolution by aggregating spectra across channels. This work presents a novel paradigm for STAS that extends conventional spatio-temporal modeling by incorporating frequency-domain analysis. Extensive experiments on five public datasets demonstrate that Spectral Scalpel achieves state-of-the-art performance. Code is available at https://github.com/HaoyuJi/SpecScalpel.

Paper Structure (39 sections, 24 equations, 8 figures, 31 tables)


Figures (8)

  • Figure 1: Effect of a designed frequency filter in amplifying adjacent action discrepancy. The synthetic single-joint motion sequence consists of two concatenated action segments, each idealized as a sum of cosine waves with both shared and action-specific frequency components. After FFT, a designed filter amplifies action-specific frequencies and suppresses shared ones. The filtered sequence shows enhanced action discriminability and a clearer boundary.
  • Figure 2: Overview of Spectral Scalpel. After spatial modeling, joint features are transformed into the frequency domain. A discrepancy-guided filter, optimized by adjacent action discrepancy loss, enhances action-specific spectra and suppresses shared ones, improving discriminability and boundary clarity.
  • Figure 3: Overview of the Spectral Scalpel framework. It comprises four sequential stages: (1) Spatial Modeling with multi-scale, dual-branch dynamic GCNs; (2) Frequency Modeling via the Multi-scale Adaptive Spectral Filter (MASF); (3) Temporal Modeling through Linear Transformers, Frequency-Aware Channel Mixers (FACM), and adaptive fusion; (4) Prediction Refinement, which generates and refines class and boundary predictions. The model is optimized by supervised losses computed from filtered features, final representations, classification predictions, and boundary predictions.
  • Figure 4: Illustration of the three core components. (a) Multi-scale Adaptive Spectral Filter (MASF) enhances frequency discriminability using multi-head filters at multiple scales, combined with dual-branch dynamic-static channel-wise fusion. (b) Adjacent Action Discrepancy Loss (AADL) guides frequency dynamics by maximizing amplitude spectrum discrepancies between adjacent segments. (c) Frequency-Aware Channel Mixer (FACM) models frequency-domain channel interactions via real-imaginary decomposition and shared point-wise convolutions.
  • Figure 5: Qualitative results of action segmentation on the PKU-MMD v2 (X-sub, X-view), LARa and MCFS-130 datasets. Different colors represent distinct action classes. Red boxes highlight segmentation errors in other methods compared to Spectral Scalpel.
  • ...and 3 more figures
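
The idea in the Figure 1 caption can be reproduced with a small synthetic experiment. The sketch below is illustrative only and is not the authors' implementation. It makes several assumptions that do not come from the paper: the sequence length, the frequency bins, the hand-picked filter gains (2.0 and 0.1), and the similarity measure (cosine similarity between per-segment amplitude spectra). The paper's filter is learned under a discrepancy loss, whereas this one is fixed by hand. A naive DFT from the standard library stands in for an FFT so the example needs no third-party packages.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT; returns the real part, since the filtered signal is real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def cosine_similarity(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

def segment_similarity(x, boundary):
    """Cosine similarity between the amplitude spectra of the two segments."""
    amp_a = [abs(v) for v in dft(x[:boundary])]
    amp_b = [abs(v) for v in dft(x[boundary:])]
    return cosine_similarity(amp_a, amp_b)

# Hypothetical setup: 128 frames, action boundary at frame 64.
T, BOUNDARY = 128, 64
SHARED_BIN, BIN_A, BIN_B = 4, 12, 24  # assumed frequency bins, chosen for illustration

def wave(k, t):
    return math.cos(2 * math.pi * k * t / T)

# Each segment = shared cosine + one action-specific cosine.
signal = [wave(SHARED_BIN, t) + (wave(BIN_A, t) if t < BOUNDARY else wave(BIN_B, t))
          for t in range(T)]

# Hand-designed filter: suppress the shared bin (and its mirror), amplify the rest.
gain = [2.0] * T
gain[SHARED_BIN] = gain[T - SHARED_BIN] = 0.1

spectrum = dft(signal)
filtered = idft([g * s for g, s in zip(gain, spectrum)])

sim_before = segment_similarity(signal, BOUNDARY)
sim_after = segment_similarity(filtered, BOUNDARY)
print(f"segment similarity before filtering: {sim_before:.4f}")
print(f"segment similarity after filtering:  {sim_after:.4f}")
```

With these settings the similarity between the two segments falls from about 0.5 to below 0.01. The shared component is attenuated and the action-specific components dominate each segment's spectrum, which is the effect the caption describes.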