OnlineTAS: An Online Baseline for Temporal Action Segmentation

Qing Zhong, Guodong Ding, Angela Yao

TL;DR

This work presents an adaptive memory that accommodates dynamic changes in context over time, together with a feature augmentation module that enhances frames with the memory. The resulting online framework achieves state-of-the-art performance.

Abstract

Temporal context plays a significant role in temporal action segmentation. In an offline setting, the context is typically captured by the segmentation network after observing the entire sequence. However, capturing and using such context information in an online setting remains an under-explored problem. This work presents an online framework for temporal action segmentation. At the core of the framework is an adaptive memory designed to accommodate dynamic changes in context over time, alongside a feature augmentation module that enhances the frames with the memory. In addition, we propose a post-processing approach to mitigate the severe over-segmentation in the online setting. On three common segmentation benchmarks, our approach achieves state-of-the-art performance.

Paper Structure

This paper contains 18 sections, 15 equations, 6 figures, 12 tables, 2 algorithms.

Figures (6)

  • Figure 1: Context-aware Feature Augmentation (CFA) module. CFA takes as input a video clip $c_{k}$ of length $w$, augments it with temporal information captured in an adaptive memory bank $M_k$, and outputs an enhanced clip feature $\tilde{c}_{k}$. $I$ is the number of iterations of SA, TransDecoder, and CA.
  • Figure 2: Two inference types. a) Online inference samples clips with stride $\delta=1$ and only preserves the last frame prediction, while b) Semi-online inference samples non-overlapping clips with stride $\delta=w$ and all predictions are preserved.
  • Figure 3: Visualization of segmentation outputs for sequence "rgb-01-1" from 50Salads.
  • Figure 4: Standard vs. Causal Convolution
  • Figure : Adaptive Memory Update
  • ...and 1 more figure
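
The two inference types in the Figure 2 caption differ only in the clip stride and in which predictions are kept. The sketch below is a minimal illustration of that sampling logic, not the authors' implementation: `online_inference`, `semi_online_inference`, and `toy_model` are hypothetical names, the memory bank $M_k$ and the CFA module are not modeled, and using shorter clips for the first $w-1$ frames is an assumption (the source does not say how the start of the sequence is handled).

```python
from typing import Callable, List, Sequence

# Hypothetical clip-level model: returns one label per frame of the clip.
ClipModel = Callable[[Sequence[int]], List[int]]


def online_inference(frames: Sequence[int], w: int, model: ClipModel) -> List[int]:
    """Stride delta = 1: one clip per frame, keep only the last-frame prediction."""
    preds = []
    for t in range(len(frames)):
        start = max(0, t - w + 1)        # assumption: shorter clips at the start
        clip = frames[start:t + 1]       # clip ends at the current frame t
        preds.append(model(clip)[-1])    # discard all but the newest prediction
    return preds


def semi_online_inference(frames: Sequence[int], w: int, model: ClipModel) -> List[int]:
    """Stride delta = w: non-overlapping clips, keep every prediction."""
    preds = []
    for start in range(0, len(frames), w):
        clip = frames[start:start + w]   # the last clip may be shorter than w
        preds.extend(model(clip))        # all frame predictions are preserved
    return preds


def toy_model(clip: Sequence[int]) -> List[int]:
    """Toy stand-in: labels every frame with the largest value seen in the clip."""
    return [max(clip)] * len(clip)


frames = list(range(10))
online_preds = online_inference(frames, 4, toy_model)
semi_online_preds = semi_online_inference(frames, 4, toy_model)
```

With this toy model, the online variant labels each frame using only frames up to and including itself, while the semi-online variant lets earlier frames of a clip see the later frames of the same clip. The semi-online variant calls the model roughly $w$ times less often, and each prediction is available only once its clip is complete.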