SFGANS Self-supervised Future Generator for human ActioN Segmentation

Or Berman; Adam Goldbraikh; Shlomi Laufer

TL;DR

The paper tackles long untrimmed video action segmentation by inserting a self-supervised future-feature generator (SFGANS) midway in the standard pipeline to refine feature representations before segmentation. The generator uses a retrospective cycle-GAN framework to predict short-horizon future feature vectors from past features, trained with a cycle-consistent adversarial objective and a sequence-prediction loss, and it outputs refined features for downstream models. Across temporal, online, and timestamp-supervised segmentation tasks and multiple datasets, employing the predicted future features yields consistent improvements over baselines without hyperparameter tuning, with additional gains achievable through hyperparameter optimization or all-data self-supervised training. The approach demonstrates practical benefits for diverse backbones (e.g., MS-TCN++, ASFormer, DTGRM) and datasets, suggesting a robust, generalizable boost to action-segmentation performance with minimal labeling overhead and modest computational overhead for real-time settings.

Abstract

The ability to locate and classify action segments in long untrimmed video is of particular interest to many applications such as autonomous cars, robotics, and healthcare. Today, the most popular pipeline for action segmentation encodes the frames into feature vectors, which are then processed by a temporal model for segmentation. In this paper we present a self-supervised method that sits in the middle of the standard pipeline and generates refined representations of the original feature vectors. Experiments show that this method improves the performance of existing models on different sub-tasks of action segmentation, even without additional hyperparameter tuning.

Paper Structure (25 sections, 8 equations, 2 figures, 8 tables)

Introduction
Related Work
Method
Future Prediction
Model
Training Procedure
Objective Function
Metrics
Tasks, Models and Datasets
Temporal Action Segmentation
Models
Timestamp Supervision Temporal Action Segmentation
Model
Online Action Segmentation
Model
...and 10 more sections

Figures (2)

Figure 1: Full paper pipeline. (1) Frames are encoded into feature vectors using a feature extractor. (2) A prediction of the (n+i)-th vector is generated from a sequence of feature vectors. In this paper, i values of 1, 4, and 10 were implemented. (3) The n-th feature vector is replaced with the prediction of vector n+i. Phases 2 and 3 are repeated for each vector. (4) The new predicted features replace the original ones and are sent to the segmentation model.
Figure 2: The generator architecture. It is strongly based on the original generator architecture from kwon2019predicting, with additions marked in red. In the figure, I-BN is instance batch norm, and k, n, and s denote the kernel size, channels, and stride, respectively. A more detailed description of the architecture, including the architectures of the residual blocks and the discriminators, is found in kwon2019predicting. The notation is kept similar for convenience.
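
Phases 2 and 3 of Figure 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the trained cycle-GAN generator is replaced here by a hypothetical stand-in predictor (`linear_extrapolation`), and the function name `refine_features`, the `window_size` parameter, and the choice to use a context window ending at frame n are all assumptions made for illustration. Only the horizon values (i = 1, 4, 10) come from the caption.

```python
import numpy as np


def linear_extrapolation(window, horizon):
    """Hypothetical stand-in for the learned generator (assumption).

    Extends the last observed step linearly by `horizon` frames.
    """
    if len(window) < 2:
        # Not enough context to estimate a step: return the frame unchanged.
        return window[-1].copy()
    return window[-1] + horizon * (window[-1] - window[-2])


def refine_features(features, predict_fn, window_size=4, horizon=1):
    """Replace each feature vector n with a prediction of vector n + horizon.

    features:    array of shape (num_frames, feature_dim)
    predict_fn:  callable(window, horizon) -> vector of shape (feature_dim,)
    window_size: number of past frames given to the predictor (assumed value)
    horizon:     the offset i from Figure 1 (the paper uses 1, 4, and 10)
    """
    features = np.asarray(features, dtype=float)
    refined = np.empty_like(features)
    for n in range(len(features)):
        start = max(0, n - window_size + 1)
        window = features[start:n + 1]  # context ending at frame n
        refined[n] = predict_fn(window, horizon)  # phase 2, then phase 3
    return refined


# Toy input: 6 frames of 2-dimensional features that grow linearly in time.
toy = np.arange(6, dtype=float)[:, None] * np.array([1.0, 2.0])
refined = refine_features(toy, linear_extrapolation, window_size=4, horizon=4)
```

In this toy case the features are exactly linear in time, so the stand-in predictor recovers the true future vector for every frame that has at least one predecessor. The refined array keeps the shape of the input, which is what lets it be passed to a downstream segmentation model in place of the original features (phase 4).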
