Long-Tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment

Zhanzhong Pang; Fadime Sener; Shrinivas Ramasubramanian; Angela Yao

TL;DR

This work addresses the long-tail problem in temporal action segmentation of procedural videos by introducing Group-wise Temporal Logit Adjustment (G-TLA). G-TLA combines activity-conditioned group-wise classification with a temporally aware logit adjustment that uses action-order priors, reducing tail-class false positives while preserving head-class performance. The method introduces a two-stage G-TLA loss and an inference procedure that selects the appropriate action group. It improves frame-level and segment-level metrics across multiple datasets and backbones, and ablations confirm the contributions of both group-wise classification and temporal priors. The results show stronger tail-action recognition and better-balanced performance, which suggests practical value for understanding complex procedural videos in real-world settings.

Abstract

Procedural activity videos often exhibit a long-tailed action distribution due to varying action frequencies and durations. However, state-of-the-art temporal action segmentation methods overlook the long tail and fail to recognize tail actions. Existing long-tail methods make class-independent assumptions and struggle to identify tail classes when applied to temporal segmentation frameworks. This work proposes a novel group-wise temporal logit adjustment (G-TLA) framework that combines a group-wise softmax formulation with logit adjustment informed by activity information and action ordering. The proposed framework significantly improves the segmentation of tail actions without any performance loss on head actions.
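To make the two ingredients named in the abstract concrete, the following is a minimal per-frame sketch, not the paper's formulation. It assumes the standard logit-adjusted cross-entropy (adding `tau * log(prior)` to each logit before the softmax), restricts the softmax to the classes of one activity group, and applies the adjustment only when the frame falls inside a per-class time window. The function name `adjusted_group_loss`, the normalized time `t`, the `bounds` windows, and the rule of using a zero offset outside the window are all illustrative assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def adjusted_group_loss(logits, target, group, priors, t, bounds, tau=1.0):
    """Cross-entropy for one frame, computed over the classes in `group` only.

    logits : dict mapping class name -> raw score for this frame
    target : ground-truth class name (must be a member of `group`)
    group  : list of class names in the video's activity group (assumed known)
    priors : dict mapping class name -> empirical class frequency (> 0)
    t      : normalized frame position in [0, 1] (hypothetical convention)
    bounds : dict mapping class name -> (start, end) window in [0, 1]
    tau    : strength of the logit adjustment
    """
    adjusted = []
    for c in group:
        start, end = bounds[c]
        # Assumption: the prior-based offset is applied only inside the window.
        offset = tau * math.log(priors[c]) if start <= t <= end else 0.0
        adjusted.append(logits[c] + offset)
    # Softmax is restricted to the group, so out-of-group classes never compete.
    return -log_softmax(adjusted)[group.index(target)]


group = ["add teabag", "stir tea"]
logits = {"add teabag": 0.0, "stir tea": 0.0, "stir coffee": 5.0}
priors = {"add teabag": 0.9, "stir tea": 0.1}
inside = {"add teabag": (0.0, 1.0), "stir tea": (0.0, 1.0)}
outside = {"add teabag": (0.0, 0.2), "stir tea": (0.0, 0.2)}

loss_tail = adjusted_group_loss(logits, "stir tea", group, priors, 0.5, inside)
loss_head = adjusted_group_loss(logits, "add teabag", group, priors, 0.5, inside)
loss_plain = adjusted_group_loss(logits, "stir tea", group, priors, 0.5, outside)
```

With equal raw scores, the adjusted loss is larger for the tail class than for the head class, which is how logit adjustment pushes a model toward larger margins on rare classes. The high-scoring out-of-group class `stir coffee` has no effect because it is excluded from the softmax.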

Paper Structure (26 sections, 15 equations, 12 figures, 27 tables, 1 algorithm)

Introduction
Related works
Preliminaries
Temporal Action Segmentation
Logit Adjustment
Methodology
Action Inter-Dependencies
Group-wise Classification
Temporal Logit Adjustment
Overall loss
Inference
Experiments
Dataset, Implementation, and Evaluation
Benchmark Comparisons
Ablation Studies
...and 11 more sections

Figures (12)

Figure 1: "Making tea", with temporal segments indicated by colored bars. The tail action 'stir tea' is recognized by logit adjustment (LA) and our G-TLA but not by the MSTCN backbone. However, LA overlooks the action order and activity, resulting in activity-irrelevant false positives such as 'take bowl' and 'stir coffee', and temporally illogical false positives like 'add teabag' occurring after 'stir tea'.
Figure 2: Temporal action segmentation datasets exhibit a long-tail distribution of actions due to varying frequencies of actions and action durations.
Figure 3: Our group-wise temporal logit adjustment framework consists of group-wise classification and temporal logit adjustment within the respective group. The temporal logit adjustment is only applied to the target group ($G_1$ in this illustration).
Figure 4: Illustration of temporal logit adjustment for class $c=$'add teabag'. The adjustment only occurs within the temporal bounds.
Figure 5: Radar charts of different logit adjustment methods, measuring the performance along balanced and global metrics on Breakfast with MSTCN and AsFormer.
...and 7 more figures
