
A Baseline Study and Benchmark for Few-Shot Open-Set Action Recognition with Feature Residual Discrimination

Stefano Berti, Giulia Pasquale, Lorenzo Natale

TL;DR

This work proposes an architectural extension based on a Feature-Residual Discriminator (FR-Disc) that significantly enhances unknown rejection capabilities without compromising closed-set accuracy, setting a new state-of-the-art for FSOS-AR.

Abstract

Few-Shot Action Recognition (FS-AR) has shown promising results but is often limited by a closed-set assumption that fails in real-world open-set scenarios. While Few-Shot Open-Set (FSOS) recognition is well-established for images, its extension to spatio-temporal video data remains underexplored. To address this, we propose an architectural extension based on a Feature-Residual Discriminator (FR-Disc), adapting previous work on skeletal data to the more complex video domain. Extensive experiments on five datasets demonstrate that while common open-set techniques provide only marginal gains, our FR-Disc significantly enhances unknown rejection capabilities without compromising closed-set accuracy, setting a new state-of-the-art for FSOS-AR. The project website, code, and benchmark are available at: https://hsp-iit.github.io/fsosar/.
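To make the idea of feature-residual discrimination more concrete, here is a minimal NumPy sketch. It assumes three things. First, class prototypes are the mean support features. Second, the $\varTheta$ function described in Figure 2 (c) selects the query-to-prototype residual with the smallest norm, which is what its equation label `eq:getminnorm` suggests. Third, the sigmoid output of the MLP is read as "probability that the query is known", so a query is rejected below a 50% threshold, as in Figure 1. The names `min_norm_residual`, `ResidualDiscriminator`, and `predict_open_set` are hypothetical, and the layer sizes and random weights are illustrative only. This is not the authors' implementation.

```python
import numpy as np


def min_norm_residual(query: np.ndarray, prototypes: np.ndarray) -> tuple[int, np.ndarray]:
    """Hypothetical stand-in for the Theta function: among the residuals
    query - prototype_k, return the index and value of the one with the
    smallest L2 norm. query: (D,), prototypes: (K, D)."""
    residuals = query[None, :] - prototypes       # (K, D) feature residuals
    norms = np.linalg.norm(residuals, axis=1)     # (K,) residual magnitudes
    k = int(np.argmin(norms))                     # index of the closest prototype
    return k, residuals[k]


class ResidualDiscriminator:
    """Hypothetical two-layer MLP with a Sigmoid output. Weights are random
    here; in practice they would be trained on known/unknown queries."""

    def __init__(self, dim: int, hidden: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, dim ** -0.5, size=(dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, hidden ** -0.5, size=hidden)
        self.b2 = 0.0

    def __call__(self, residual: np.ndarray) -> float:
        h = np.maximum(residual @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        logit = h @ self.w2 + self.b2
        return float(1.0 / (1.0 + np.exp(-logit)))          # sigmoid score in (0, 1)


def predict_open_set(query, support, support_labels, disc, threshold=0.5):
    """Return (predicted class, score), or (-1, score) if the query is rejected.
    Assumes the score is the probability that the query belongs to a known class."""
    classes = np.unique(support_labels)
    # Prototype = mean feature of each support class (assumption).
    prototypes = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    k, residual = min_norm_residual(query, prototypes)
    score = disc(residual)
    return (int(classes[k]) if score >= threshold else -1), score


# Usage: K=2 classes, N=1 video each, with random 16-d "video features".
rng = np.random.default_rng(1)
support = rng.normal(size=(2, 16))
labels = np.array([0, 1])
query = rng.normal(size=16)
pred, score = predict_open_set(query, support, labels, ResidualDiscriminator(dim=16))
```

The closed-set prediction (nearest prototype) and the open-set decision (discriminator score) are kept separate here. This reflects the claim that the open-set extension does not have to change closed-set accuracy.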

Paper Structure (20 sections, 4 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Comparison between the results of the closed-set Softmax Baseline of SAFSAR [tang2024semantic] and the proposed open-set extension, FR-Disc, on an unknown query. The Support Set contains $K=2$ classes with $N=1$ video each. Because the true class of the query is not in the Support Set, the query should be rejected through low confidence (e.g., $< 50\%$). The Baseline incorrectly classifies it as known (false positive), while FR-Disc correctly rejects it (true negative). Additional qualitative examples are provided in the Supplementary Material.
  • Figure 2: Overview of closed- and open-set prediction flows for (a) implicit (Softmax, EOS), (b) explicit (Garbage Class), and (c) Discriminator-based methods. For implicit methods, $\hat{u}_i$ is derived via MLS/MSS. In (b), a query is rejected if the garbage prototype $P_g$ yields the highest probability. In (c), $\hat{u}_i$ is computed by an MLP with Sigmoid activation using the $\varTheta$ function (Eq. \ref{eq:getminnorm}). (d) and (e) illustrate the feature extraction and inference decision flow for the SAFSAR and STRM models.
  • Figure 3: Analysis of open-set scoring and performance correlation. (a) Comparison between MSS and MLS for SAFSAR's Softmax Baseline on SSv2 1-Shot. (b) Global correlation between closed-set and open-set metrics across all five datasets for both SAFSAR and STRM.
  • Figure 4: Qualitative comparison between the SAFSAR Softmax baseline and FR-Disc on the SSv2 test set. (a) t-SNE of video features $\phi(x_i)$ for the baseline (left) and FR-Disc (right). (b) Score histograms for known (blue) and unknown (red) queries, comparing the baseline (top) and FR-Disc (bottom). Our method yields tighter feature clustering and assigns significantly lower confidence to unknown samples.
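The implicit scoring rules in Figure 2 (a) and Figure 3 (a), and the known/unknown score separation in Figure 4 (b), can be illustrated with a short NumPy sketch. It assumes that MSS denotes the maximum softmax score and MLS the maximum logit score. It uses AUROC as an illustrative way to quantify how well scores separate known from unknown queries, but the exact metrics used in the paper are not listed in this summary. The function names are hypothetical. The 0.5 rejection threshold follows the example in Figure 1.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mss_score(logits: np.ndarray) -> np.ndarray:
    # Maximum softmax score per query (assumed meaning of MSS); lies in (0, 1].
    return softmax(logits).max(axis=-1)


def mls_score(logits: np.ndarray) -> np.ndarray:
    # Maximum logit score per query (assumed meaning of MLS); unbounded.
    return logits.max(axis=-1)


def reject_unknown(scores: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    # True where the query is rejected as unknown (low confidence).
    return np.asarray(scores) < threshold


def auroc(known_scores, unknown_scores) -> float:
    """Probability that a random known query scores higher than a random
    unknown query, counting ties as 0.5 (Mann-Whitney form of ROC AUC)."""
    k = np.asarray(known_scores, dtype=float)[:, None]
    u = np.asarray(unknown_scores, dtype=float)[None, :]
    return float(np.mean((k > u) + 0.5 * (k == u)))
```

A fixed threshold such as 50% only makes sense for a bounded score like MSS. MLS has no fixed scale, so threshold-free measures such as AUROC are the more natural way to compare the two scoring rules.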