H2ST: Hierarchical Two-Sample Tests for Continual Out-of-Distribution Detection

Yuhang Liu, Wenjie Zhao, Yunhui Guo

TL;DR

Open-world Task Incremental Learning (TIL) requires detecting out-of-distribution (OOD) samples and identifying their corresponding tasks without predefined thresholds. The authors propose Hierarchical Two-Sample Tests (H2ST), a threshold-free approach that tests distributions at the feature level via a cascade of task-specific two-sample tests, enabling both OOD detection and task-id prediction with low overhead. H2ST integrates with replay-based TIL memory, updates online, and uses CP-based calibrated detection with an early-exit mechanism to assign ID and task-id efficiently. Comprehensive experiments across MNIST, SVHN, CIFAR-10/100, Mini-ImageNet, CoRe50, and Stream-51 demonstrate strong OOD detection (high F1) and task-id accuracy (TA), outperforming baselines and maintaining compatibility with established replay strategies. Overall, H2ST provides a threshold-free, scalable solution for continual OOD detection in open-world TIL, facilitating safer and more reliable deployment in non-stationary environments.

Abstract

Task Incremental Learning (TIL) is a specialized form of Continual Learning (CL) in which a model incrementally learns from non-stationary data streams. Existing TIL methods operate under the closed-world assumption, presuming that incoming data remain in-distribution (ID). In an open-world setting, however, incoming samples may originate from out-of-distribution (OOD) sources, and their task identities are inherently unknown. Continually detecting OOD samples poses three challenges for current OOD detection methods: (1) reliance on model outputs makes detection overly dependent on model performance; (2) suitable thresholds are difficult to select, which hinders real-world deployment; and (3) binary ID/OOD classification provides no task-level identification. To address these issues, we propose a novel continual OOD detection method, Hierarchical Two-sample Tests (H2ST). H2ST eliminates threshold selection through hypothesis testing and uses feature maps to better exploit model capabilities without excessive dependence on model performance. Its hierarchical architecture enables task-level detection with better performance and lower overhead than non-hierarchical classifier two-sample tests. Extensive experiments and analysis validate the effectiveness of H2ST in open-world TIL scenarios and its superiority over existing methods. Code is available at https://github.com/YuhangLiuu/H2ST.
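The abstract contrasts H2ST with non-hierarchical classifier two-sample tests (C2ST). As a rough illustration of the generic C2ST idea, the sketch below splits two samples in half, fits a classifier to tell them apart on one half, and measures its accuracy on the other. Under the null hypothesis that both samples share a distribution, held-out accuracy stays near 1/2, so a one-sided test against a normal approximation yields a p-value that is compared with a significance level rather than a tuned score threshold. The nearest-centroid classifier, the plain vectors standing in for feature maps, and the function name `c2st` are all assumptions for illustration, not the paper's implementation.

```python
import random
from statistics import NormalDist
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _split_half(points: Sequence[Point], rng: random.Random) -> Tuple[List[Point], List[Point]]:
    """Shuffle a sample and split it into a training half and a test half."""
    pts = list(points)
    rng.shuffle(pts)
    mid = len(pts) // 2
    return pts[:mid], pts[mid:]


def _centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def c2st(sample_p: Sequence[Point], sample_q: Sequence[Point],
         alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, bool]:
    """Hypothetical C2ST sketch; returns (test accuracy, p-value, reject H0?)."""
    if len(sample_p) != len(sample_q) or len(sample_p) < 4:
        raise ValueError("expects two equal-sized samples of at least 4 points")
    rng = random.Random(seed)
    train_p, test_p = _split_half(sample_p, rng)
    train_q, test_q = _split_half(sample_q, rng)
    # "Training" = one centroid per sample; label 0 means P, label 1 means Q.
    c_p, c_q = _centroid(train_p), _centroid(train_q)
    test_set = [(x, 0) for x in test_p] + [(x, 1) for x in test_q]
    correct = sum(
        (0 if _sq_dist(x, c_p) <= _sq_dist(x, c_q) else 1) == label
        for x, label in test_set
    )
    n_test = len(test_set)
    accuracy = correct / n_test
    # Under H0 (P == Q) the held-out accuracy is approx. N(1/2, 1/(4 * n_test)).
    null = NormalDist(mu=0.5, sigma=(0.25 / n_test) ** 0.5)
    p_value = 1.0 - null.cdf(accuracy)  # one-sided: only accuracy above 1/2 is evidence
    return accuracy, p_value, p_value < alpha
```

Requiring equal-sized samples keeps the held-out set balanced, which is what makes 1/2 the right null accuracy; with unbalanced samples the reference point would have to change.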

Paper Structure

This paper contains 21 sections, 7 equations, 8 figures, 11 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Illustration of H2ST for continual OOD detection. The TIL model learns incrementally and then proceeds to the testing phase. New samples traverse the hierarchical architecture, exiting early once identified as ID or after passing through all layers. Samples predicted as ID proceed to inference, while OOD samples are used as training samples for the new task.
  • Figure 2: Performance comparison between H2ST and C2ST. Both exhibit comparable TIL performance, but the hierarchical architecture demonstrates superior OOD detection performance.
  • Figure 3: Performance sensitivity to depth. As the number of tasks grows, H2ST delivers stable and superior detection performance.
  • Figure 4: Impact of memory size on performance metrics. All metrics improve with memory size, though OOD detection shows diminishing returns at larger sizes.
  • Figure 5: TIL and OOD detection performance of single C2ST and H2ST. H2ST demonstrates superior OOD detection performance.
  • ...and 3 more figures
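The early-exit traversal described in Figure 1 can be illustrated with a minimal sketch. It assumes one layer per learned task, each holding replay-memory features for that task. A batch of incoming features is tested against each layer in turn, exits at the first layer whose test does not reject the null hypothesis of a shared distribution (ID, with that layer's task-id), and is flagged OOD if every layer rejects. The names `hierarchical_detect` and `mean_z_test` are hypothetical, the features are scalars for brevity, and the large-sample z-test on means is a simple stand-in for the classifier two-sample tests H2ST actually uses. The layer order and the significance level `alpha` are likewise assumptions.

```python
from statistics import NormalDist, fmean, variance
from typing import Callable, Optional, Sequence

# A two-sample test maps (incoming batch, task memory) to a p-value for
# H0: "both samples come from the same distribution".
TwoSampleTest = Callable[[Sequence[float], Sequence[float]], float]


def mean_z_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in test: two-sided large-sample z-test on the means (not H2ST's test)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0.0:
        return 1.0 if fmean(a) == fmean(b) else 0.0
    z = abs(fmean(a) - fmean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))


def hierarchical_detect(batch: Sequence[float],
                        task_memories: Sequence[Sequence[float]],
                        test: TwoSampleTest = mean_z_test,
                        alpha: float = 0.05) -> Optional[int]:
    """Return the task-id of the first layer that accepts the batch, or None (OOD)."""
    for task_id, memory in enumerate(task_memories):
        if test(batch, memory) >= alpha:  # H0 not rejected: treat batch as ID
            return task_id  # early exit: remaining layers are skipped
    return None  # every layer rejected H0: the batch is OOD
```

Because the loop returns at the first accepting layer, a batch from an early task never runs the remaining tests, which is where an early exit saves computation. The sketch applies no correction for running several tests in sequence.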