
Test-Time Discovery via Hashing Memory

Fan Lyu, Tianle Liu, Zhang Zhang, Fuyuan Hu, Liang Wang

TL;DR

Test-Time Discovery (TTD) tackles class shifts at inference by requiring models to classify known categories and concurrently discover novel ones. The authors introduce a training-free Hash Memory (HM) that uses Locality-Sensitive Hashing (LSH) to store test features and compare them against past samples. HM is coupled with a global prototype classifier for known classes, an LSH-based local classifier for unknown ones, and a memory self-correction mechanism. Empirical results on CIFAR100D, CUB-200D, and Tiny-ImageNetD show that HM achieves robust novel-category discovery while maintaining known-class accuracy, outperforming threshold-based and some training-based baselines on both real-time and post-evaluation metrics. This work offers a practical open-world evaluation framework with meaningful implications for safety-critical systems and continuous learning at test time.
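The global-to-local routing summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: class prototypes are compared with cosine similarity, confidence is taken as a temperature-scaled softmax over those similarities, and the threshold `tau`, the temperature, and the function name `global_to_local_predict` are hypothetical. The LSH-based local classifier is passed in as a callback so the routing logic stands on its own.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(scores: List[float], temperature: float) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_to_local_predict(
    feature: Sequence[float],
    prototypes: Dict[str, Sequence[float]],
    local_classifier: Callable[[Sequence[float]], str],
    tau: float = 0.7,          # hypothetical confidence threshold
    temperature: float = 0.1,  # hypothetical softmax temperature
) -> Tuple[str, str]:
    """Hypothetical sketch: return (label, route), where route is 'global'
    if the prototype classifier is confident enough, else 'local'."""
    names = list(prototypes)
    sims = [cosine(feature, prototypes[n]) for n in names]
    probs = softmax(sims, temperature)
    best = max(range(len(names)), key=lambda i: probs[i])
    if probs[best] >= tau:
        return names[best], "global"   # confident: known class
    return local_classifier(feature), "local"  # defer to the local classifier
```

With `tau = 0.7`, a sample lying close to one prototype stays on the global path, while a sample equidistant from two prototypes (softmax confidence 0.5 each) is deferred to the local classifier.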

Abstract

We introduce Test-Time Discovery (TTD) as a novel task that addresses class shifts during testing, requiring models to simultaneously identify emerging categories while preserving previously learned ones. A key challenge in TTD is distinguishing newly discovered classes from those already identified. To address this, we propose a training-free, hash-based memory mechanism that enhances class discovery through fine-grained comparisons with past test samples. Leveraging the characteristics of unknown classes, our approach introduces a hash representation based on feature scale and direction, utilizing Locality-Sensitive Hashing (LSH) for efficient grouping of similar samples. This enables test samples to be easily and quickly compared with relevant past instances. Furthermore, we design a collaborative classification strategy, combining a prototype classifier for known classes with an LSH-based classifier for novel ones. To enhance reliability, we incorporate a self-correction mechanism that refines memory labels through hash-based neighbor retrieval, ensuring more stable and accurate class assignments. Experimental results demonstrate that our method discovers novel categories effectively while maintaining performance on known classes, establishing a new paradigm in model testing. Our code is available at https://github.com/fanlyu/ttd.
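The hash memory described in the abstract can be illustrated with a small sketch. It is not the released code at the repository above, and several details are assumptions: the feature norm (scale) is quantized into fixed-width bins, the direction is encoded by random-hyperplane (SimHash-style) sign bits, the graph-based neighbor search is simplified to a one-step neighborhood (adjacent norm bins and direction codes at Hamming distance 1), and self-correction is modeled as a majority vote over that neighborhood. The class name `HashMemorySketch`, its parameters (`n_bits`, `norm_bin_width`, `min_neighbors`), and the `novel_k` label format are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, Tuple[int, ...]]  # (norm bin, direction sign bits)


class HashMemorySketch:
    """Hypothetical hash memory: buckets keyed by a quantized feature norm and
    random-hyperplane sign bits of the feature direction (an assumption)."""

    def __init__(self, dim: int, n_bits: int = 8, norm_bin_width: float = 1.0,
                 min_neighbors: int = 1, seed: int = 0):
        rng = random.Random(seed)
        # One random Gaussian hyperplane per bit; the sign encodes direction.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.norm_bin_width = norm_bin_width
        self.min_neighbors = min_neighbors
        self.buckets: Dict[Key, List[Tuple[List[float], str]]] = defaultdict(list)
        self.n_novel = 0

    def hash(self, feature: Sequence[float]) -> Key:
        """Map a feature to (norm bin, direction code)."""
        norm = math.sqrt(sum(x * x for x in feature))
        norm_bin = int(norm // self.norm_bin_width)
        bits = tuple(int(sum(p * x for p, x in zip(plane, feature)) >= 0.0)
                     for plane in self.planes)
        return norm_bin, bits

    def neighbor_keys(self, key: Key) -> List[Key]:
        """The bucket itself, adjacent norm bins, and direction codes at
        Hamming distance 1 (a simplified stand-in for graph-based search)."""
        norm_bin, bits = key
        keys = [key, (norm_bin - 1, bits), (norm_bin + 1, bits)]
        for i in range(len(bits)):
            keys.append((norm_bin, bits[:i] + (1 - bits[i],) + bits[i + 1:]))
        return keys

    def _votes(self, key: Key) -> Counter:
        # .get() does not create empty buckets in the defaultdict.
        return Counter(label for k in self.neighbor_keys(key)
                       for _, label in self.buckets.get(k, []))

    def predict_and_store(self, feature: Sequence[float]) -> str:
        """Label a test feature from its neighborhood, then store it."""
        key = self.hash(feature)
        votes = self._votes(key)
        if sum(votes.values()) < self.min_neighbors:
            label = f"novel_{self.n_novel}"  # nothing similar: open a new class
            self.n_novel += 1
        else:
            label = votes.most_common(1)[0][0]  # most frequent neighbor label
        self.buckets[key].append((list(feature), label))
        return label

    def self_correct(self) -> int:
        """Relabel stored samples to their neighborhood majority label;
        votes use pre-correction labels. Returns the number of changes."""
        updates = []
        for key, items in self.buckets.items():
            majority = self._votes(key).most_common(1)[0][0]
            for idx, (_, label) in enumerate(items):
                if label != majority:
                    updates.append((key, idx, majority))
        for key, idx, new_label in updates:
            feat, _ = self.buckets[key][idx]
            self.buckets[key][idx] = (feat, new_label)
        return len(updates)
```

Sparse neighborhoods (fewer than `min_neighbors` stored samples) trigger a new class, which mirrors the observation that sparse buckets are more likely to hold novel samples; the one-step neighborhood keeps each lookup to `n_bits + 3` bucket probes.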

Paper Structure

This paper contains 33 sections, 21 equations, 17 figures, 9 tables, and 1 algorithm.

Figures (17)

  • Figure 1: Test-Time Discovery (TTD). A model is initially trained on data containing only known classes. During deployment, the test data may include both known and unknown classes, requiring the model to predict known classes and discover novel ones. Given a test sample, the model needs to determine whether it belongs to a newly discovered unseen class or an already identified seen class. The small scale of the discovered data and insufficient learning make this distinction challenging.
  • Figure 2: Class-cluster prediction matching matrix on MNIST ($5+5$). For each prediction cluster, we compute the class composition and visualize the relationships. To enhance clarity, clusters with the highest class proportion are aligned diagonally from class 0. The visualization shows that traditional thresholding causes significant confusion as new classes emerge, with samples scattered across multiple clusters and reduced discovery performance.
  • Figure 3: Schema of the Proposed Method: (a) Global-to-local classification: Samples with high confidence are classified using the prototype classifier; otherwise, the LSH-based classifier is used. (b) Constructed hash memory: Samples are stored in different buckets based on hashed feature norm and direction. Sparse buckets are more likely to contain novel samples. (c) LSH-based classifier: A test sample is first hashed to locate its target bucket. A graph-based neighbor search then explores adjacent buckets for a more robust prediction. If no similar samples are found, the sample is identified as a new class; otherwise, it is assigned the most relevant label from its neighbors.
  • Figure 4: Post evaluation using NCD metrics.
  • Figure 5: Memory agreement w/ and w/o SC.
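A class-cluster prediction matching matrix like the one in Figure 2 can be computed by counting each prediction cluster's class composition, normalizing per cluster, and ordering clusters by their dominant class so the best matches fall on the diagonal. The sketch below is a hedged illustration, not the authors' plotting code; the function name `matching_matrix` and the tie-breaking rule (dominant-class index first, then descending dominant proportion) are assumptions.

```python
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def matching_matrix(
    true_labels: Sequence[int], cluster_ids: Sequence[Hashable]
) -> Tuple[List[Hashable], List[int], List[List[float]]]:
    """Hypothetical sketch. Rows: prediction clusters; columns: ground-truth
    classes; entries: fraction of each cluster's samples in each class."""
    classes = sorted(set(true_labels))
    class_pos = {y: i for i, y in enumerate(classes)}
    counts = defaultdict(Counter)
    for y, c in zip(true_labels, cluster_ids):
        counts[c][y] += 1

    def order_key(c):
        # Sort by dominant class index, then by how pure the cluster is.
        cls, n = counts[c].most_common(1)[0]
        return class_pos[cls], -n / sum(counts[c].values())

    clusters = sorted(counts, key=order_key)
    rows = []
    for c in clusters:
        total = sum(counts[c].values())
        rows.append([counts[c][y] / total for y in classes])  # composition
    return clusters, classes, rows
```

A confused cluster shows up as a row whose mass is spread over several columns, which is the scattering effect the Figure 2 caption attributes to traditional thresholding.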