From Categories to Classifiers: Name-Only Continual Learning by Exploring the Web

Ameya Prabhu; Hasan Abed Al Kader Hammoud; Ser-Nam Lim; Bernard Ghanem; Philip H. S. Torr; Adel Bibi

TL;DR

The paper tackles the high cost of data labeling in continual learning by proposing name-only continual learning (NO-CL), which relies on uncurated webly-supervised data collected via simple search queries (category name plus an auxiliary suffix). The core method, C2C, uses a fixed backbone to extract features from the web-sourced data and trains a lightweight classifier under constrained computational budgets, enabling rapid, scalable adaptation across timesteps. Empirical results show that web data can match or surpass manually annotated training data on fine-grained tasks, with 2–25% absolute gains over prior name-only methods and strong performance in class-, domain-, and time-incremental settings. EvoTrends demonstrates real-world applicability by tracking evolving trends over 21 years. The work reports substantial reductions in annotation time and cost (minutes, and less than $15 on AWS) while maintaining competitive accuracy, and introduces EvoTrends as a practical continual name-only benchmark for evaluating adaptation to real-world trends. Overall, the approach shows that webly-supervised data is a viable substitute for manual labeling in continual learning, and it points to extensions such as test-time adaptation and online continual learning.
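The query scheme mentioned above (category name plus an auxiliary suffix) can be illustrated with a minimal sketch. The helper name `build_queries`, the example categories, and the suffix strings are all hypothetical; the summary does not state which suffixes the authors used.

```python
from itertools import product


def build_queries(category_names, suffixes):
    """Return one search query per (category name, suffix) pair."""
    return [
        f"{name} {suffix}".strip()
        for name, suffix in product(category_names, suffixes)
    ]


# Hypothetical inputs, chosen only for illustration.
categories = ["golden retriever", "tabby cat"]
suffixes = ["photo", "image"]

queries = build_queries(categories, suffixes)
for query in queries:
    print(query)
```

Each resulting string would be sent to one or more search engines, and the downloaded images would inherit the category name as a (noisy) label.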

Abstract

Continual Learning (CL) often relies on the availability of extensive annotated datasets, an assumption that is unrealistic in practice because annotation is time-consuming and costly. We explore a novel paradigm termed name-only continual learning, where time and cost constraints prohibit manual annotation. In this scenario, learners adapt to new category shifts using only category names, without the luxury of annotated training data. Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification. We investigate the reliability of our web data and find it comparable, and in some cases superior, to manually annotated datasets. Additionally, we show that by harnessing the web, we can create support sets that surpass state-of-the-art name-only classification methods, which create support sets using generative models or image retrieval from LAION-5B, achieving up to a 25% boost in accuracy. When applied across varied continual learning contexts, our method consistently exhibits only a small performance gap compared to models trained on manually annotated datasets. We present EvoTrends, a class-incremental dataset made from the web to capture real-world trends, created in just minutes. Overall, this paper underscores the potential of using uncurated webly-supervised data to mitigate the challenges associated with manual data labeling in continual learning.

Paper Structure (22 sections, 6 figures, 17 tables)

Introduction
Problem Formulation
Our Approach: Categories to Classifier
Evaluating Capabilities of our Approach
Experimental Details
How Reliable is our Proposed Approach?
Comparison with Name-Only Approaches
Continual Webly-Supervised Learning
Experimental Details
Results
EvoTrends: The First Continual Name-Only Classification Benchmark
Conclusion
Note on Copyright
Acknowledgements
Connections to Past Literature in Webly Supervised Learning
...and 7 more sections

Figures (6)

Figure 1: Continual Name-Only Classification: Our Approach. At each timestep $t$, the learner receives a list of class categories without any training samples. We start by collecting webly-supervised data through querying and downloading data from multiple search engines. We then extract features using a frozen backbone, and subsequently train a linear layer on those features. The same process is repeated for the next timestep.
Figure 2: EvoTrends: A Dynamic Dataset Reflecting Real-world Trends. This illustration showcases the dataset we have curated using internet sources. EvoTrends consists of 21 timesteps, spanning the years 2000 to 2020. Each timestep presents the most trending products of the respective year, challenging the learner to adapt to these evolving trends. Unlike artificial scenarios, this dataset accurately reflects a real class-incremental setting, where classes emerge based on actual trends observed in the world.
Figure 3: Effective Training Epochs Per Time Step. Each row represents one of the computational budgets: tight, normal, and relaxed, in that order. The normal budget is chosen so that the manually annotated datasets undergo one epoch of training during the initial timestep (depicted by the green plot in the left-side figures). The tight budget (blue) is half the normal budget, while the relaxed budget (yellow) is four times the normal budget. Because the webly-supervised data is larger than the manually curated datasets, it undergoes fewer epochs within each of the three budget regimes. At each timestep more data is presented, hence the effective number of epochs decays with time.
Figure 4: Domain Gap in CLEAR10 Dataset. In the CLEAR10 dataset, the original paper describes buses as the "exterior" of the bus, while the test set predominantly consists of images showcasing the "interior" of the bus. Our web-collected data, in contrast, primarily comprises "exterior" photos of buses. This mismatch between our bus images and the test set explains the 10% performance gap observed when using our webly-supervised data compared to manually annotated datasets. The discrepancy does not apply to other classes within CLEAR10, as the camera class shown for reference demonstrates.
Figure 5: Comparison of PACS and Webly-Supervised Data. The upper two rows depict the sketch and painting domains of the manually annotated PACS dataset, while the last two rows showcase the sketch and painting domains of our webly-supervised data. Although there is some resemblance between the painting domains of both datasets, a significant domain gap becomes apparent when comparing the sketches. In the PACS dataset, sketches refer to quick drawings, whereas in our web search, sketches are composed of line drawings and detailed sketches.
...and 1 more figure
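The budget arithmetic described in the Figure 3 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the budget is a fixed number of training samples processed per timestep and that data accumulates across timesteps. The dataset sizes and the helper name `effective_epochs` are hypothetical.

```python
def effective_epochs(samples_per_step, budget_samples):
    """Effective epochs per timestep under a fixed per-timestep budget.

    Assumes data accumulates, so the same budget is spread over a
    growing pool and the effective epoch count decays over time.
    """
    epochs = []
    seen = 0
    for n in samples_per_step:
        seen += n
        epochs.append(budget_samples / seen)
    return epochs


# Hypothetical dataset sizes per timestep (not taken from the paper).
manual = [1000, 1000, 1000]  # manually annotated data
web = [4000, 4000, 4000]     # larger webly-supervised data

# Normal budget: one epoch over the manual data at the first timestep.
normal = manual[0]
budgets = {"tight": normal / 2, "normal": normal, "relaxed": 4 * normal}

for name, budget in budgets.items():
    print(name, effective_epochs(manual, budget), effective_epochs(web, budget))
```

With these made-up sizes, the web data receives a quarter of the effective epochs that the manual data receives under every budget, which mirrors the caption's point that larger web datasets see fewer passes per budget regime.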
