TV100: A TV Series Dataset that Pre-Trained CLIP Has Not Seen

Da-Wei Zhou, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan

TL;DR

This work asks whether large pre-trained models truly possess comprehensive knowledge, and tests the question by evaluating CLIP on TV100, a new image dataset of TV series released after 2021. To build TV100, the authors collect post-2021 TV series from IMDb, gather images for each series via Google, and deduplicate them. A pre-trained CLIP then ranks roughly 800 candidate classes by zero-shot accuracy with the prompt 'a photo of the TV series [CLASS]', and the 100 hardest classes (those with the lowest zero-shot accuracy) are kept. Zero-shot CLIP accuracy on TV100 is close to zero, while finetuning improves it substantially, which indicates that the dataset is learnable and suitable for assessing incremental learning and novel class discovery. The dataset is globally diverse and long-tailed, so it also supports evaluating out-of-distribution recognition and transfer to downstream tasks. It is publicly available at the project page.
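The class-selection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names (`build_prompts`, `per_class_accuracy`, `select_hardest`) and the toy labels are hypothetical, and the zero-shot predictions are assumed to come from a CLIP model run elsewhere. Only the prompt template is taken from the summary.

```python
from typing import Dict, List, Sequence

# Prompt template quoted in the summary; everything else here is an assumption.
PROMPT_TEMPLATE = "a photo of the TV series {}"


def build_prompts(class_names: Sequence[str]) -> List[str]:
    """Fill the template once per candidate class."""
    return [PROMPT_TEMPLATE.format(name) for name in class_names]


def per_class_accuracy(labels: Sequence[int], preds: Sequence[int]) -> Dict[int, float]:
    """Zero-shot accuracy per class, for classes with at least one image."""
    correct: Dict[int, int] = {}
    total: Dict[int, int] = {}
    for y, p in zip(labels, preds):
        total[y] = total.get(y, 0) + 1
        correct[y] = correct.get(y, 0) + int(y == p)
    return {c: correct[c] / total[c] for c in total}


def select_hardest(acc: Dict[int, float], k: int) -> List[int]:
    """Return the k class ids with the lowest accuracy (ties broken by id)."""
    return sorted(acc, key=lambda c: (acc[c], c))[:k]


if __name__ == "__main__":
    # Toy stand-ins: three placeholder classes and made-up predictions.
    names = ["Series A", "Series B", "Series C"]
    labels = [0, 0, 1, 1, 2, 2]
    preds = [0, 0, 1, 2, 0, 1]

    print(build_prompts(names))
    acc = per_class_accuracy(labels, preds)
    print([names[c] for c in select_hardest(acc, k=2)])
```

With roughly 800 candidate classes and `k=100`, the same ranking would keep the classes on which zero-shot CLIP performs worst, which is how the summary describes the final 100 classes being chosen.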

Abstract

The era of pre-trained models has ushered in a wealth of new insights for the machine learning community. Among the many questions that arise, one of paramount importance is: 'Do pre-trained models possess comprehensive knowledge?' This paper seeks to address this question. To that end, we have made publicly available a novel dataset composed of images from TV series released after 2021. The dataset has significant potential for use in various research areas, including the evaluation of incremental learning, novel class discovery, and long-tailed learning. Project page: https://tv-100.github.io/

Paper Structure (2 sections, 1 figure)

Figures (1)

  • Figure 1: Detailed information about TV100, including the data collection process, the country distribution, and class distribution. It also contains an empirical evaluation of zero-shot and finetuned performance.