ShiftedBronzes: Benchmarking and Analysis of Domain Fine-Grained Classification in Open-World Settings
Rixin Zhou, Honglin Pang, Qian Zhang, Ruihua Qi, Xi Yang, Chuntao Li
TL;DR
This paper tackles open-world fine-grained classification in archaeology by introducing ShiftedBronzes, a benchmark for bronze ware dating. ShiftedBronzes pairs in-distribution (ID) data (Ding and Gui vessels) with seven types of out-of-distribution (OOD) data, plus transferred data, to simulate the distribution shifts found in realistic dating scenarios. The work evaluates six fine-grained visual classification (FGVC) methods for dating and eighteen OOD detection methods spanning post-hoc, VLM-based, and generation-based families. Approaches built on pre-trained Vision-Language Models (VLMs) generally outperform the others, and ID-like prompting in VLMs yields robust results. The findings indicate that domain-specific knowledge and the handling of subtle, domain-relevant distribution shifts are crucial for effective OOD detection and dating in this field, and they show how different OOD strategies respond to specialized data. ShiftedBronzes offers a comprehensive resource for advancing archaeology-centric FGVC and domain-aware OOD detection research; the dataset and code will be released later.
Abstract
In real-world applications across specialized domains, addressing complex out-of-distribution (OOD) challenges is a common and significant concern. In this study, we focus on fine-grained bronze ware dating, a critical task in the study of ancient Chinese history, and develop a benchmark dataset named ShiftedBronzes. By substantially extending the bronze Ding dataset, ShiftedBronzes incorporates two types of bronze ware data and seven types of OOD data, which exhibit the distribution shifts commonly encountered in bronze ware dating. We conduct benchmarking experiments on ShiftedBronzes and five commonly used general OOD datasets, employing a variety of widely adopted post-hoc, pre-trained Vision-Language Model (VLM)-based, and generation-based OOD detection methods. Our analysis of the experimental results confirms previous conclusions about post-hoc, VLM-based, and generation-based methods, while also revealing their distinct behaviors on specialized datasets. These findings underscore the unique challenges of applying general OOD detection methods to domain-specific tasks such as bronze ware dating. We hope that the ShiftedBronzes benchmark provides valuable insights for both bronze ware dating and the development of OOD detection methods. The dataset and associated code will be released later.
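The abstract refers to post-hoc and VLM-based OOD detectors without spelling out what such a detector computes. The snippet below is a minimal sketch under stated assumptions, not the benchmark's code (which has not yet been released). It shows two common post-hoc scores computed from a classifier's logits: maximum softmax probability (MSP) and the energy score. It also shows an MCM-style (Maximum Concept Matching) VLM score computed over precomputed image and class-prompt embeddings, along with AUROC and FPR at 95% TPR, two metrics often used to compare detectors. Whether ShiftedBronzes uses these particular scores or metrics is an assumption. All function names are hypothetical, and the embeddings stand in for the output of a vision-language encoder such as CLIP.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the class axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def msp_score(logits: np.ndarray) -> np.ndarray:
    # Post-hoc MSP: the largest class probability; higher means "more ID".
    return softmax(logits).max(axis=-1)


def energy_score(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    # Post-hoc energy score: negative free energy T * logsumexp(logits / T),
    # computed stably; higher means "more ID".
    z = logits / T
    m = z.max(axis=-1, keepdims=True)
    return T * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))


def mcm_score(image_emb: np.ndarray, text_emb: np.ndarray, tau: float = 1.0) -> np.ndarray:
    # MCM-style VLM score (hypothetical helper): softmax over cosine similarities
    # between each image embedding and one prompt embedding per ID class
    # (e.g. one prompt per dating period), then take the maximum.
    img = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    return softmax(img @ txt.T / tau).max(axis=-1)


def auroc(id_scores, ood_scores) -> float:
    # Probability that a random ID sample outscores a random OOD sample,
    # with ties counted as 1/2; this equals the area under the ROC curve.
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    return float((id_s > ood_s).mean() + 0.5 * (id_s == ood_s).mean())


def fpr_at_95_tpr(id_scores, ood_scores) -> float:
    # Pick the threshold that accepts about 95% of ID samples (score >= thr),
    # then report the fraction of OOD samples that are also accepted.
    thr = np.percentile(np.asarray(id_scores, dtype=float), 5)
    return float((np.asarray(ood_scores, dtype=float) >= thr).mean())


if __name__ == "__main__":
    # Synthetic stand-in data: ID logits are peaked on one class, OOD logits are flat.
    rng = np.random.default_rng(0)
    id_logits = rng.normal(size=(200, 4))
    id_logits[:, 0] += 4.0
    ood_logits = rng.normal(size=(200, 4))
    for name, score in [("MSP", msp_score), ("Energy", energy_score)]:
        s_id, s_ood = score(id_logits), score(ood_logits)
        print(name, "AUROC:", auroc(s_id, s_ood), "FPR95:", fpr_at_95_tpr(s_id, s_ood))
```

Every scorer follows the same convention: a higher score means "more in-distribution", so the same evaluation functions work for post-hoc scores and VLM-based scores alike. Post-hoc methods need only a trained classifier's outputs. The VLM variant instead depends on how the class prompts are written, which is where prompting strategies such as the ID-like prompting mentioned above come into play.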
