An Archaeological Catalog Collection Method Based on Large Vision-Language Models
Honglin Pang, Yi Chang, Tianjing Duan, Xi Yang
TL;DR
This work tackles the automated collection of archaeological catalogs, which contain images, descriptions, and excavation data and are dispersed across numerous publications. It introduces a three-module pipeline (Document Localization, Block Comprehension, and Block Matching) built around large vision-language models (VLMs): the pipeline localizes blocks, extracts structured attributes, and aligns multimodal information via foreign-key and distance-based matching. Experiments on the Dabagou and Miaozigou pottery catalogs show improved accuracy over baselines and indicate that the method is robust to the choice of underlying model, with Claude 3.5 Sonnet achieving the highest reported performance among the tested VLMs. The approach enables scalable, automated catalog assembly, supports downstream tasks such as artifact classification, reconstruction, and visual question answering, and lays the groundwork for extension to broader artifact types and more sophisticated matching strategies.
Abstract
Archaeological catalogs, which contain key elements such as artifact images, morphological descriptions, and excavation information, are essential for studying artifact evolution and cultural inheritance. These data are widely scattered across publications, so collecting them at scale requires automated methods. However, existing Large Vision-Language Models (VLMs) and the data collection methods derived from them struggle with accurate image detection and cross-modal matching when processing archaeological catalogs, which makes automated collection difficult. To address these issues, we propose a novel VLM-based archaeological catalog collection method comprising three modules: document localization, block comprehension, and block matching. Through practical data collection from the Dabagou and Miaozigou pottery catalogs and comparison experiments, we demonstrate the effectiveness of our approach, providing a reliable solution for the automated collection of archaeological catalogs.
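The block matching step, described in the TL;DR as foreign-key and distance-based matching, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that an image block and a text block can share an artifact identifier that acts as the foreign key, and that when no shared identifier is found the spatially nearest text block is used instead. The `Block` fields, the `extract_key` helper, and the identifier pattern (e.g. `H12:3`) are hypothetical choices made only for illustration.

```python
import re
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass
class Block:
    """A localized page block. All field names are hypothetical."""
    kind: str   # "image" or "text"
    text: str   # caption or description text ("" if none was extracted)
    x: float    # block center, in page coordinates
    y: float


def extract_key(text: str) -> Optional[str]:
    """Return an artifact identifier such as 'H12:3' (assumed format), or None."""
    m = re.search(r"[A-Z]+\d+:\d+", text)
    return m.group(0) if m else None


def match_blocks(images: List[Block], texts: List[Block]) -> List[Tuple[Block, Block, str]]:
    """Pair each image with a text block: shared key first, nearest block otherwise."""
    # Index text blocks by identifier; the first block seen for a key wins.
    by_key = {}
    for t in texts:
        key = extract_key(t.text)
        if key is not None:
            by_key.setdefault(key, t)

    pairs = []
    for img in images:
        key = extract_key(img.text)
        if key is not None and key in by_key:
            # Foreign-key match: both blocks mention the same identifier.
            pairs.append((img, by_key[key], "key"))
        elif texts:
            # Distance-based fallback: Euclidean distance between block centers.
            nearest = min(texts, key=lambda t: hypot(t.x - img.x, t.y - img.y))
            pairs.append((img, nearest, "distance"))
    return pairs
```

Each returned tuple records which rule produced the pairing, so distance-based matches, which are more likely to be wrong, can be reviewed separately from key-based ones.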
