LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images

Jing Zhang; Irving Fang; Juexiao Zhang; Hao Wu; Akshat Kaushik; Alice Rodriguez; Hanwen Zhao; Zhuo Zheng; Radu Iovita; Chen Feng

TL;DR

LUWA tackles lithic use-wear analysis on microscopic images, a niche vision problem with irregular wear patterns and variable imaging modalities. The authors assemble the LUWA dataset of 23,130 images across multiple magnifications and sensing modalities and benchmark a broad set of pre-trained models, finding that DINOv2 generalizes most robustly while even human experts, facing the same confounders, struggle to identify the worked material. They also explore few-shot learning and AI–archaeologist collaboration through prompts and GPT-4V experiments, highlighting both the potential and the current limits of large models in specialized domains. The work provides a new benchmark for image classification beyond common objects and offers concrete guidance on magnification and modality choices for lithic wear analysis.

Abstract

Lithic Use-Wear Analysis (LUWA) using microscopic images is an underexplored vision-for-science research area. It seeks to distinguish the worked material, which is critical for understanding archaeological artifacts, material interactions, tool functionalities, and dental records. However, this challenging task goes beyond the well-studied image classification problem for common objects. It is affected by many confounders owing to the complex wear mechanism and microscopic imaging, which makes it difficult even for human experts to identify the worked material successfully. In this paper, we investigate the following three questions on this unique vision task for the first time: (i) How well can state-of-the-art pre-trained models (like DINOv2) generalize to this rarely seen domain? (ii) How can few-shot learning be exploited for scarce microscopic images? (iii) How do ambiguous magnification and sensing modality influence classification accuracy? To study these, we collaborated with archaeologists and built the first open-source, and the largest, LUWA dataset, containing 23,130 microscopic images with different magnifications and sensing modalities. Extensive experiments show that existing pre-trained models notably outperform human experts but still leave substantial room for improvement. Most importantly, the LUWA dataset provides an underexplored opportunity for the vision and learning communities and complements existing image classification problems on common objects.
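Question (ii) concerns few-shot learning on scarce images. The sketch below illustrates one common few-shot baseline, nearest-prototype classification over frozen feature vectors with cosine similarity. It is an illustration only and is not taken from the paper: the function names, the toy three-dimensional features, and the class names `material_a` and `material_b` are all hypothetical, and in practice the features would come from a frozen pre-trained encoder rather than from random numbers.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def prototype_classify(support_feats, support_labels, query_feats):
    """Assign each query to the class whose mean support feature is most similar."""
    support = l2_normalize(np.asarray(support_feats, dtype=np.float64))
    query = l2_normalize(np.asarray(query_feats, dtype=np.float64))
    labels = np.asarray(support_labels)
    classes = np.unique(labels)
    # One prototype per class: the normalized mean of its support features.
    prototypes = l2_normalize(
        np.stack([support[labels == c].mean(axis=0) for c in classes])
    )
    sims = query @ prototypes.T  # shape: (num_queries, num_classes)
    return classes[sims.argmax(axis=1)]

# Toy 2-way 5-shot episode with hypothetical features and class names.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
support_feats = np.repeat(centers, 5, axis=0) + 0.05 * rng.normal(size=(10, 3))
support_labels = np.repeat(np.array(["material_a", "material_b"]), 5)
query_feats = centers + 0.05 * rng.normal(size=(2, 3))

predictions = prototype_classify(support_feats, support_labels, query_feats)
print(predictions)
```

Because the encoder stays frozen in this kind of baseline, only a handful of labeled images per class are needed to build the prototypes, which is the main appeal when microscopic images are scarce.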

Paper Structure (18 sections, 9 figures, 7 tables)

Introduction
Related Work
LUWA Dataset
Dataset Creation
Dataset Analysis
Algorithm Benchmarking
Fully-Supervised Image Classification
Few-Shot Image Classification
Impact and Limitations of LUWA Dataset
Conclusion
LUWA Dataset
Dataset Fidelity
Material Properties
Human Annotations
Algorithm Benchmarking
...and 3 more sections

Figures (9)

Figure 1: LUWA poses a unique computer vision challenge due to its complex wear formation, irregular wear patterns, and the ambiguous sensing modalities and magnifications of microscopic imaging. Facing these challenges, the LUWA dataset encompasses both texture and heightmap with different magnifications, encouraging the exploration of image classification beyond common objects.
Figure 2: Image diversity of the LUWA dataset and corresponding visual explanations for human and model decision-making processes. (i) The LUWA dataset provides diverse microscopic images associated with spatial distributions (e.g. Regions 1 and 2), magnifications (e.g. Regions 2 and 4) and sensing modalities (texture in the first row and heightmap in the second row); (ii) We compared visual explanations in both human (third row) and model (fourth row) decision-making processes. Human experts labeled the most important region in red and the less important region in yellow when examining details of microscopic images to distinguish the worked material. Similarly, Grad-CAM (Selvaraju et al., 2017) heatmaps use red for the highest importance, yellow for lower importance, and blue for the lowest importance. Interestingly, similar areas (e.g. Regions 1, 4 and 6) are labeled with higher importance by both humans and models.
Figure 3: Cosine similarity distribution of LUWA dataset on different magnifications and sensing modalities.
Figure 4: The impact of the training strategy, granularity, magnification, and sensing modality on top-1 classification accuracy in %: (a) Because of their very large parameter counts, full-parameter fine-tuning of DINOv2 and training ViT-H and DINOv2 from scratch are excluded from the experiments. (b) Larger granularity values mean that more detailed information about a use-wear is fed into the model.
Figure 5: Feature visualization of LUWA dataset using frozen pre-trained DINOv2.
...and 4 more figures
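
Figure 3 reports a cosine similarity distribution over the dataset. As a rough illustration of what such a distribution involves, the sketch below computes all pairwise cosine similarities for a set of feature vectors and bins them into a histogram. It is an assumption-laden sketch, not the authors' procedure: the function name, the tiny `demo` array, and the choice of 10 bins over [-1, 1] are hypothetical, and the source does not state which features the figure is computed on.

```python
import numpy as np

def pairwise_cosine_similarities(features):
    """Return the cosine similarity of every unordered pair of rows."""
    feats = np.asarray(features, dtype=np.float64)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.maximum(norms, 1e-12)  # guard against zero vectors
    sim = unit @ unit.T
    # Keep only the strict upper triangle: each pair once, no self-pairs.
    upper = np.triu_indices(len(feats), k=1)
    return sim[upper]

# Hypothetical stand-in for per-image feature vectors.
demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = pairwise_cosine_similarities(demo)

# Bin the similarities to obtain an empirical distribution.
counts, edges = np.histogram(sims, bins=10, range=(-1.0, 1.0))
print(sims, counts)
```

Computing one such histogram per magnification or sensing modality would allow the groups to be compared side by side.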
