Towards Automated Petrography

Isai Daniel Chacón, Paola Ruiz Puentes, Jillian Pearse, Pablo Arbeláez

TL;DR

This work introduces LITHOS, the largest publicly available benchmark for automated petrography, combining paired PPL and XPL images with over 100k grain annotations across 25 mineral classes. It proposes a dual-encoder transformer baseline that fuses features from both polarizations, demonstrating superior mineral classification performance over single-polarization models on both binary and multi-class tasks. The dataset comprises 580 thin sections and 211,604 high-resolution patches, with detailed grain-axes annotations providing weak supervision for instance-level learning. By releasing the dataset, code, and pretrained models, the work aims to advance reproducibility and foster interdisciplinary research in automated petrographic analysis, enabling scalable mineral identification and texture characterization in geological samples.

Abstract

Petrography is a branch of geology that analyzes the mineralogical composition of rocks from microscopic thin-section samples. It is essential for understanding rock properties across geology, archaeology, engineering, mineral exploration, and the oil industry. However, petrography is labor-intensive: experts must visually examine thin-section samples in detail through optical polarization microscopes, which limits scalability and motivates automated techniques. To address this challenge, we introduce the Large-scale Imaging and Thin section Optical-polarization Set (LITHOS), the largest and most diverse publicly available experimental framework for automated petrography. LITHOS includes 211,604 high-resolution RGB patches acquired under polarized light and 105,802 expert-annotated grains across 25 mineral categories. Each annotation consists of the mineral class, spatial coordinates, and expert-defined major and minor axes represented as intersecting vector paths, capturing grain geometry and orientation. We evaluate multiple deep learning techniques for mineral classification in LITHOS and propose a dual-encoder transformer architecture that integrates both polarization modalities as a strong baseline for future reference. Our method consistently outperforms single-polarization models, demonstrating the value of polarization synergy in mineral classification. We have made the LITHOS Benchmark publicly available, comprising our dataset, code, and pretrained models, to foster reproducibility and further research in automated petrographic analysis.
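
The annotation format described above (mineral class plus intersecting major and minor axes) can be illustrated with a small data structure. The following is a minimal sketch only: the class name `GrainAnnotation`, its field names, the pixel-coordinate convention, and the example coordinates are assumptions made for illustration and are not the released LITHOS schema.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]   # (x, y) in pixel coordinates; an assumed convention
Axis = Tuple[Point, Point]    # an axis stored as its two endpoints


def axis_length(axis: Axis) -> float:
    """Euclidean length of an axis given by its two endpoints."""
    (x0, y0), (x1, y1) = axis
    return math.hypot(x1 - x0, y1 - y0)


@dataclass(frozen=True)
class GrainAnnotation:
    """Hypothetical record for one annotated grain (names are illustrative)."""
    mineral_class: str
    major_axis: Axis   # longer expert-drawn axis
    minor_axis: Axis   # shorter axis that crosses the major axis

    @property
    def major_length(self) -> float:
        return axis_length(self.major_axis)

    @property
    def minor_length(self) -> float:
        return axis_length(self.minor_axis)

    @property
    def aspect_ratio(self) -> float:
        """Elongation of the grain: major length divided by minor length."""
        return self.major_length / self.minor_length

    @property
    def orientation_deg(self) -> float:
        """Angle of the major axis from the x-axis, folded into [0, 180)."""
        (x0, y0), (x1, y1) = self.major_axis
        return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0


# Made-up coordinates; "Monocrystalline" is one of the class names in Figure 4.
grain = GrainAnnotation(
    mineral_class="Monocrystalline",
    major_axis=((10.0, 20.0), (40.0, 60.0)),
    minor_axis=((30.0, 30.0), (18.0, 39.0)),
)
print(grain.major_length, grain.minor_length)   # 50.0 15.0
print(round(grain.orientation_deg, 2))          # 53.13
```

Storing each axis as a pair of endpoints keeps the geometry recoverable without a segmentation mask, which is one way such annotations could serve as weak supervision. The reported counts are also consistent with one PPL/XPL patch pair per annotated grain (2 × 105,802 = 211,604), although the source does not state this explicitly.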

Paper Structure

This paper contains 12 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Example of 256 × 256 high-resolution image patches extracted from the LITHOS Dataset, illustrating the 25 mineral classes under the two polarization conditions: plane-polarized light (PPL) at 0° and cross-polarized light (XPL) at 0°. Each patch represents an area of $896 \mu m^2$. These paired images highlight the variation in color, texture, and birefringence patterns, which are critical for mineral identification in thin section petrography.
  • Figure 2: Example of a digitized thin section under polarized light. (A) Plane-polarized light (PPL) at 0°. (B) Cross-polarized light (XPL) at 0°.
  • Figure 3: Distribution of annotated minerals across the two folds in the binary task.
  • Figure 4: Distribution of annotated minerals per class across the two folds. The distribution is imbalanced, with a few dominant classes such as Monocrystalline, Rock Fragment, and Polycrystalline accounting for the majority of annotations. This long-tailed distribution reflects the natural occurrence of minerals in thin sections and poses a significant challenge for learning robust classification models, particularly for rare classes.
  • Figure 5: Overview of the LITHOS Baseline. Two frozen ViT encoders extract polarization-specific representations from petrographic images under PPL and XPL. A dual-decoder module captures feature dependencies through self-attention and cross-attention mechanisms. The resulting features are fused via a learnable weighted sum. Finally, the [CLS] token of the combined representation is passed to the model's classification head. FFN stands for Feed-Forward Network.
  • ...and 1 more figure
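
The data flow in the Figure 5 caption (two encoder outputs, self-attention and cross-attention, a learnable weighted sum, and a [CLS]-based classification head) can be sketched in a few lines of NumPy. This is a heavily simplified illustration under stated assumptions, not the authors' implementation: the encoders are replaced by random token matrices, attention is single-head with no learned projections, the FFN and normalization layers are omitted, and the softmax normalization of the fusion weights is a guess. All names and sizes except the 25 classes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, NUM_CLASSES = 16, 5, 25   # D and N are arbitrary; 25 classes as in the dataset


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys_values):
    """Single-head scaled dot-product attention without learned projections."""
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ keys_values


def decode(own, other):
    """One simplified decoder branch with residual connections."""
    h = own + attention(own, own)     # self-attention within one polarization
    return h + attention(h, other)    # cross-attention to the other polarization


def fuse_and_classify(ppl_tokens, xpl_tokens, fusion_logits, w_head, b_head):
    ppl_feat = decode(ppl_tokens, xpl_tokens)
    xpl_feat = decode(xpl_tokens, ppl_tokens)
    alpha = softmax(fusion_logits)                      # fusion weights (assumed to sum to 1)
    fused = alpha[0] * ppl_feat + alpha[1] * xpl_feat   # weighted sum of both branches
    cls = fused[0]                                      # token 0 stands in for [CLS]
    return softmax(cls @ w_head + b_head)               # class probabilities


# Random stand-ins for the outputs of the two frozen ViT encoders.
ppl_tokens = rng.normal(size=(N, D))
xpl_tokens = rng.normal(size=(N, D))
w_head = rng.normal(size=(D, NUM_CLASSES)) * 0.1
b_head = np.zeros(NUM_CLASSES)

probs = fuse_and_classify(ppl_tokens, xpl_tokens, np.zeros(2), w_head, b_head)
print(probs.shape)   # (25,)
```

In a trained model only the decoder, the fusion weights, and the head would receive gradients, since the caption describes the encoders as frozen. With the fusion logits initialized to zero, as here, both polarizations contribute equally.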