MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models

Beibei Yu, Tao Shen, Hongbin Na, Ling Chen, Denqi Li

TL;DR

This work tackles the automation gap in remote-sensing mineral exploration by introducing MineAgent, a modular MLLM framework that uses hierarchical judging and task-specific decision-making to reason across multiple images and spectral data. Complemented by MineBench, a domain-specific benchmark with geological and hyperspectral inputs, the approach enables rigorous evaluation of MLLMs in mineral exploration tasks. Across extensive experiments, MineAgent substantially improves multi-image reasoning (up to ~30% gains) but also reveals a performance ceiling and notable gaps between open-source models and closed systems like GPT-4o, driven by limited domain data and complex reasoning demands. The study demonstrates the value of structured reasoning and domain grounding for practical mineral prospectivity assessment while outlining future work in integrating domain knowledge bases and reinforcement learning to further enhance robustness and generalization.

Abstract

Remote-sensing mineral exploration is critical for identifying economically viable mineral deposits, yet it poses significant challenges for multimodal large language models (MLLMs). These include limited domain-specific geological knowledge and difficulty reasoning across multiple remote-sensing images, which further exacerbates long-context issues. To address these challenges, we present MineAgent, a modular framework leveraging hierarchical judging and decision-making modules to improve multi-image reasoning and spatial-spectral integration. Complementing this, we propose MineBench, a benchmark designed for evaluating MLLMs on domain-specific mineral exploration tasks using geological and hyperspectral data. Extensive experiments demonstrate the effectiveness of MineAgent, highlighting its potential to advance MLLMs in remote-sensing mineral exploration.

Paper Structure

This paper contains 48 sections, 15 equations, 17 figures, 7 tables.

Figures (17)

  • Figure 1: Judgment comparisons between GPT-4o and a human evaluator. GPT-4o judgments appear in the blue box and human annotations in the red box. In (b), yellow boxes highlight regions and their spatial relations identified by the human but not by GPT-4o.
  • Figure 2: Task definition in MineBench. Specifically, a target area $a$ is represented by two image types, i.e., ${\mathcal{I}}_a=\{{\mathcal{I}}_a^{\text{(g)}}, {\mathcal{I}}_a^{\text{(h)}}\}$. The images ${\mathcal{I}}_a^{\text{(h)}}$ are color-coded, and uncolored regions represent sub-threshold potential.
  • Figure 3: The tailored MineAgent for mineral exploration. (Left) Base pipeline using step-by-step reasoning; (Right) MineAgent decomposing the pipeline into specialized modules, improving assessment accuracy.
  • Figure 4: A general framework of MineAgent.
  • Figure 5: Performance across varying complexity levels
  • ...and 12 more figures