MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models
Beibei Yu, Tao Shen, Hongbin Na, Ling Chen, Denqi Li
TL;DR
This work tackles the automation gap in remote-sensing mineral exploration by introducing MineAgent, a modular MLLM framework that uses hierarchical judging and task-specific decision-making to reason across multiple images and spectral data. It is complemented by MineBench, a domain-specific benchmark with geological and hyperspectral inputs, which enables rigorous evaluation of MLLMs on mineral exploration tasks. Across extensive experiments, MineAgent substantially improves multi-image reasoning (gains of up to ~30%). The experiments also reveal a performance ceiling and notable gaps between open-source models and closed-source systems such as GPT-4o, which the authors attribute to limited domain data and complex reasoning demands. The study demonstrates the value of structured reasoning and domain grounding for practical mineral prospectivity assessment. It outlines future work on integrating domain knowledge bases and reinforcement learning to further improve robustness and generalization.
Abstract
Remote-sensing mineral exploration is critical for identifying economically viable mineral deposits, yet it poses significant challenges for multimodal large language models (MLLMs). These challenges include limited domain-specific geological knowledge and difficulty reasoning across multiple remote-sensing images, which further exacerbates long-context issues. To address them, we present MineAgent, a modular framework that leverages hierarchical judging and decision-making modules to improve multi-image reasoning and spatial-spectral integration. Complementing this, we propose MineBench, a benchmark specifically designed for evaluating MLLMs on domain-specific mineral exploration tasks using geological and hyperspectral data. Extensive experiments demonstrate the effectiveness of MineAgent and highlight its potential to advance MLLMs in remote-sensing mineral exploration.
