DialBench: Towards Accurate Reading Recognition of Pointer Meter using Large Foundation Models
Futian Wang, Chaoliu Weng, Xiao Wang, Zhen Chen, Zhicheng Zhao, Jin Tang
TL;DR
DialBench tackles the problem of accurate pointer-meter reading in industrial settings by introducing the RPM-10K benchmark and a vision-language model, MRLM, that employs Physical Relation Injection. The approach integrates three hierarchical components (Key Feature Mining, Mixture-of-Experts, and language-labeled supervision) to ground perception in physical relationships, adapt to diverse meter types, and produce end-to-end numeric readings. Extensive experiments show MRLM achieving state-of-the-art accuracy and robustness on RPM-10K, outperforming open baselines and providing a strong reference point for industrial dial-reading research. The work provides a scalable dataset, standardized evaluation protocols, and a practical framework for physics-aware multi-modal meter reading applicable to real-world smart power and industrial systems.
Abstract
Accurate reading recognition of pointer meters plays a key role in smart power systems, but existing approaches remain fragile in the face of reflections, occlusions, changing viewing angles, and overlap between thin pointers and scale markings. To date, the area also lacks large-scale datasets to support the development of robust algorithms. To address these challenges, this paper first presents a new large-scale benchmark dataset for dial reading, termed RPM-10K, which contains 10,730 meter images that reflect the aforementioned key challenges. Building on this dataset, we propose a novel vision-language model for pointer meter reading recognition, termed MRLM, based on physical relation injection. Instead of exhaustively learning image-level correlations, MRLM explicitly encodes the geometric and causal relationships between the pointer and the scale, aligning perception with physical reasoning in the spirit of world-model perspectives. Through cross-attentional fusion and adaptive expert selection, the model learns to interpret dial configurations and generate precise numeric readings. Extensive experiments validate the effectiveness of the proposed framework on the new benchmark dataset. Both the dataset and source code will be released at https://github.com/Event-AHU/DialBench.
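To make the geometric relationship between pointer and scale concrete, the following is a minimal sketch of the textbook angle-to-value mapping for a linear dial. It is not MRLM or any part of the authors' method; the function name `reading_from_angle`, its parameters, and the example dial (a 270-degree sweep covering 0 to 1.6 units) are illustrative assumptions.

```python
def reading_from_angle(pointer_deg, start_deg, end_deg, min_value, max_value):
    """Linearly interpolate a dial reading from the pointer angle.

    Hypothetical helper for illustration only. Angles are assumed to be
    measured in degrees along the sweep direction of the scale, and the
    scale is assumed to be linear between its first and last markings.
    """
    span = end_deg - start_deg
    if span == 0:
        raise ValueError("scale start and end angles must differ")
    # Fraction of the sweep covered by the pointer, clamped to the scale range.
    fraction = (pointer_deg - start_deg) / span
    fraction = min(1.0, max(0.0, fraction))
    return min_value + fraction * (max_value - min_value)


# Assumed example dial: 270-degree sweep, scale from 0.0 to 1.6.
mid_reading = reading_from_angle(135.0, 0.0, 270.0, 0.0, 1.6)
```

Under these assumptions, a pointer halfway along the sweep maps to the midpoint of the scale. Reflections, occlusions, and oblique viewing angles are exactly what make the inputs to such a mapping hard to estimate reliably from an image.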
