DialBench: Towards Accurate Reading Recognition of Pointer Meter using Large Foundation Models

Futian Wang, Chaoliu Weng, Xiao Wang, Zhen Chen, Zhicheng Zhao, Jin Tang

TL;DR

DialBench addresses accurate pointer-meter reading in industrial settings by introducing the RPM-10K benchmark and MRLM, a vision-language model built on Physical Relation Injection. The approach integrates three hierarchical components (Key Feature Mining, Mixture-of-Experts, and language-labeled supervision) to ground perception in physical relationships, adapt to diverse meter types, and produce end-to-end numeric readings. Experiments show MRLM achieving state-of-the-art accuracy and robustness on RPM-10K, outperforming open baselines and providing a strong reference point for industrial dial-reading research. The work provides a scalable dataset, standardized evaluation protocols, and a practical framework for physics-aware multi-modal meter reading applicable to real-world smart power and industrial systems.

Abstract

Precise reading recognition of pointer meters plays a key role in smart power systems, but existing approaches remain fragile in the face of reflections, occlusions, dynamic viewing angles, and overlap between thin pointers and scale markings. To date, the area still lacks large-scale datasets to support the development of robust algorithms. To address these challenges, this paper first presents a new large-scale benchmark dataset for dial reading, termed RPM-10K, which contains 10,730 meter images that reflect the aforementioned key challenges. Built upon the dataset, we propose a novel vision-language model for pointer meter reading recognition, termed MRLM, based on physical relation injection. Instead of exhaustively learning image-level correlations, MRLM explicitly encodes the geometric and causal relationships between the pointer and the scale, aligning perception with physical reasoning in the spirit of world-model perspectives. Through cross-attentional fusion and adaptive expert selection, the model learns to interpret dial configurations and generate precise numeric readings. Extensive experiments validate the effectiveness of the proposed framework on the new benchmark dataset. Both the dataset and source code will be released at https://github.com/Event-AHU/DialBench.
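
To make the geometric relationship between pointer and scale concrete, here is a minimal sketch of the classical angle-to-value mapping for a dial with a linear scale. It is an illustration only, not the MRLM method: the function name `reading_from_angle`, its parameters, and the example dial (a 270-degree sweep covering 0 to 1.6) are assumptions introduced here and do not come from the paper.

```python
def reading_from_angle(pointer_deg, start_deg, end_deg, min_value, max_value):
    """Map a pointer angle to a dial reading by linear interpolation.

    Hypothetical helper for illustration. All angles are in degrees,
    measured in the same rotational direction from the same reference,
    with start_deg < end_deg. The scale is assumed to be linear.
    """
    sweep = end_deg - start_deg
    if sweep <= 0:
        raise ValueError("end_deg must be greater than start_deg")

    # Fraction of the scale arc covered by the pointer.
    fraction = (pointer_deg - start_deg) / sweep

    # Clamp so a pointer resting slightly past a stop stays in range.
    fraction = min(1.0, max(0.0, fraction))

    return min_value + fraction * (max_value - min_value)


if __name__ == "__main__":
    # Assumed example dial: 270-degree sweep, scale from 0 to 1.6.
    print(reading_from_angle(135.0, 0.0, 270.0, 0.0, 1.6))
```

A mapping like this depends on accurate estimates of the pointer angle and the scale endpoints, which is where reflections, occlusions, and tilted views cause trouble.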

Paper Structure

This paper contains 23 sections, 9 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Visualization of dials under diverse environmental conditions, including low_light, high_exposure, missing_part_info, mirror_pollution, tilted, blur, occlusion, and mirror_reflection. These conditions reflect practical challenges for recognition and detection tasks.
  • Figure 2: Illustration of the dial types in the dataset. Subfigures (a)–(f) represent the six primary dial categories, whereas (g)–(p) correspond to samples acquired from online sources.
  • Figure 3: Distribution of samples across different dial configurations and environmental conditions.
  • Figure 4: Overview of the proposed MeterReading Large Model (MRLM) based on the Physical Relation Injection (PRI) framework. The pipeline sequentially injects physical relations at three hierarchical levels: entity grounding (KFM), relational coupling (MoE), and symbolic alignment (language-labeled supervision).
  • Figure 5: Visualization of model prediction results on representative test samples.