Robust and Real-Time Bangladeshi Currency Recognition: A Dual-Stream MobileNet and EfficientNet Approach
Subreena, Mohammad Amzad Hossain, Mirza Raquib, Saydul Akbar Murad, Farida Siddiqi Prity, Muhammad Hanif, Nick Rahimi
TL;DR
We address the challenge of accessible, reliable Bangladeshi currency recognition for visually impaired users by proposing a hybrid feature extractor that combines MobileNetV3-Large and EfficientNetB0, followed by an MLP classifier. The approach is evaluated on five progressively complex datasets with five-fold cross-validation, seven performance metrics, and two explainable AI tools (LIME and SHAP). The model reaches 97.95% accuracy on controlled data, 92.84% on complex backgrounds, and 94.98% on the combined datasets, with ROC-AUC near 1.0. The work demonstrates a practical, real-time recognition system suitable for resource-constrained devices, with potential for deployment in assistive technologies and for future extension to counterfeit detection and mobile-edge implementations.
Abstract
Accurate currency recognition is essential for assistive technologies, particularly for visually impaired individuals who rely on others to identify banknotes. This dependency puts them at risk of fraud and exploitation. To address this problem, we first build a new Bangladeshi banknote dataset that covers both controlled and real-world scenarios, giving a more comprehensive and diverse representation. We then add four further datasets, including public benchmarks, to cover a wider range of complexity and improve the model's generalization. To overcome the limitations of current recognition models, we propose a novel hybrid CNN architecture that combines MobileNetV3-Large and EfficientNetB0 for efficient feature extraction. The extracted features are passed to a multilayer perceptron (MLP) classifier, which improves performance while keeping computational cost low and makes the system suitable for resource-constrained devices. Experimental results show that the proposed model achieves 97.95% accuracy on controlled datasets, 92.84% on complex backgrounds, and 94.98% when all datasets are combined. Performance is evaluated using five-fold cross-validation and seven metrics: accuracy, precision, recall, F1-score, Cohen's Kappa, Matthews correlation coefficient (MCC), and AUC. In addition, explainable AI methods such as LIME and SHAP are incorporated to enhance transparency and interpretability.
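
As an illustration of how the label-based metrics listed above relate to one another, the following is a minimal sketch in plain Python. It is not the authors' evaluation code. The function name `classification_metrics`, the use of macro averaging for precision, recall, and F1-score, and the toy denomination labels are all assumptions made for this example. AUC is left out because it requires per-class probability scores rather than hard labels.

```python
from collections import Counter
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Return six label-based metrics for paired true and predicted labels.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    true_counts = Counter(y_true)   # t_k: samples whose true class is k
    pred_counts = Counter(y_pred)   # p_k: samples predicted as class k
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    correct = sum(hits.values())

    # Per-class precision, recall, and F1, averaged with equal class weight.
    precisions, recalls, f1s = [], [], []
    for k in labels:
        prec = hits[k] / pred_counts[k] if pred_counts[k] else 0.0
        rec = hits[k] / true_counts[k] if true_counts[k] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    accuracy = correct / n

    # Cohen's Kappa: observed agreement corrected for chance agreement p_e.
    agree = sum(true_counts[k] * pred_counts[k] for k in labels)
    p_e = agree / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0

    # Multiclass MCC computed from the same marginal counts.
    denom = sqrt(
        (n ** 2 - sum(v ** 2 for v in pred_counts.values()))
        * (n ** 2 - sum(v ** 2 for v in true_counts.values()))
    )
    mcc = (correct * n - agree) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
        "kappa": kappa,
        "mcc": mcc,
    }


# Toy example: six notes of three denominations, one 20 misread as a 50.
example = classification_metrics([10, 10, 20, 20, 50, 50], [10, 10, 20, 50, 50, 50])
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Under five-fold cross-validation, a function of this kind would be applied to each held-out fold and the per-fold values averaged. Cohen's Kappa and MCC both discount agreement that could arise by chance, which is why they are reported alongside plain accuracy.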
