BanglaMM-Disaster: A Multimodal Transformer-Based Deep Learning Framework for Multiclass Disaster Classification in Bangla
Ariful Islam, Md Rifat Hossen, Md. Mahmudul Arif, Abdullah Al Noman, Md Arifur Rahman
TL;DR
BanglaMM-Disaster addresses real-time disaster classification for Bangla social media posts by fusing each caption with its corresponding image. It is an end-to-end multimodal pipeline that couples transformer-based text encoders with CNN visual features via early fusion, evaluated on a new dataset of 5,037 posts annotated into nine disaster-related categories. The best configuration achieves 83.76% accuracy, surpassing the best text-only baseline by 3.84% and the image-only baseline by 16.91%, and reduces misclassification across all classes, most noticeably on ambiguous examples. The work demonstrates practical potential for rapid disaster monitoring in low-resource settings and suggests attention-based fusion and graph-based cross-modal reasoning as future enhancements.
Abstract
Natural disasters remain a major challenge for Bangladesh, so real-time monitoring and quick response systems are essential. In this study, we present BanglaMM-Disaster, an end-to-end deep learning-based multimodal framework for disaster classification in Bangla, using both textual and visual data from social media. We constructed a new dataset of 5,037 Bangla social media posts, each consisting of a caption and a corresponding image, annotated into one of nine disaster-related categories. The proposed model integrates transformer-based text encoders, including BanglaBERT, mBERT, and XLM-RoBERTa, with CNN backbones such as ResNet50, DenseNet169, and MobileNetV2, to process the two modalities. Using early fusion, the best model achieves 83.76% accuracy. This surpasses the best text-only baseline by 3.84% and the image-only baseline by 16.91%. Our analysis also shows reduced misclassification across all classes, with noticeable improvements for ambiguous examples. This work fills a key gap in Bangla multimodal disaster analysis and demonstrates the benefits of combining multiple data types for real-time disaster response in low-resource settings.
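To make the early-fusion step concrete, the following is a minimal sketch, not the authors' implementation. It assumes that early fusion here means concatenating the text and image feature vectors before a single shared classifier head. The feature sizes (768 for a BERT-base-style text encoder, 2048 for pooled ResNet50 features), the `LinearHead` class, and the helper names are illustrative assumptions, and random vectors stand in for real encoder outputs. Only the nine-class output size comes from the abstract.

```python
import math
import random

NUM_CLASSES = 9    # nine disaster-related categories, per the abstract
TEXT_DIM = 768     # assumption: hidden size of a BERT-base-style text encoder
IMAGE_DIM = 2048   # assumption: pooled feature size of a ResNet50 backbone


def early_fuse(text_features, image_features):
    """Early (feature-level) fusion: concatenate the two feature vectors."""
    return list(text_features) + list(image_features)


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Hypothetical single linear layer mapping fused features to class scores."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.in_dim = in_dim
        self.weights = [
            [rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)
        ]
        self.bias = [0.0] * out_dim

    def __call__(self, features):
        if len(features) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} features, got {len(features)}")
        return [
            sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]


def classify(text_features, image_features, head):
    """Fuse both modalities, then return (predicted class index, probabilities)."""
    fused = early_fuse(text_features, image_features)
    probs = softmax(head(fused))
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, probs


# Random stand-ins for the outputs of a text encoder and a CNN backbone.
rng = random.Random(42)
text_vec = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
image_vec = [rng.gauss(0.0, 1.0) for _ in range(IMAGE_DIM)]

head = LinearHead(TEXT_DIM + IMAGE_DIM, NUM_CLASSES)
predicted_class, class_probs = classify(text_vec, image_vec, head)
print(predicted_class, round(sum(class_probs), 6))
```

In this sketch the classifier sees both modalities at once, so it can weigh caption and image evidence jointly. In a trained system the random vectors would be replaced by features from the chosen text encoder and CNN backbone, and the head weights would be learned rather than randomly initialized.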
