Towards Sustainable Development: A Novel Integrated Machine Learning Model for Holistic Environmental Health Monitoring
Anirudh Mazumder, Sarthak Engala, Aditya Nallaparaju
TL;DR
The paper addresses the fragmented environmental monitoring that accompanies urbanization, proposing a novel integrated ML pipeline that assesses holistic environmental health by fusing air and water quality indicators across states. It employs a stacking ensemble of Random Forest, Support Vector Machine, and Logistic Regression classifiers trained on labeled city data, with Pearson-correlation-based feature selection identifying key predictors such as fecal coliform, suspended particulate matter (SPM), and biochemical oxygen demand (BOD). The model outputs a single holistic label that weights air quality more heavily than water quality. The approach generalizes well to held-out data (accuracy of $0.99$) and yields actionable insights for targeted interventions and sustainable development policy, with clear pathways toward real-time deployment, broader data fusion, and more advanced modeling. These contributions advance data-driven decision-making for urban environmental health and illustrate practical steps toward proactive sustainability monitoring.
Abstract
Urbanization drives economic growth but also degrades the environment. Traditional methods of detecting environmental issues have proven inefficient. Machine learning has emerged as a promising tool for tracking environmental deterioration because it can identify key predictive features. This research focuses on developing a predictive model that uses pollutant levels and particulate matter as indicators of environmental state in order to characterize these challenges. Machine learning was used to identify patterns shared by areas with worse environmental conditions. The aim is to help governments identify intervention points, improve planning and conservation efforts, and ultimately contribute to sustainable development.
