
RAINER: A Robust Ensemble Learning Grid Search-Tuned Framework for Rainfall Patterns Prediction

Zhenqi Li, Junhao Zhong, Hewei Wang, Jinfeng Xu, Yijie Li, Jinjiang You, Jiayi Zhang, Runzhi Wu, Soumyabrata Dev

TL;DR

Rainfall prediction remains challenging due to nonlinear meteorological dynamics. The authors propose RAINER, a grid-search tuned ensemble framework that combines a robust preprocessing pipeline, domain-driven feature construction (e.g., temperature differences), and PCA-based dimensionality reduction, evaluated across diverse models from weak classifiers to KAN-based architectures. Key contributions include a systematic data analysis pipeline, extensive grid-search across multiple feature sets, and ensemble voting that yields strong performance on the BoM Australian dataset, with metrics such as Accuracy and AUC used for evaluation. The work demonstrates that carefully engineered features and ensemble methods can outperform more complex models on structured meteorological data, offering a practical, reusable workflow for real-world rainfall forecasting and beyond.

Abstract

Rainfall prediction remains a persistent challenge due to the highly nonlinear and complex nature of meteorological data. Existing approaches rarely use grid search systematically for hyperparameter tuning, relying instead on heuristic or manual selection, which frequently yields suboptimal results. Additionally, these methods rarely incorporate newly constructed meteorological features such as differences between temperature and humidity to capture critical weather dynamics. Furthermore, ensemble learning techniques have not been evaluated systematically, and the diverse advanced models introduced in the past one or two years have seen limited exploration. To address these limitations, we propose a robust ensemble learning grid search-tuned framework (RAINER) for rainfall prediction. RAINER incorporates a comprehensive feature engineering pipeline, including outlier removal, imputation of missing values, feature reconstruction, and dimensionality reduction via Principal Component Analysis (PCA). The framework integrates novel meteorological features to capture dynamic weather patterns and systematically evaluates non-learning mathematical methods and a variety of machine learning models, from weak classifiers to advanced neural networks such as Kolmogorov-Arnold Networks (KAN). By leveraging grid search for hyperparameter tuning and ensemble voting techniques, RAINER achieves promising results on real-world datasets.
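The stages named in the abstract (imputation, constructed difference features, PCA, grid search, ensemble voting) can be chained in a single scikit-learn pipeline. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the data is synthetic, the four input columns, the helper `add_differences`, the two base classifiers, and the parameter grid are all hypothetical choices, and outlier removal is omitted for brevity.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Synthetic stand-in data (assumption): columns are
# [min temp, max temp, humidity 9am, humidity 3pm].
rng = np.random.default_rng(0)
n = 400
min_temp = rng.normal(12, 5, n)
max_temp = min_temp + rng.gamma(4, 2, n)
hum9 = rng.uniform(20, 100, n)
hum3 = np.clip(hum9 - rng.normal(15, 10, n), 0, 100)
X = np.column_stack([min_temp, max_temp, hum9, hum3])
y = (hum3 + rng.normal(0, 10, n) > 60).astype(int)  # toy "rain" label

# Knock out about 5% of the entries to mimic missing observations.
X[rng.random(X.shape) < 0.05] = np.nan


def add_differences(X):
    """Append two constructed features: temperature range and humidity drop."""
    temp_range = X[:, 1] - X[:, 0]
    hum_drop = X[:, 2] - X[:, 3]
    return np.column_stack([X, temp_range, hum_drop])


pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("construct", FunctionTransformer(add_differences)),
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("vote", VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(random_state=0)),
        ],
        voting="soft",  # average predicted probabilities
    )),
])

# Hypothetical grid; nested names follow the step__estimator__param pattern.
param_grid = {
    "pca__n_components": [3, 5],
    "vote__lr__C": [0.1, 1.0],
    "vote__rf__n_estimators": [25, 50],
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
search = GridSearchCV(pipe, param_grid, cv=3, scoring="roc_auc")
search.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, search.predict(X_test))
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, test_accuracy, test_auc)
```

Keeping imputation, scaling, and PCA inside the pipeline means each cross-validation fold fits them on its own training split, so the grid search scores are not inflated by information from the held-out fold.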

Paper Structure

This paper contains 20 sections, 9 equations, 19 figures, 3 tables.

Figures (19)

  • Figure 1: Bar Chart of missing values for meteorological features. Features such as "Evaporation" and "Cloud9am" show the highest missingness, while others like "Location" and "MinTemp" have near-complete data. High-missingness features were excluded to ensure unbiased analysis.
  • Figure 2: Weather attribute dendrogram. It highlights relationships such as the strong clustering of "Pressure9am" and "Pressure3pm".
  • Figure 3: Weather feature correlation matrix. The correlation heatmap illustrates the dependencies between features, revealing strong correlations such as those between "Humidity9am" and "Humidity3pm".
  • Figure 4: Histograms comparing feature distributions before and after imputation. The x-axis represents various meteorological attributes and the y-axis indicates the count of observations for each attribute. These histograms demonstrate that the imputation methods preserved the central tendencies and overall distributions of the data.
  • Figure 5: Missing value matrix. The missing value matrix displays the distribution of missing values across different features.
  • ...and 14 more figures
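
The missingness and correlation analyses described in the captions of Figures 1 and 3 can be approximated with a few lines of pandas. This is a minimal sketch under assumptions: the five rows of values are invented for illustration, and the cutoff `MAX_MISSING` is hypothetical, since the captions do not state the threshold used to exclude high-missingness features.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; the column names follow the figure captions.
df = pd.DataFrame({
    "MinTemp": [13.4, 7.4, np.nan, 9.2, 17.5],
    "Evaporation": [np.nan, np.nan, np.nan, 4.2, np.nan],
    "Humidity9am": [71.0, 44.0, 38.0, np.nan, 82.0],
    "Humidity3pm": [22.0, 25.0, 30.0, 16.0, 33.0],
})

# Fraction of missing entries per feature, highest first (cf. Figure 1).
missing_fraction = df.isna().mean().sort_values(ascending=False)

# Hypothetical cutoff: drop features missing in more than half of the rows.
MAX_MISSING = 0.5
kept = df.loc[:, df.isna().mean() <= MAX_MISSING]

# Pairwise Pearson correlations of the remaining features (cf. Figure 3).
corr = kept.corr()

print(missing_fraction)
print(corr.round(2))
```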