Enhancing Retail Sales Forecasting with Optimized Machine Learning Models

Priyam Ganguly, Isha Mukherjee

TL;DR

The paper tackles retail store sales forecasting in the presence of strong seasonality and many product families, where traditional models such as linear regression (LR) underfit. It evaluates several ML approaches (RF, GB, SVR, XGBoost) and shows that an optimized RF with hyperparameter tuning and feature engineering achieves $R^2=0.945$ and $\mathrm{RMSLE}=1.172$, outperforming GB ($R^2=0.942$), SVR ($R^2=0.940$), and XGBoost ($R^2=0.939$). The study uses the Favorita Stores dataset from Ecuador and demonstrates that careful preprocessing, lag features, and cross-validated hyperparameter search are key to capturing complex patterns. It highlights the practical impact of advanced ML in predictive analytics for retail and suggests future work on external variables and on interpretable, hybrid models.
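To make the idea of lag features concrete, here is a minimal sketch in plain Python. It is an illustration only: the function name `add_lag_features`, the toy sales series, and the choice of lags 1 and 2 are assumptions for this example and are not taken from the paper.

```python
from typing import Dict, List, Optional


def add_lag_features(
    sales: List[float], lags: List[int]
) -> List[Dict[str, Optional[float]]]:
    """Attach earlier sales values to each time step as extra features.

    For time step t and lag k, the feature lag_k is sales[t - k].
    Steps that have no value k periods back get None.
    """
    rows: List[Dict[str, Optional[float]]] = []
    for t, value in enumerate(sales):
        row: Dict[str, Optional[float]] = {"sales": value}
        for lag in lags:
            # Only look back when the earlier time step exists.
            row[f"lag_{lag}"] = sales[t - lag] if t - lag >= 0 else None
        rows.append(row)
    return rows


# Hypothetical daily sales for a single store and product family.
daily_sales = [10.0, 12.0, 9.0, 15.0]
rows = add_lag_features(daily_sales, lags=[1, 2])
```

In this sketch, each row pairs the current day's sales with the values one and two days earlier, so a tree-based model can condition on recent history. Rows with `None` would typically be dropped or imputed before training.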

Abstract

In retail sales forecasting, accurately predicting future sales is crucial for inventory management and strategic planning. Traditional methods like linear regression (LR) often fall short due to the complexity of sales data, which includes seasonality and numerous product families. Recent advancements in machine learning (ML) provide more robust alternatives. This research applies ML models, particularly Random Forest (RF), Gradient Boosting (GB), Support Vector Regression (SVR), and XGBoost, to improve prediction accuracy. Despite these advancements, a significant gap exists in handling complex datasets with high seasonality and multiple product families. The proposed solution involves implementing and optimizing an RF model, with hyperparameters tuned through randomized search cross-validation. This approach addresses the complexities of the dataset, capturing intricate patterns that traditional methods miss. The optimized RF model achieved an R-squared value of 0.945, substantially higher than the initial RF model and traditional LR, which had an R-squared of 0.531. The model reduced the root mean squared logarithmic error (RMSLE) to 1.172, demonstrating its superior predictive capability. The optimized RF model also outperformed other strong models, namely Gradient Boosting (R-squared: 0.942), SVR (R-squared: 0.940), and XGBoost (R-squared: 0.939), with lower mean squared error (MSE) and mean absolute error (MAE). The results demonstrate that the optimized RF model excels in forecasting retail sales, handling the dataset's complexity with higher accuracy and reliability. This research highlights the importance of advanced ML techniques in predictive analytics, offering a significant improvement over traditional methods and other contemporary models.
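For readers unfamiliar with the two headline metrics, the sketch below computes R-squared and RMSLE from their standard textbook definitions. It is not the authors' evaluation code: the function names and the toy values are assumptions for illustration, and the paper's own equations may differ in notation.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared logarithmic error.

    Uses log(1 + x) so that zero sales are handled; inputs must be >= 0.
    """
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Toy values, not drawn from the Favorita dataset.
actual = [1.0, 2.0, 3.0]
mean_only_forecast = [2.0, 2.0, 2.0]
baseline_r2 = r_squared(actual, mean_only_forecast)
perfect_rmsle = rmsle(actual, actual)
```

A forecast that always predicts the mean scores an R-squared of 0, which is why values near 1 indicate that a model explains most of the variance. RMSLE penalizes relative rather than absolute errors, so it is less dominated by the highest-volume product families.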

Paper Structure

This paper contains 8 sections, 10 equations, 2 figures, 2 tables, 1 algorithm.

Figures (2)

  • Figure 1: Residuals versus time
  • Figure 2: Residuals versus predicted values