Industrial-scale Prediction of Cement Clinker Phases using Machine Learning

Sheikh Junaid Fayaz, Nestor Montiel-Bohorquez, Shashank Bishnoi, Matteo Romano, Manuele Gatti, N. M. Anoop Krishnan

TL;DR

The study tackles real-time clinker mineralogy prediction in cement production by using a two-year industrial dataset to build a data-driven digital twin that predicts alite, belite, and ferrite from process data. It combines data collection, rigorous preprocessing, and a suite of ML models with SHAP-based explanations that make the predictions interpretable. Nonlinear models (a neural network for alite, Gaussian process regression for belite, and support vector regression for ferrite) give the lowest errors among the nine models compared and outperform the standard Bogue equation, while plant-specific linear clinker equations offer a practical, more accurate alternative to it. The results demonstrate the viability of online quality control and potential process optimization, contributing to reduced material waste and emissions in industrial cement manufacturing.
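For readers unfamiliar with the Bogue baseline mentioned above, the following is a minimal sketch of the textbook Bogue calculation in its ASTM C150 form (valid for Al2O3/Fe2O3 >= 0.64). It is an illustration only: the paper may use a different variant (for example, with a free-lime correction), and the function name `bogue_phases` and the sample oxide values are assumptions, not taken from the paper.

```python
def bogue_phases(cao, sio2, al2o3, fe2o3, so3=0.0):
    """Estimate potential clinker phases (wt.%) from oxide contents (wt.%).

    Textbook Bogue coefficients (ASTM C150 form), valid only when the
    alumina-to-iron ratio Al2O3/Fe2O3 is at least 0.64. This is a sketch,
    not necessarily the exact variant used in the paper.
    """
    if fe2o3 <= 0 or al2o3 / fe2o3 < 0.64:
        raise ValueError("This form of the Bogue equations needs Al2O3/Fe2O3 >= 0.64")

    # Alite (C3S): lime left over after the other oxides are accounted for.
    c3s = 4.071 * cao - 7.600 * sio2 - 6.718 * al2o3 - 1.430 * fe2o3 - 2.852 * so3
    # Belite (C2S): silica not consumed by alite.
    c2s = 2.867 * sio2 - 0.7544 * c3s
    # Aluminate (C3A): alumina not consumed by ferrite.
    c3a = 2.650 * al2o3 - 1.692 * fe2o3
    # Ferrite (C4AF): fixed by the iron oxide content.
    c4af = 3.043 * fe2o3
    return {"alite": c3s, "belite": c2s, "aluminate": c3a, "ferrite": c4af}


# Hypothetical oxide composition (wt.%), chosen only for illustration.
phases = bogue_phases(cao=65.0, sio2=21.0, al2o3=5.0, fe2o3=3.0)
for name, value in phases.items():
    print(f"{name}: {value:.2f} wt.%")
```

Because the coefficients are fixed by stoichiometry and assume equilibrium, the calculation cannot adapt to a particular plant, which is the gap that the plant-specific equations and ML models are meant to close.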

Abstract

Cement production, exceeding 4.1 billion tonnes and contributing 2.4 billion tonnes of CO2 annually, faces critical challenges in quality control and process optimization. While traditional process models for cement manufacturing are confined to steady-state conditions with limited predictive capability for mineralogical phases, modern plants operate under dynamic conditions that demand real-time quality assessment. Here, exploiting a comprehensive two-year operational dataset from an industrial cement plant, we present a machine learning framework that accurately predicts clinker mineralogy from process data. Our model achieves unprecedented prediction accuracy for major clinker phases while requiring minimal input parameters, demonstrating robust performance under varying operating conditions. Through post-hoc explainable algorithms, we interpret the hierarchical relationships between clinker oxides and phase formation, providing insights into the functioning of an otherwise black-box model. This digital twin framework can potentially enable real-time optimization of cement production, thereby providing a route toward reducing material waste and ensuring quality while reducing the associated emissions under real plant conditions. Our approach represents a significant advancement in industrial process control, offering a scalable solution for sustainable cement manufacturing.

Paper Structure

This paper contains 28 sections, 35 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: Dataset characteristics and temporal variability in clinker phases. a, Schematic representation of a cement plant showing key measurement locations: kiln feed (KF), process parameters (PP), hot meal (HM), and clinker oxides (CO). The Venn diagram illustrates the combinations of input features used for model development. b, Two-year temporal evolution of alite content showing plant variability (black dots) with 0.01-99.99 percentile bounds (green shading). c,d, Frequency distribution of alite content in the complete dataset and zoomed view highlighting the normal distribution. e-g, Time series and distribution analysis of major clinker phases (alite, belite, ferrite), showing the full dataset (red) and its partitioning into training (yellow) and test (green) sets. Each subplot pairs the temporal evolution (left) with the frequency distribution (right), annotated with the mean ($\mu$) and standard deviation ($\sigma$) that characterize the phase distributions.
  • Figure 2: Performance comparison of machine learning architectures for clinker phase prediction. a-c, Mean Absolute Percentage Error (MAPE) across nine ML models for predicting alite (a), belite (b), and ferrite (c) contents using the complete feature set (KF, PP, HM, CO). Values in parentheses indicate test set MAPE. The best-performing models are shown in bold. d-i, Performance ($R^2$ and MAPE) of the best-performing model against traditional Bogue calculations, shown as a parity plot and a temporal evolution plot, respectively, for alite (d,e), belite (f,g), and ferrite (h,i), with inset histograms showing error distributions ($\epsilon = \text{predicted} - \text{actual}$) for ML models (top) and Bogue calculations (bottom). Red-shaded regions in histograms represent 95% confidence intervals ($\pm 2\sigma$), with x-axis limits set at 99.9% confidence ($\pm 4\sigma$). The temporal evolution of predictions is over a two-month test period showing plant data (red), ML predictions (black dashed), and Bogue calculations (green dotted). Grey bands represent model uncertainty ($\pm 3\sigma$), while red bars (right axis) indicate absolute prediction errors. All error metrics are reported in weight percentage (wt.%).
  • Figure 3: Comparison with Bogue equation. a, MAPE of optimal machine learning models (Neural Network for alite, Gaussian Process Regression for belite, Support Vector Regression for ferrite) across 15 combinations of input features: process parameters (PP), kiln feed (KF), hot meal (HM), and clinker oxides (CO). Values in parentheses represent MAPE (%) for alite (green), ferrite (blue), and belite (red) predictions. b-g, Performance evaluation of plant-specific clinker equations against standard Bogue calculations. Parity and temporal plots comparing predicted versus measured compositions for alite (b,c), belite (d,e), and ferrite (f,g). Inset histograms show error distributions for clinker equations (top) and Bogue calculations (bottom). The temporal evolution of predictions is over a two-month test period showing plant data (red), clinker equation predictions (blue dashed), and Bogue calculations (green dotted). Grey bands represent model uncertainty ($\pm 3\sigma$), while red bars (right axis) indicate absolute prediction errors. Training ($R^2_{train}$) and test ($R^2_{test}$) set performance metrics demonstrate superior accuracy of plant-specific equations over traditional Bogue calculations. All compositions and errors are reported in weight percentage (wt.%).
  • Figure 4: Feature attribution analysis of clinker phase predictions using SHAP. a-c, Hierarchical ranking of clinker oxide contributions to phase predictions for alite, belite, and ferrite, respectively. Bar lengths indicate mean absolute SHAP values (wt.%), representing averaged feature impact across the test dataset. d-f, Corresponding beeswarm plots revealing the directional influence of each oxide on phase formation. SHAP values (x-axis) indicate deviation from mean phase composition, with positive values suggesting increased formation. Color gradient (blue to red) represents oxide concentration from minimum to maximum, with point density indicating frequency of occurrence. Numbers in parentheses show oxide composition ranges (wt.%). CaO demonstrates a dominant positive correlation with alite formation, while SiO$_2$ shows a strong negative influence, aligning with established clinker chemistry.
  • ...and 9 more figures
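
The captions for Figures 2 and 3 rely on two simple quantities: the 15 combinations of the four input feature groups (KF, PP, HM, CO) and the MAPE used to score each model. The sketch below shows where the number 15 comes from (every non-empty subset of four groups, $2^4 - 1 = 15$) and how MAPE is conventionally computed. The helper names and the sample values are hypothetical and are not taken from the paper's code or data.

```python
from itertools import combinations

# Feature groups named in the Figure 1 and Figure 3 captions.
GROUPS = ("KF", "PP", "HM", "CO")


def feature_combinations(groups=GROUPS):
    """Return every non-empty subset of the feature groups as a tuple."""
    return [
        subset
        for size in range(1, len(groups) + 1)
        for subset in combinations(groups, size)
    ]


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (conventional definition).

    Each error is epsilon = predicted - actual, as in the Figure 2 caption,
    normalised by the actual value.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((p - a) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


subsets = feature_combinations()
print(len(subsets))  # number of non-empty subsets of four groups

# Hypothetical alite contents (wt.%), for illustration only.
measured = [50.0, 100.0]
predicted = [55.0, 90.0]
print(mape(measured, predicted))
```

In a study of this kind, each of the subsets would be used to train and score a separate model, so the comparison in Figure 3a shows how much accuracy is lost when a measurement location is unavailable.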