
Process-Informed Forecasting of Complex Thermal Dynamics in Pharmaceutical Manufacturing

Ramona Rubini, Siavash Khodakarami, Aniruddha Bora, George Em Karniadakis, Michele Dassisti

Abstract

Accurate time-series forecasting for complex physical systems is the backbone of modern industrial monitoring and control, yet deep learning models often lack the physical consistency required in regulated environments. To bridge this gap, we introduce Process-Informed Forecasting (PIF) models for temperature in pharmaceutical lyophilization, embedding deterministic production recipes as macro-structural priors. We investigate classical methods (e.g., the Autoregressive Integrated Moving Average (ARIMA) model) and modern deep learning architectures, including Kolmogorov-Arnold Networks (KANs). We compare three loss function formulations that integrate a process-informed trajectory prior: a fixed-weight loss, a dynamic uncertainty-based loss, and a Residual-Based Attention (RBA) mechanism. We evaluate all models not only for accuracy and physical consistency but also for robustness to sensor noise. Furthermore, we test the practical generalizability of the best model in a transfer learning scenario on a new process. Our results show that PIF models outperform their data-driven counterparts in terms of accuracy, physical plausibility, and noise resilience, offering a scalable framework for reliable and generalizable forecasting solutions in critical manufacturing.
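To make the three loss formulations more concrete, the following Python/NumPy sketch builds a piecewise-linear recipe prior $y_\text{PI}$ and evaluates each loss against it. It is a minimal illustration under stated assumptions, not the authors' implementation. The following are all assumptions, and every function name is hypothetical: the recipe breakpoints, the weight `lam`, the log-variance form of the uncertainty-based loss (homoscedastic multi-task weighting), the RBA update rule $\lambda_i \leftarrow \gamma\lambda_i + \eta\,|r_i|/\max_j |r_j|$, and the choice to apply RBA weights to the data residual.

```python
import numpy as np


def recipe_prior(t, step_times, step_temps):
    """Piecewise-linear prior y_PI(t) through (hypothetical) recipe breakpoints."""
    return np.interp(t, step_times, step_temps)


def mse(a, b):
    """Mean squared error between two equally shaped arrays."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def fixed_weight_loss(y_pred, y_data, y_pi, lam=0.1):
    # (a) Fixed weight: L = L_data + lam * L_PI with a constant, hand-tuned lam (assumed value).
    return mse(y_pred, y_data) + lam * mse(y_pred, y_pi)


def uncertainty_loss(y_pred, y_data, y_pi, log_var_data, log_var_pi):
    # (b) Uncertainty-based (assumed form): each term is scaled by exp(-s) and
    # regularised by s, where s = log(sigma^2) would be a trainable parameter.
    l_data = mse(y_pred, y_data)
    l_pi = mse(y_pred, y_pi)
    return float(0.5 * np.exp(-log_var_data) * l_data + 0.5 * log_var_data
                 + 0.5 * np.exp(-log_var_pi) * l_pi + 0.5 * log_var_pi)


def rba_update(weights, residuals, gamma=0.999, eta=0.01):
    # (c) RBA weight update (assumed rule): decay old weights, then add the
    # normalised magnitude of each pointwise residual.
    r = np.abs(residuals)
    return gamma * weights + eta * r / (r.max() + 1e-12)


def rba_loss(y_pred, y_data, y_pi, weights, lam=0.1):
    # Pointwise attention weights on the data residual, plus a fixed prior term (assumption).
    data_term = float(np.mean((weights * (y_pred - y_data)) ** 2))
    return data_term + lam * mse(y_pred, y_pi)


# Example with hypothetical numbers: hold at -40 °C, then ramp to 20 °C.
t = np.linspace(0.0, 10.0, 101)
y_pi = recipe_prior(t, [0.0, 2.0, 6.0, 10.0], [-40.0, -40.0, -10.0, 20.0])
y_data = y_pi + 0.5 * np.sin(t)          # synthetic stand-in for sensor data
residual = np.zeros_like(t)              # stand-in for a network's residual output
y_pred = y_pi + residual                 # PIF-style forecast: prior plus learned residual
weights = rba_update(np.ones_like(t), y_pred - y_data)
losses = {
    "fixed": fixed_weight_loss(y_pred, y_data, y_pi),
    "uncertainty": uncertainty_loss(y_pred, y_data, y_pi, 0.0, 0.0),
    "rba": rba_loss(y_pred, y_data, y_pi, weights),
}
```

In a training loop, the log-variances and the RBA weights would be updated between steps: the former by gradient descent and the latter by `rba_update`. Here they are passed in as plain values so that each loss can be inspected in isolation.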

Paper Structure

This paper contains 10 sections, 14 equations, 11 figures, 10 tables.

Figures (11)

  • Figure 1: Comparison between the real thermal dynamics (ground truth sensor data) and the idealized piecewise linear prior derived from the PLC production recipe. Dataset 1 exhibits a pronounced thermal inertia during the primary ramp-up, which the PIF model is tasked to learn as a residual. Dataset 2 demonstrates a different thermal profile with a distinct ramp-up slope and extended secondary drying phase. The close alignment between the recipe prior and the sensor data in both cases confirms that the piecewise linear formulation provides a robust macro-structural guide, regardless of the specific product-dependent thermal dynamics.
  • Figure 2: Overview of the proposed Process-Informed Forecasting (PIF) methodology. (Upper part) Real sensor data ($y_\text{data}$) are used to train classical and deep learning models. A Process-Informed (PI) prior ($y_\text{PI}$), derived from the manufacturing recipe, is then incorporated to create PIF models. Classical models undergo post-hoc integration, while deep learning models are guided by three distinct loss function formulations: (a) Fixed-Weighted, (b) Uncertainty-Based, and (c) Residual-Based Attention (RBA). (Lower part) The performance of all models (classical, data-driven, and PIF) is assessed through standard accuracy metrics (RMSE, $L_\infty$(RMSE)) and physical-plausibility metrics (Gradient Error, $L_\infty$(GradErr), Physical Violation Rate (PVR %), and Maximum Overshoot (MO)). A robustness analysis is then performed under two distinct noise injection scenarios: input-only and system-wide. Finally, a transfer learning task evaluates the generalizability of the best-performing model by applying it to a new, unseen dataset.
  • Figure 3: Comparative analysis of model predictions for the thermal dynamics of the lyophilization process. The figure shows the output of the best-performing variant from each of seven different model families ($\sim$30,000 parameters). These zoomed-in views highlight the effectiveness of architectures such as cKAN, KAN, and LEM in modeling non-linear, transient behavior. In contrast, the Transformer architecture struggles to capture the peak temperature, underscoring its limitations for this specific dynamic system.
  • Figure 4: Robustness evaluation of models with approximately 30,000 parameters. This chart shows how the predictive accuracy of different models deteriorates as the input data become noisier. The results identify the cKAN model as the most resilient, maintaining the lowest error across the noise spectrum. The standard KAN and MLP also handle noise well. In contrast, architectures such as LEM_fixed and Transformer_RBA are more sensitive to data perturbations, showing a rapid decline in accuracy as the noise level increases.
  • Figure 5: A detailed comparison of model robustness to input noise, with performance degradation (RMSE) analyzed separately for each model family: (a) classical time-series models; (b) cKAN architectures; (c) KAN models; (d) LEM architectures; (e) LSTM models; (f) MLP architectures; (g) RNN models; (h) Transformer models. The noise level represents the standard deviation of Gaussian noise added to the input sensor features, while the target output remains unperturbed.
  • ...and 6 more figures
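The evaluation metrics and noise scenarios named in the captions above can be sketched as follows. Only RMSE and the input-only scenario (Gaussian noise on the inputs, targets unperturbed) follow directly from the text. The remaining definitions are assumptions made for illustration, and all function names are hypothetical:
- $L_\infty$ is read as the worst-case absolute error.
- Gradient Error is read as the error between finite-difference slopes.
- PVR is read as the percentage of steps whose slope exceeds a hypothetical `max_rate`.
- MO is read as the excess of the predicted peak over the true peak.
- System-wide noise is read as Gaussian noise on both inputs and targets.

```python
import numpy as np


def rmse(y_pred, y_true):
    """Root-mean-square error between forecast and ground truth."""
    diff = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.sqrt(np.mean(diff ** 2)))


def linf_error(y_pred, y_true):
    # Assumed reading of an L_inf metric: worst-case pointwise absolute error.
    return float(np.max(np.abs(np.asarray(y_pred) - np.asarray(y_true))))


def gradient_errors(y_pred, y_true, dt=1.0):
    # Compare finite-difference slopes; returns (RMSE of slopes, max slope error).
    g_pred = np.gradient(np.asarray(y_pred, dtype=float), dt)
    g_true = np.gradient(np.asarray(y_true, dtype=float), dt)
    return rmse(g_pred, g_true), linf_error(g_pred, g_true)


def physical_violation_rate(y_pred, max_rate, dt=1.0):
    # Assumed constraint: |dT/dt| must not exceed a hypothetical max_rate.
    slopes = np.abs(np.diff(np.asarray(y_pred, dtype=float)) / dt)
    return 100.0 * float(np.mean(slopes > max_rate))


def max_overshoot(y_pred, y_true):
    # Assumed: how far the predicted peak exceeds the true peak (0 if it never does).
    return max(0.0, float(np.max(y_pred) - np.max(y_true)))


def add_noise(x, y, sigma, mode, rng):
    # "input": perturb inputs only; "system": perturb inputs and targets (assumed).
    x_noisy = x + rng.normal(0.0, sigma, size=np.shape(x))
    if mode == "input":
        return x_noisy, y
    if mode == "system":
        return x_noisy, y + rng.normal(0.0, sigma, size=np.shape(y))
    raise ValueError(f"unknown noise mode: {mode}")
```

A robustness curve like those in Figures 4 and 5 could then be produced by calling `add_noise` over a grid of `sigma` values and recording the `rmse` of the model's forecasts at each level.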