Intelligent Collaborative Optimization for Rubber Tyre Film Production Based on Multi-path Differentiated Clipping Proximal Policy Optimization
Yinghao Ruan, Wei Pang, Shuaihao Liu, Huili Yang, Leyi Han, Xinghui Dong
TL;DR
The paper addresses the challenge of dynamically coordinating high-dimensional, multi-objective processes in rubber tyre film production. It combines an LSTNet-based predictive model with the Multi-path Differentiated Clipping Proximal Policy Optimization (MPD-PPO) algorithm to enable autonomous, multi-branch policy updates across actuators, using a composite reward that balances accuracy, stability, and efficiency. Key contributions include the MPD-PPO algorithm with adaptive, per-branch clipping and pathway-specific advantages, a predictive-optimization closed-loop framework, and a high-fidelity simulation environment validated against real-world production data, demonstrating robust width and thickness control and real-time deployment viability. The approach promises improved production quality, reduced waste, and enhanced operational stability for smart tyre manufacturing, with demonstrated convergence within tens of optimization steps and successful in-situ validation on an MESNAC line. The composite reward $R_t = R_e + R_p + P_a + R_s$ guides the multi-objective optimization, with components tailored to tracking error, improvement momentum, action smoothness, and steady-state performance.
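To make the additive structure of $R_t = R_e + R_p + P_a + R_s$ concrete, the following is a minimal Python sketch. Only the four-term sum and the role of each term come from the summary above; the functional forms, the weights `w_e`, `w_p`, `w_a`, the tolerance `tol`, and the `bonus` value are all illustrative assumptions, not the authors' definitions.

```python
def composite_reward(error, prev_error, action, prev_action,
                     tol=0.05, w_e=1.0, w_p=0.5, w_a=0.1, bonus=1.0):
    """Hypothetical composite reward R_t = R_e + R_p + P_a + R_s.

    All term definitions below are assumed for illustration only.
    """
    # R_e: tracking error term, penalises distance from the setpoint
    r_e = -w_e * abs(error)
    # R_p: improvement momentum, positive when the error shrinks
    r_p = w_p * (abs(prev_error) - abs(error))
    # P_a: action smoothness penalty on changes between consecutive actions
    p_a = -w_a * sum((a - b) ** 2 for a, b in zip(action, prev_action))
    # R_s: steady-state term, a bonus while the error stays within tolerance
    r_s = bonus if abs(error) <= tol else 0.0
    return r_e + r_p + p_a + r_s
```

Under these assumed forms, a step that reduces the width or thickness error scores higher than one that increases it, and abrupt actuator changes lower the reward even when the error is unchanged.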
Abstract
The advent of smart manufacturing is addressing the limitations of traditional centralized scheduling and inflexible production line configurations in the rubber tyre industry, especially in coping with dynamic production demands. Contemporary tyre manufacturing systems form complex networks of tightly coupled subsystems with pronounced nonlinear interactions and emergent dynamics. This complexity makes the effective coordination of multiple subsystems an essential yet formidable task. For high-dimensional, multi-objective optimization problems in this domain, we introduce a deep reinforcement learning algorithm: Multi-path Differentiated Clipping Proximal Policy Optimization (MPD-PPO). This algorithm employs a multi-branch policy architecture with differentiated gradient clipping constraints to ensure stable and efficient high-dimensional policy updates. Validated through experiments on width and thickness control in rubber tyre film production, MPD-PPO demonstrates substantial improvements in both tuning accuracy and operational efficiency. The framework successfully tackles key challenges, including high dimensionality, multi-objective trade-offs, and dynamic adaptation, thus delivering enhanced performance and production stability for real-time industrial deployment in tyre manufacturing.
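The idea of differentiated clipping across policy branches can be illustrated with the standard PPO clipped surrogate applied per branch. This is a sketch under stated assumptions, not the MPD-PPO update itself: each branch is given its own fixed clipping range `eps` and its own advantage estimate, the branch objectives are simply summed, and the names `clipped_surrogate` and `multi_branch_objective` are hypothetical. How the paper adapts the clipping ranges is not specified here.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantage, eps):
    """Standard PPO clipped surrogate for one sample of one branch."""
    # Probability ratio between the updated and the old policy
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to [1 - eps, 1 + eps]
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # Pessimistic (lower) bound of the unclipped and clipped objectives
    return min(ratio * advantage, clipped * advantage)


def multi_branch_objective(branches):
    """Sum per-branch surrogates; each branch carries its own eps (assumed)."""
    return sum(
        clipped_surrogate(b["logp_new"], b["logp_old"], b["advantage"], b["eps"])
        for b in branches
    )


# Example: two hypothetical actuator branches with different clipping ranges
branches = [
    {"logp_new": 0.5, "logp_old": 0.0, "advantage": 1.0, "eps": 0.1},
    {"logp_new": 0.5, "logp_old": 0.0, "advantage": 1.0, "eps": 0.3},
]
objective = multi_branch_objective(branches)
```

With identical ratios and advantages, the branch with the tighter range contributes less to the objective, which is how a per-branch `eps` limits the update size of one actuator pathway without constraining the others.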
