Curriculum-Based Soft Actor-Critic for Multi-Section R2R Tension Control

Shihao Li; Jiachen Li; Christopher Martin; Zijun Chen; Dongmei Chen; Wei Li

Abstract

Precise tension control in roll-to-roll (R2R) manufacturing is difficult under varying operating conditions and process uncertainty. This paper presents a curriculum-based Soft Actor-Critic (SAC) controller for multi-section R2R tension control. The policy is trained in three phases with progressively wider reference ranges, from 27 to 33 N to the full operating envelope of 20 to 40 N, so it can generalize across nominal and disturbed conditions. On a three-section R2R benchmark, the learned controller achieves accurate tracking in nominal operation and handles large disturbances, including 20 N to 40 N step changes, with a single policy and no scenario-specific retuning. These results indicate that curriculum-trained SAC is a practical alternative to model-based control when system parameters vary and process uncertainty is significant.

Paper Structure (22 sections, 7 equations, 6 figures, 5 tables)

Introduction
Methodology
Roll-to-Roll System Dynamics
Soft Actor-Critic
Formulation of Markov Decision Process
State Space
Action Space
Reward Function
Neural Network Architecture
Training Procedure
Curriculum Learning Strategy
Standard SAC Training Loop
Case Study: Three-Section R2R System
Results and Discussion
Training Performance
...and 7 more sections

Figures (6)

Figure 1: Schematic of a simplified R2R line
Figure 2: SAC training progress over 500,000 timesteps per seed (approximately 1,000 episodes per seed, where each episode is 500 timesteps of simulated R2R control) with three-phase curriculum learning (n=3 random seeds, 3,000 episodes in total). The solid blue line shows the mean evaluation reward across seeds; the shaded region shows mean $\pm$ 1 std. Faint blue lines show individual seed trajectories. Vertical dotted lines mark curriculum phase transitions at 200k steps (Phase 1 $\rightarrow$ Phase 2) and 400k steps (Phase 2 $\rightarrow$ Phase 3). Colored boxes indicate tension reference ranges: Phase 1 (27--33 N), Phase 2 (25--35 N), Phase 3 (20--40 N). The orange star marks the best mean performance at 440k steps (reward = 4.87). Horizontal dashed lines show MPC (4.69) and LQR (4.71) baseline performance for comparison. The narrow $\pm$ 1 std band indicates reproducible learning across random seeds.
Figure 3: Tracking performance comparison across all three sections over a 5-second episode: (a) tension tracking trajectories for SAC (blue), MPC (purple), and LQR (orange) around the 30 N reference (black dashed line); (b) velocity tracking trajectories, showing that all controllers maintain stable velocity profiles around the 0.0101 m/s reference.
Figure 4: Evolution of tracking errors over time for tension (top row) and velocity (bottom row) across all three sections. SAC (blue), MPC (purple), and LQR (orange) trajectories shown with process noise present.
Figure 5: Control action trajectories for all three roller motors. SAC (blue), MPC (purple), and LQR (orange) behave differently: the SAC control action is more active and has higher variance.
...and 1 more figures
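
The three-phase curriculum described in the Figure 2 caption can be sketched as a simple step-indexed schedule. This is a minimal illustration, not the authors' implementation: the phase boundaries (200k and 400k steps) and reference ranges come from the caption, while the function names, the choice to assign a boundary step to the later phase, and the uniform sampling of the tension reference are assumptions made here.

```python
import random

# (exclusive end step, low reference in N, high reference in N),
# with values taken from the Figure 2 caption.
CURRICULUM = [
    (200_000, 27.0, 33.0),  # Phase 1
    (400_000, 25.0, 35.0),  # Phase 2
    (500_000, 20.0, 40.0),  # Phase 3 (full operating envelope)
]


def reference_range(step):
    """Return (phase, low, high) for a global training step.

    Assumption: a step exactly on a boundary belongs to the later phase.
    """
    for phase, (end, low, high) in enumerate(CURRICULUM, start=1):
        if step < end:
            return phase, low, high
    # Past the last boundary, stay in the final phase.
    _, low, high = CURRICULUM[-1]
    return len(CURRICULUM), low, high


def sample_reference(step, rng):
    """Draw a tension reference for a new episode.

    Assumption: references are drawn uniformly within the active range;
    the source does not state the sampling distribution.
    """
    _, low, high = reference_range(step)
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 200_000, 440_000):
        phase, low, high = reference_range(step)
        ref = sample_reference(step, rng)
        print(f"step={step} phase={phase} range=[{low}, {high}] N ref={ref:.2f} N")
```

Under this schedule, the best mean reward reported at 440k steps falls in Phase 3, that is, after the policy has been exposed to the full 20 to 40 N range.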
