Optimizing 2D+1 Packing in Constrained Environments Using Deep Reinforcement Learning

Victor Ulisses Pugliese, Oséias F. de A. Ferreira, Fabio A. Faria

TL;DR

This work tackles the constrained 2D+1 packing problem by developing an OpenAI Gym-based simulator with two height-limited boards and a multi-discrete action space. It compares two on-policy actor-critic DRL methods, PPO and A2C, against the MaxRect-BL heuristic, and finds that PPO learns packing strategies that maximize board coverage. PPO achieves complete fillings across multiple even and odd board configurations, while A2C is slower and less stable; MaxRect-BL with height-based ordering can match PPO in some scenarios but struggles without ordering. The study demonstrates the practical potential of DRL for industrial packing tasks, particularly in aerospace composites, and outlines future enhancements to better mirror autoclave processes and sensor-driven control.

Abstract

This paper proposes a novel approach based on deep reinforcement learning (DRL) for the 2D+1 packing problem with spatial constraints. This problem extends the traditional 2D packing problem with an additional constraint on the height dimension. To this end, a simulator using the OpenAI Gym framework has been developed to efficiently simulate the packing of rectangular pieces onto two boards with height constraints. The simulator supports multi-discrete actions, enabling the selection of a position on either board and the type of piece to place. Finally, two DRL-based methods, Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C), have been employed to learn a packing strategy, and their performance is compared with a well-known heuristic baseline (MaxRect-BL). In the experiments carried out, the PPO-based approach proved to be a good solution for complex packing problems and showed potential to optimize resource utilization in various industrial applications, such as the manufacturing of aerospace composites.
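
As an illustration of the multi-discrete action described in the abstract, the following is a minimal sketch in plain Python, not the authors' simulator. It assumes an action is a tuple (board index, x, y, piece type) and that a piece may be placed only if it fits inside the board, does not overlap an earlier piece, and does not exceed the board's height limit. The names `Board`, `PIECE_TYPES`, and `step`, as well as the piece dimensions, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field

# Hypothetical piece catalogue: (width, length, height) per piece type.
PIECE_TYPES = [(1, 1, 1), (2, 1, 1), (2, 2, 2)]


@dataclass
class Board:
    width: int
    length: int
    max_height: int  # the "+1" constraint: pieces taller than this are rejected
    cells: list = field(init=False)

    def __post_init__(self):
        # Occupancy grid: False means the cell is still free.
        self.cells = [[False] * self.width for _ in range(self.length)]

    def can_place(self, x, y, piece):
        w, l, h = piece
        if h > self.max_height:
            return False  # violates the height limit of this board
        if x + w > self.width or y + l > self.length:
            return False  # piece would extend past the board edge
        return not any(
            self.cells[r][c] for r in range(y, y + l) for c in range(x, x + w)
        )

    def place(self, x, y, piece):
        w, l, _ = piece
        for r in range(y, y + l):
            for c in range(x, x + w):
                self.cells[r][c] = True

    def coverage(self):
        # Fraction of board cells that are occupied.
        filled = sum(cell for row in self.cells for cell in row)
        return filled / (self.width * self.length)


def step(boards, action):
    """Apply one (board, x, y, piece_type) action; return True if the piece was placed."""
    b, x, y, p = action
    board, piece = boards[b], PIECE_TYPES[p]
    if not board.can_place(x, y, piece):
        return False
    board.place(x, y, piece)
    return True


# Two boards with different height limits (values assumed for the example).
boards = [Board(4, 4, max_height=2), Board(4, 4, max_height=1)]
placed_tall_on_board_0 = step(boards, (0, 0, 0, 2))  # fits: height 2 <= 2
placed_tall_on_board_1 = step(boards, (1, 0, 0, 2))  # rejected: height 2 > 1
placed_overlapping = step(boards, (0, 1, 1, 0))      # rejected: cell already used
```

In a Gym-style environment, the same four components would typically be exposed as a multi-discrete action space, and the coverage value could serve as a basis for the reward; both choices are assumptions here rather than details taken from the paper.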

Paper Structure

This paper contains 19 sections, 13 figures, 4 tables.

Figures (13)

  • Figure 2: The piece types available for placement on the boards.
  • Figure 3: Evaluation curves of the three even experiments using PPO.
  • Figure 4: Mean episode length over even experiments using PPO.
  • Figure 5: Experiment 1 - All pieces and boards were constrained to a uniform height for this experiment.
  • Figure 6: Experiment 2 - Board 1 is taller than board 2.
  • ...and 8 more figures