ERL-MPP: Evolutionary Reinforcement Learning with Multi-head Puzzle Perception for Solving Large-scale Jigsaw Puzzles of Eroded Gaps

Xingke Song, Xiaoying Yang, Chenglin Yao, Jianfeng Ren, Ruibin Bai, Xin Chen, Xudong Jiang

TL;DR

This work addresses large-scale jigsaw puzzles with eroded gaps by introducing Evolutionary Reinforcement Learning with Multi-head Puzzle Perception (ERL-MPP). A Multi-head Puzzle Perception Network (MPPN) perceives the puzzle through a shared encoder, with a discriminator head for global assessment and puzzlet heads for local assembly status. The EvoRL agent uses an actor-critic-evaluator framework to explore a large action space of Swap-2, Swap-3, and Swap-Puzzlet actions efficiently. The authors report significant improvements over state-of-the-art models on the JPLEG-5 and MIT datasets, covering both puzzles with large gaps and puzzles with many fragments. The results suggest practical value for artifact reconstruction and other combinatorial assembly tasks where information has been eroded.

Abstract

Solving jigsaw puzzles has been extensively studied. While most existing models focus on solving either small-scale puzzles or puzzles with no gap between fragments, solving large-scale puzzles with gaps presents distinctive challenges in both image understanding and combinatorial optimization. To tackle these challenges, we propose a framework of Evolutionary Reinforcement Learning with Multi-head Puzzle Perception (ERL-MPP) to derive a better set of swapping actions for solving the puzzles. Specifically, to tackle the challenges of perceiving the puzzle with gaps, a Multi-head Puzzle Perception Network (MPPN) with a shared encoder is designed, where multiple puzzlet heads comprehensively perceive the local assembly status, and a discriminator head provides a global assessment of the puzzle. To explore the large swapping action space efficiently, an Evolutionary Reinforcement Learning (EvoRL) agent is designed, where an actor recommends a set of suitable swapping actions from a large action space based on the perceived puzzle status, a critic updates the actor using the estimated rewards and the puzzle status, and an evaluator coupled with evolutionary strategies evolves the actions aligning with the historical assembly experience. The proposed ERL-MPP is comprehensively evaluated on the JPLEG-5 dataset with large gaps and the MIT dataset with large-scale puzzles. It significantly outperforms all state-of-the-art models on both datasets.
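To make the size of the swapping action space concrete, the following minimal sketch represents a puzzle as a permutation and applies two of the swap types. It rests on assumptions not stated in the abstract: Swap-2 is taken to exchange two fragments, Swap-3 is taken to be a cyclic rotation of three fragments, and the function names are hypothetical. Swap-Puzzlet is omitted because its definition is not given here.

```python
from math import comb


def swap2(perm, i, j):
    """Exchange the fragments in slots i and j (assumed meaning of Swap-2)."""
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out


def swap3(perm, i, j, k):
    """Rotate three fragments cyclically: slot i -> j, j -> k, k -> i.

    This is an assumed reading of Swap-3, not the paper's definition.
    """
    out = list(perm)
    out[j], out[k], out[i] = perm[i], perm[j], perm[k]
    return out


def action_space_size(n):
    """Count distinct Swap-2 and (assumed) Swap-3 actions for n fragments.

    Each unordered pair gives one Swap-2; each unordered triple gives
    two distinct cyclic rotations.
    """
    return comb(n, 2) + 2 * comb(n, 3)


if __name__ == "__main__":
    print(swap2([0, 1, 2, 3], 0, 2))     # [2, 1, 0, 3]
    print(swap3([0, 1, 2, 3], 0, 1, 2))  # [2, 0, 1, 3]
    print(action_space_size(150))        # 1113775
```

Under these assumptions a 150-fragment puzzle already has more than a million candidate actions per step, which illustrates why an exhaustive search over actions is impractical.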

Paper Structure

This paper contains 13 sections, 8 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Huge gaps and a large number of fragments impose great challenges on puzzle solvers. The proposed ERL-MPP significantly outperforms the previous best-performing SD$^2$RL [song2023sd2rl] on four different types of puzzles, e.g., puzzles with gaps as large as 12 pixels, or puzzles with as many as 150 fragments.
  • Figure 2: Block diagram of the proposed ERL-MPP for solving L-JPEG problems. The shared MPPN perceives the puzzle globally through a discriminator head and locally through three puzzlet perception heads. The EvoRL agent determines a sequence of swapping actions until the puzzle is perfectly reassembled: an actor recommends swapping actions from a large action space of Swap-2, Swap-3, and Swap-Puzzlet actions; a critic estimates the state value and updates the actor based on the visual perception from the MPPN and the estimated reward; and an evaluator assesses and selects the most suitable action after evolutionary operations such as crossover and mutation. A set of rewards covering fragment placement, pairwise adjacency, and perfect reassembly is designed to guide the training of the agent.
  • Figure 3: The proposed MPPN with a shared encoder, where a discriminator head perceives global puzzle semantics, and puzzlet perception heads perceive local adjacency relations.
  • Figure 4: Visualization of the reassembling results on sample puzzles from the JPLEG-5 dataset [song2023sd2rl].
  • Figure 5: Visualization of the reassembled puzzles of $7\times 10$ pieces with 2-pixel gaps and 4-pixel gaps in the MIT dataset.
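
The Figure 2 caption describes rewards based on fragment placement, pairwise adjacency, and perfect reassembly, together with an evaluator that selects an action after crossover and mutation. The sketch below illustrates that idea only. Everything in it is an assumption for illustration: the equal reward weights, the restriction to two-fragment swaps, the specific crossover and mutation rules, and all function names. In particular, the hand-written `score` function stands in for the paper's learned evaluator.

```python
import random


def placement_reward(perm):
    """Fraction of fragments sitting in their correct slot (identity = solved)."""
    return sum(p == i for i, p in enumerate(perm)) / len(perm)


def adjacency_reward(perm, rows, cols):
    """Fraction of neighbouring slot pairs that hold truly adjacent fragments."""
    correct = total = 0
    for r in range(rows):
        for c in range(cols):
            s = r * cols + c
            if c + 1 < cols:  # horizontal neighbour
                total += 1
                a, b = perm[s], perm[s + 1]
                correct += (b == a + 1 and a % cols != cols - 1)
            if r + 1 < rows:  # vertical neighbour
                total += 1
                correct += (perm[s + cols] == perm[s] + cols)
    return correct / total


def score(perm, rows, cols):
    """Stand-in evaluator: equal-weight sum of the three reward terms."""
    solved = 1.0 if list(perm) == list(range(len(perm))) else 0.0
    return placement_reward(perm) + adjacency_reward(perm, rows, cols) + solved


def apply_swap(perm, action):
    """Apply a two-fragment swap given as a pair of slot indices."""
    i, j = action
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out


def evolve_and_select(perm, candidates, rows, cols, rng, generations=5):
    """Grow the candidate pool by crossover and mutation, then pick the best."""
    n = len(perm)
    pool = list(candidates)  # needs at least two candidates
    for _ in range(generations):
        a, b = rng.sample(pool, 2)
        child = (a[0], b[1])                 # crossover: mix parent indices
        if rng.random() < 0.5:               # mutation: resample one index
            child = (child[0], rng.randrange(n))
        if child[0] != child[1]:
            pool.append(child)
    return max(pool, key=lambda act: score(apply_swap(perm, act), rows, cols))


if __name__ == "__main__":
    rng = random.Random(0)
    puzzle = [1, 0, 2, 3]                    # 2 x 2 puzzle, first two swapped
    best = evolve_and_select(puzzle, [(0, 1), (2, 3)], 2, 2, rng)
    print(apply_swap(puzzle, best))          # [0, 1, 2, 3]
```

In this toy setting the candidates would come from the actor; here they are passed in by hand, and the selection step simply keeps whichever original or evolved action scores highest.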