Synthesising Handwritten Music with GANs: A Comprehensive Evaluation of CycleWGAN, ProGAN, and DCGAN

Elona Shatri, Kalikidhar Palavala, George Fazekas

TL;DR

The proposed CycleWGAN model, which enhances style transfer and training stability, significantly outperforms DCGAN and ProGAN in both qualitative and quantitative evaluations, making it a promising solution for improving OMR systems.

Abstract

The generation of handwritten music sheets is a crucial step toward enhancing Optical Music Recognition (OMR) systems, which rely on large and diverse datasets for optimal performance. However, handwritten music sheets, often found in archives, present challenges for digitisation due to their fragility, varied handwriting styles, and image quality. This paper addresses the data scarcity problem by applying Generative Adversarial Networks (GANs) to synthesise realistic handwritten music sheets. We provide a comprehensive evaluation of three GAN models (DCGAN, ProGAN, and CycleWGAN), comparing their ability to generate diverse and high-quality handwritten music images. The proposed CycleWGAN model, which enhances style transfer and training stability, significantly outperforms DCGAN and ProGAN in both qualitative and quantitative evaluations. CycleWGAN achieves superior performance, with a Fréchet Inception Distance (FID) of 41.87, an Inception Score (IS) of 2.29, and a Kernel Inception Distance (KID) of 0.05, making it a promising solution for improving OMR systems.

Paper Structure

This paper contains 18 sections, 7 equations, 9 figures, and 1 table.

Figures (9)

  • Figure 1: (a) Handwritten image crop from the CVC-MUSCIMA dataset obtained after the data augmentation process. (b) Printed image crop from the DOREMI dataset obtained after the data augmentation process.
  • Figure 2: CycleWGAN pipeline architecture used for handwritten and printed scores.
  • Figure 3: Training loss of DCGAN over 100 epochs.
  • Figure 4: Training loss of ProGAN over 180 epochs, demonstrating stable learning progression for lower-resolution images. Instability appears at higher resolutions, however, indicating the need for further hyperparameter tuning to enhance stability at advanced stages.
  • Figure 5: Training loss of (Gen H, Gen P) and (Disc H, Disc P) components of CycleWGAN over 25 epochs.
  • ...and 4 more figures