espiownage: Tracking Transients in Steelpan Drum Strikes Using Surveillance Technology

Scott H. Hawley, Andrew C. Morrison, Grant S. Morgan

TL;DR

We introduce a segmentation-regression map for the entire drum surface that yields interference fringe counts comparable to those obtained via object detection, made possible by incorporating robust computer vision libraries for object detection and image segmentation.

Abstract

We present an improvement in the ability to meaningfully track features in high-speed videos of Caribbean steelpan drums illuminated by Electronic Speckle Pattern Interferometry (ESPI). This is achieved through the use of up-to-date computer vision libraries for object detection and image segmentation, as well as a significant effort toward cleaning the dataset previously used to train systems for this application. Besides improving on previous metric scores by 10% or more, this project is noteworthy for two contributions: a segmentation-regression map for the entire drum surface that yields interference fringe counts comparable to those obtained via object detection, and an accelerated workflow for coordinating the data-cleaning-and-model-training feedback loop, whose rapid iteration allowed the project to be conducted on a timescale of only 18 days.

Paper Structure

This paper contains 8 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Left: Screenshot of our enhanced "ellipse editor" tool, which builds on code released in prior work hm2021. Besides the prior ability to graphically edit annotations of the boundaries and ring counts of elliptical antinode regions, this newer version of the software displays the neural network models' predictions of bounding boxes, ring counts, and segmentation-regression maps. Middle: Bounding box detection of antinode regions via IceVision icevision2020 using their tuned RetinaNet retinanet model. We also tried detecting individual rings as objects, but there were too many false negatives, whereas the model was almost always able to detect entire antinodes, including those that annotators missed. The cropped regions became inputs to the ring-counting code, a binary classifier adapted for regression by extending the vertical range of the final sigmoid to the range of our outputs (i.e., we stay within the linear regime of the sigmoid). Note that the antinodes are basically circular, becoming more so when cropped and reshaped as square images, which then allows for arbitrary rotations in addition to other standard image data augmentation methods.
  • Figure 2: Ring counts obtained from cropped images. Top: Predicted and target values, arranged in order of worst agreement to best, for the different datasets. The prevalence of values at the max (11) and min ($\sim$1) reflects the data-annotation policy of the SVP SVP. Note how the CycleGAN dataset from hm2021, despite its visual similarity to the real images, only contained integer ring values. Bottom: Plots of predicted rings vs. target rings, showing that our data-cleaning effort ("real," left column) resulted in closer agreement and less compression of the dynamic range than other datasets.
  • Figure 3: Segmentation examples. Left: All antinodes as one class, used for inclusion in the ellipse editor for the data-cleaning workflow. Right: Segmentation-regression maps for target and predicted ring counts. In each case there is one output "class," but for regression this is a floating-point value, and the final sigmoid activation is scaled to keep the output within the linear regime.
  • Figure 4: Left: Close agreement between the ring counts (as a function of time) from the bounding-box-and-crop method and the segmentation-regression map. Middle: Our version of Figure 6 from hm2021, but with lower uncertainties and a higher correlation coefficient. Right: Our replication of a key "preliminary physics" result in a panel from Figure 7 of hm2021, confirming the dissimilarity in rise times between the amplitude heard in audio recordings (dot-dashed/black and dashed/red lines) and the amplitude observed in ESPI video.
  • Figure 5: Segmentation-regression maps for other instruments, highlighting the generalization performance of our model. These maps were obtained via inference using the model trained only on our steelpan dataset. Although the model was only trained on elliptical antinode regions, we see that it is able to trace the contours and count rings for the roughly triangular antinode regions for the drum image on the left moore_ajp and the bean-shaped antinodes of the 17th-century lyra on the right Bakarezos2019.