MerGen: Micro-electrode recording synthesis using a generative data-driven approach
Thibault Martin, Paul Sauleau, Claire Haegelen, Pierre Jannin, John S. H. Baxter
TL;DR
MerGen addresses the need for realistic micro-electrode recording (MER) training data to support deep brain stimulation (DBS) electrode implantation. It generates de novo MER signals using a three-component architecture: MelGAN-based waveform reconstruction from Mel-spectrograms, a multi-level VQVAE that encodes spectrograms into discrete tokens, and cascaded transformers that produce context-conditioned token sequences. The authors demonstrate perceptual realism through human perception studies and validate utility via data augmentation for MER classification, showing improved performance when synthetic data complements real data. Key contributions include a conditioning framework combining global and local features, a scalable token-based generator, and a dual evaluation of realism and augmentation impact, all within a real-time capable pipeline. The work has practical implications for clinician training and decision support in the operating room. Its stated limitations include single-centre data and the need for broader integration into multi-modal intraoperative simulators.
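To make the "discrete tokens" step concrete, the following is a minimal sketch of vector quantization as used in VQ-VAE-style models in general: each latent frame is replaced by the index of its nearest codebook vector. It is not the authors' implementation. The function name `quantize`, the toy codebook, and all sizes are hypothetical, and the learned encoder, the multi-level structure, and the training losses are omitted.

```python
import numpy as np


def quantize(frames, codebook):
    """Map each frame (row) to the index of its nearest codebook vector.

    frames:   array of shape (T, D), one latent vector per time step
    codebook: array of shape (K, D), one vector per discrete token
    Returns the token indices (T,) and the quantized frames (T, D).
    """
    # Squared Euclidean distance between every frame and every code: (T, K)
    diff = frames[:, None, :] - codebook[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    tokens = dist.argmin(axis=1)
    return tokens, codebook[tokens]


# Hypothetical toy codebook: 8 codes of dimension 4, code k = [k, k, k, k].
codebook = np.arange(8, dtype=float)[:, None] * np.ones((1, 4))

# Three frames that sit close to codes 3, 1 and 7 respectively.
frames = codebook[[3, 1, 7]] + 0.1

tokens, quantized = quantize(frames, codebook)
print(tokens)           # token sequence a transformer could model
print(quantized.shape)  # same shape as the input frames
```

In a pipeline of the kind described above, a sequence model would be trained on such token sequences, and a decoder followed by a vocoder would map generated tokens back to a spectrogram and then to a waveform.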
Abstract
The analysis of electrophysiological data is crucial for certain surgical procedures such as deep brain stimulation, which has been adopted for the treatment of a variety of neurological disorders. During the procedure, auditory analysis of these signals helps the clinical team to infer the neuroanatomical location of the stimulation electrode and thus optimize clinical outcomes. This task is complex and requires an expert, who in turn requires significant training. In this paper, we propose a generative neural network, called MerGen, capable of simulating de novo electrophysiological recordings, with a view to providing clinician trainees with a realistic tool for learning to identify these signals. We demonstrate that experts in the field cannot perceptually distinguish the generated signals from real ones, and that the generation can be conditioned efficiently to provide a didactic simulator adapted to a particular surgical scenario. The efficacy of this conditioning is demonstrated by comparing it to intra-observer and inter-observer variability amongst experts. We also demonstrate the use of this network for data augmentation in automatic signal classification, which can play a role in decision-making support in the operating theatre.
