EnvId: A Metric Learning Approach for Forensic Few-Shot Identification of Unseen Environments
Denise Moussa, Germans Hirsch, Christian Riess
TL;DR
This paper addresses the forensic problem of identifying where an audio recording was made by reframing it as a few-shot metric-learning task that avoids case-specific retraining. It introduces EnvId, an end-to-end framework built on Prototypical Networks that performs open-set, $N$-way $K$-shot environment identification and, optionally, blind regression of environmental parameters such as RT$_{60}$ and room volume. A flexible data-generation pipeline simulates reverberant, noisy, and compressed conditions to mirror forensic scenarios, which allows evaluation on unseen degradations and out-of-distribution locations. The results show high accuracy on diverse test pools, strong open-set rejection, and notable robustness to unseen degradations, and they indicate that EnvId can additionally estimate environmental parameters. Together, this provides practical groundwork for forensic audio analysis in the wild.
Abstract
Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of an audio recording with its recording location. For example, a voice message may be the only investigative cue to narrow down the candidate sites for a crime. To date, several works have provided supervised classification tools for closed-set recording environment identification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, supervised learning techniques are not applicable without retraining a classifier on a sufficient number of training samples for each case and its respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality. In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining by modeling the task as a few-shot classification problem. We demonstrate that EnvId can handle forensically challenging material. It provides good-quality predictions even under unseen signal degradations, out-of-distribution reverberation characteristics, or recording position mismatches.
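To make the few-shot formulation concrete, the following is a minimal sketch of the generic Prototypical Network decision rule: each candidate environment is represented by the mean of its $K$ support embeddings, and a query is assigned to the nearest prototype. It is not the EnvId implementation. The embeddings here are toy 2-D vectors standing in for the output of a trained audio encoder, the function names `prototypes` and `classify` are hypothetical, and the distance threshold `reject_dist` used for open-set rejection is an assumption for illustration, since the summary above does not state how rejection is performed.

```python
import numpy as np

def prototypes(support, labels):
    """Return the class ids and one prototype (mean embedding) per class."""
    classes = np.unique(labels)
    protos = np.stack([support[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query, classes, protos, reject_dist=None):
    """Assign each query embedding to its nearest prototype.

    If reject_dist is given (assumed rejection rule, not from the paper),
    queries whose squared distance to every prototype exceeds it are
    labeled -1, meaning "none of the candidate environments".
    """
    # Squared Euclidean distances, shape (n_query, n_classes).
    d = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    pred = classes[d.argmin(axis=1)]
    if reject_dist is not None:
        pred = np.where(d.min(axis=1) > reject_dist, -1, pred)
    return pred

# Toy 3-way 5-shot episode: three candidate environments, five support
# embeddings each, scattered tightly around hypothetical class centers.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
labels = np.repeat(np.arange(3), 5)
support = centers[labels] + 0.1 * rng.standard_normal((15, 2))

classes, protos = prototypes(support, labels)

# Two queries near known environments and one far from all of them.
queries = np.array([[0.2, -0.1], [9.8, 0.3], [50.0, 50.0]])
print(classify(queries, classes, protos, reject_dist=4.0).tolist())
```

Because the prototypes are computed from the support set at inference time, a new case with a different set of candidate locations only requires new support recordings, not retraining of the encoder.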
