Sound Field Translation and Mixed Source Model for Virtual Applications with Perceptual Validation

Lachlan Birnie, Thushara Abhayapala, Vladimir Tourbabin, Prasanga Samarasinghe

TL;DR

This paper proposes a method for listener translation in an acoustic reproduction that incorporates a mixture of near-field and far-field sources in a sparsely expanded virtual environment, and perceptually validates the method through a Multiple Stimulus with Hidden Reference and Anchor (MUSHRA) experiment.

Abstract

Non-interactive and linear experiences like cinema film offer high-quality surround sound audio to enhance immersion; however, the listener's experience is usually fixed to a single acoustic perspective. With the rise of virtual reality, there is a demand for recording and recreating real-world experiences in a way that allows the user to interact and move within the reproduction. Conventional sound field translation techniques take a recording and expand it into an equivalent environment of virtual sources. However, the finite sampling of a commercial higher-order microphone produces an acoustic sweet-spot in the virtual reproduction. As a result, the technique still restricts the listener's navigable region. In this paper, we propose a method for listener translation in an acoustic reproduction that incorporates a mixture of near-field and far-field sources in a sparsely expanded virtual environment. We perceptually validate the method through a Multiple Stimulus with Hidden Reference and Anchor (MUSHRA) experiment. Compared to the planewave benchmark, the proposed method offers both improved source localizability and robustness to spectral distortions at translated positions. A cross-examination with numerical simulations demonstrated that the sparse expansion relaxes the inherent sweet-spot constraint, leading to improved localizability for sparse environments. Additionally, the proposed method is seen to better reproduce the intensity and binaural room impulse response spectra of near-field environments, further supporting the strong perceptual results.

Paper Structure

This paper contains 33 sections, 37 equations, and 11 figures.

Figures (11)

  • Figure 1: Illustration of the equivalent virtual planewave distribution. The listener's perspective is fixed at the distribution center $\boldsymbol{o}$, where a phase shift applied to the driving function translates the sound field about the listener.
  • Figure 2: Illustration of the equivalent virtual mixedwave sound field. The listener is translated to $\boldsymbol{d}$, and the vectors $(\boldsymbol{y}_\ell;\boldsymbol{d})$ are updated with the HRTF to auralize an immersive reproduction.
  • Figure 3: Box plot of perception experiment scores for source localization (a) and (c), and basic audio quality (b) and (d). Each box bounds the interquartile range (IQR) with the center bar indicating the median score, and the whiskers extended to a maximum of $1.5\times\text{IQR}$. The v-shaped notches in the box refer to the $95\%$ confidence interval. When the notches between two boxes do not overlap, it can be concluded with $95\%$ confidence that the true medians differ.
  • Figure 4: (a) True pressure field and (b) intensity field at $1000\text{ Hz}$ in the XY-plane with the point-source at $(1, 0, 0)\text{ m}$, where intensity magnitude is given by the color-map.
  • Figure 5: Truncated measurement of (a) the pressure field and (b) the intensity field at $1000\text{ Hz}$ in the XY-plane for the point-source at $(1, 0, 0)\text{ m}$, where (c) is PE and (d) is IDE.
  • ...and 6 more figures
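
The phase-shift translation described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes a discrete set of virtual planewaves and an $e^{+ik\,\hat{\boldsymbol{y}}\cdot\boldsymbol{x}}$ sign convention. The function names, the random test weights, and the speed of sound of 343 m/s are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def pressure(weights, directions, x, k):
    """Pressure at point x from a sum of planewaves.

    Assumed convention (not from the paper): each planewave
    contributes weight * exp(+1j * k * direction . x).
    """
    return np.sum(weights * np.exp(1j * k * (directions @ x)))

def translate_weights(weights, directions, d, k):
    """Phase-shift each weight so the field is re-centred on d."""
    return weights * np.exp(1j * k * (directions @ d))

# Hypothetical test set-up: 8 random planewave directions and weights.
rng = np.random.default_rng(0)
directions = rng.normal(size=(8, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
weights = rng.normal(size=8) + 1j * rng.normal(size=8)

k = 2 * np.pi * 1000 / 343.0      # wavenumber at 1000 Hz, c = 343 m/s assumed
d = np.array([0.3, -0.2, 0.1])    # listener translation in metres

shifted = translate_weights(weights, directions, d, k)

# The translated field evaluated at its own centre should match the
# original field evaluated at d.
p_original_at_d = pressure(weights, directions, d, k)
p_shifted_at_origin = pressure(shifted, directions, np.zeros(3), k)
```

In this sketch the listener stays at the centre of the distribution and only the per-planewave phases change, which matches the caption's description of translating the field about the listener. It models far-field planewaves only and does not cover the mixed near-field and far-field model that the paper proposes.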