Neural Experts: Mixture of Experts for Implicit Neural Representations

Yizhak Ben-Shabat, Chamin Hewa Koneputugodage, Sameera Ramasinghe, Stephen Gould

TL;DR

Incorporating a mixture of experts architecture into existing INR formulations improves speed, accuracy, and memory requirements. The paper also introduces novel conditioning and pretraining methods for the gating network that improve convergence to the desired solution.

Abstract

Implicit neural representations (INRs) have proven effective in various tasks including image, shape, audio, and video reconstruction. These INRs typically learn the implicit field from sampled input points. This is often done using a single network for the entire domain, imposing many global constraints on a single function. In this paper, we propose a mixture of experts (MoE) implicit neural representation approach that learns local piece-wise continuous functions by simultaneously learning to subdivide the domain and to fit locally. We show that incorporating a mixture of experts architecture into existing INR formulations improves speed, accuracy, and memory requirements. Additionally, we introduce novel conditioning and pretraining methods for the gating network that improve convergence to the desired solution. We evaluate the effectiveness of our approach on multiple reconstruction tasks, including surface reconstruction, image reconstruction, and audio signal reconstruction, and show improved performance compared to non-MoE methods.
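
To make the idea of a gating network that subdivides the domain among local experts more concrete, here is a minimal, untrained sketch of a mixture-of-experts INR forward pass. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`make_mlp`, `mlp_forward`, `moe_inr`), the layer sizes, the sine activations with frequency `omega`, and the choice between hard (argmax) and soft (weighted) expert selection are all hypothetical. The conditioning and pretraining of the gating network described in the paper are not modeled here, so this is closer to a vanilla MoE INR.

```python
import math
import random

def make_mlp(sizes, rng):
    """Random weights for a tiny MLP; sizes = [in, hidden, ..., out]."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        W = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((W, b))
    return layers

def mlp_forward(layers, x, omega=30.0):
    """Sine activations on hidden layers (an assumption), linear last layer."""
    h = list(x)
    for i, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        h = z if i == len(layers) - 1 else [math.sin(omega * v) for v in z]
    return h

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_inr(x, manager, experts, hard=True):
    """Evaluate the mixture at coordinate x; returns (value, gate probabilities)."""
    gates = softmax(mlp_forward(manager, x))       # one probability per expert
    outputs = [mlp_forward(e, x)[0] for e in experts]
    if hard:
        # Hard routing: only the most probable expert represents this point.
        k = max(range(len(gates)), key=gates.__getitem__)
        return outputs[k], gates
    # Soft routing: probability-weighted blend of all experts.
    return sum(g * o for g, o in zip(gates, outputs)), gates

rng = random.Random(0)
num_experts = 4
manager = make_mlp([2, 16, num_experts], rng)             # gating network on 2D coordinates
experts = [make_mlp([2, 16, 1], rng) for _ in range(num_experts)]
value, gates = moe_inr([0.25, -0.5], manager, experts)
```

With hard routing, the argmax of the gate probabilities over the input domain induces a partition into regions, each fitted by a single expert, which is one way to obtain a piece-wise continuous representation.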

Paper Structure

This paper contains 13 sections, 3 equations, 16 figures, 13 tables.

Figures (16)

  • Figure 1: Illustration comparing INR architectures: (a) traditional INR, (b) Vanilla MoE INR, and (c) the proposed Neural Experts. Two key elements of our approach are the conditioning and pretraining of the manager, which improve signal reconstruction with fewer parameters.
  • Figure 2: Image reconstruction. Qualitative (left) and quantitative (right) results. The qualitative results show the image reconstruction (top), gradients (middle), and Laplacian (bottom) for (a) GT, (b) SoftPlus, (c) SoftPlus Wider, (d) Our SoftPlus MoE, (e) SIREN, (f) SIREN Wider, (g) Naive MoE, and (h) Our SIREN MoE. The quantitative results (right) report PSNR as training progresses and show that our Neural Experts architecture with Sine activations outperforms all baselines.
  • Figure 3: Audio reconstruction visualization. Two-speaker audio reconstruction is presented. Within each waveform block, the rows show the ground truth, reconstruction, and error visualization from top to bottom. For our Neural Experts, we color-code the different experts on the reconstructed waveform.
  • Figure 4: Speaker identity supervision experiment. Our Neural Experts audio signal reconstruction with and without speaker identity segmentation supervision on the two-speaker waveform. Colors represent expert number. The results show that the manager network is able to allocate an expert to each speaker without compromising the reconstruction quality.
  • Figure 5: 3D surface reconstruction. Results on the Thai Statue shape. Our method captures noticeably more detail in the toes, nostrils, and eye. The error colormap shows that our method produces a mesh with far fewer large errors (lighter indicates higher distance to the ground truth surface), and the expert segmentation shows that our method provides a subdivision of the space.
  • ...and 11 more figures