MI-NeRF: Learning a Single Face NeRF from Multiple Identities
Aggelina Chatziagapi, Grigorios G. Chrysos, Dimitris Samaras
TL;DR
This work tackles multi-identity dynamic NeRFs for talking faces by training a single network across identities. It introduces a multiplicative interaction module that captures the nonlinear coupling between identity and expression, which enables disentanglement of the two and robust synthesis of novel expressions. The approach reduces total training time by up to about 90% compared to per-identity training and supports personalization from short video clips, delivering state-of-the-art performance in facial expression transfer and lip-synced video synthesis across identities. Extending the method to thousands of identities is left to further research on multi-identity generalization.
Abstract
In this work, we introduce a method that learns a single dynamic neural radiance field (NeRF) from monocular talking face videos of multiple identities. NeRFs have shown remarkable results in modeling the 4D dynamics and appearance of human faces. However, they require per-identity optimization. Although recent approaches have proposed techniques to reduce training and rendering time, scaling to a larger number of identities can still be expensive. We introduce MI-NeRF (multi-identity NeRF), a single unified network that models complex non-rigid facial motion for multiple identities, using only monocular videos of arbitrary length. The core premise of our method is to learn the non-linear interactions between identity-specific and non-identity-specific information with a multiplicative module. By training on multiple videos simultaneously, MI-NeRF not only reduces the total training time compared to standard single-identity NeRFs, but also demonstrates robustness in synthesizing novel expressions for any input identity. We present results for both facial expression transfer and talking face video synthesis. Our method can be further personalized for a target identity given only a short video.
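To illustrate what a multiplicative module can look like, here is a minimal NumPy sketch of one common form: a Hadamard product of two projected codes, plus a linear term. It is an assumption for illustration, not the architecture from the paper. The function name `multiplicative_interaction`, the matrices `U`, `V`, `W`, `C`, and all dimensions are hypothetical.

```python
import numpy as np

def multiplicative_interaction(expr, ident, U, V, W, C):
    """Fuse an expression code and an identity code (illustrative form only).

    The element-wise product of the two projections is a second-order term:
    every expression coordinate can modulate every identity coordinate,
    which a plain concatenation followed by a linear layer cannot express.
    """
    product = (U @ expr) * (V @ ident)           # multiplicative coupling
    linear = W @ np.concatenate([expr, ident])   # first-order term
    return C @ product + linear

# Hypothetical sizes, chosen only for the example.
rng = np.random.default_rng(0)
d_expr, d_id, d_hidden, d_out = 8, 4, 16, 32
U = rng.normal(size=(d_hidden, d_expr))
V = rng.normal(size=(d_hidden, d_id))
W = rng.normal(size=(d_out, d_expr + d_id))
C = rng.normal(size=(d_out, d_hidden))

# One shared expression code, two different (one-hot) identity codes.
expr = rng.normal(size=d_expr)
ident_a = np.eye(d_id)[0]
ident_b = np.eye(d_id)[1]

feat_a = multiplicative_interaction(expr, ident_a, U, V, W, C)
feat_b = multiplicative_interaction(expr, ident_b, U, V, W, C)
```

In a setup like this, the fused feature would condition a single shared radiance field, so the same expression code yields identity-dependent features (`feat_a` differs from `feat_b`) without training one network per identity.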
