DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images

Zaid Tasneem, Akshat Dave, Abhishek Singh, Kushagra Tiwary, Praneeth Vepakomma, Ashok Veeraraghavan, Ramesh Raskar

TL;DR

DecentNeRF advances privacy-preserving, scalable 3D scene learning from crowdsourced images by decomposing each scene into a global static NeRF and a personal dynamic NeRF. It trains the global component through learned federated aggregation over on-device data, using secure multi-party computation (SMPC) for the averaging step so that the server never sees an individual user's model. The approach achieves photorealistic reconstructions with roughly $10^4\times$ less server compute than centralized NeRF training, and it reduces the reconstruction of personal content on both real-world phototourism data and synthetic occlusion scenarios. The work also discusses practical pathways and limitations for large-scale decentralized neural rendering, including privacy concerns and potential mobile deployments.

Abstract

Neural radiance fields (NeRFs) show potential for transforming images captured worldwide into immersive 3D visual experiences. However, most of this captured visual data remains siloed in our camera rolls as these images contain personal details. Even if made public, the problem of learning 3D representations of billions of scenes captured daily in a centralized manner is computationally intractable. Our approach, DecentNeRF, is the first attempt at decentralized, crowd-sourced NeRFs that require $\sim 10^4\times$ less server computing for a scene than a centralized approach. Instead of sending the raw data, our approach requires users to send a 3D representation, distributing the high computation cost of training centralized NeRFs between the users. It learns photorealistic scene representations by decomposing users' 3D views into personal and global NeRFs and a novel optimally weighted aggregation of only the latter. We validate the advantage of our approach to learn NeRFs with photorealism and minimal server computation cost on structured synthetic and real-world photo tourism datasets. We further analyze how secure aggregation of global NeRFs in DecentNeRF minimizes the undesired reconstruction of personal content by the server.
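The weighted aggregation of global NeRFs mentioned in the abstract can be illustrated with a minimal sketch. It assumes each user's global MLP is flattened into named lists of floats and that per-user weights are already available; in the paper those weights are learned, which this sketch does not attempt. The names `Params` and `aggregate_global` are hypothetical and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical representation: layer name -> flattened list of weights.
Params = Dict[str, List[float]]


def aggregate_global(client_params: List[Params],
                     client_weights: List[float]) -> Params:
    """Return the weighted average of the clients' global-MLP parameters.

    The per-client weights are taken as given here (assumption); they are
    normalized so that they sum to one before averaging.
    """
    if not client_params or len(client_params) != len(client_weights):
        raise ValueError("need one weight per client and at least one client")
    total = sum(client_weights)
    if total <= 0:
        raise ValueError("client weights must sum to a positive value")
    norm = [w / total for w in client_weights]

    aggregated: Params = {}
    for name, reference in client_params[0].items():
        # Average each parameter position across clients.
        aggregated[name] = [
            sum(a * params[name][i] for a, params in zip(norm, client_params))
            for i in range(len(reference))
        ]
    return aggregated
```

With equal weights this reduces to plain federated averaging; giving a larger weight to a client with fewer occlusions pulls the aggregate toward that client's global MLP.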

Paper Structure

This paper contains 19 sections, 2 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Overview of our approach: Our framework leverages geotagged images of locations that are siloed in user photo galleries worldwide. It constructs neural radiance fields (NeRFs) for immersive 3D viewing in a decentralized manner, minimizing the reconstruction of personal content at the server. (Left): We showcase the potential to generate global-scale 3D scene representations for applications like city-scale NeRFs, immersive past event experiences, and virtual photo-tourism. We highlight the capture of diverse viewpoints across users. (Right): We compare our approach against centralized and existing decentralized baselines for a photo-tourism example. We demonstrate that our approach offers the best tradeoff between low server compute and photorealism while minimizing personal content reconstruction on the server.
  • Figure 2: Key features of our approach (a) Overview: Personal and global MLPs are trained on user devices to separate personal and global content from local images. After each training round, the server performs a learned federation of users' global MLPs using a secure MPC protocol and distributes the updated global MLP back to each user. (b) Results on reducing personal content leakage: We notice that users' global MLPs contain personal content during the initial rounds. Our secure MPC protocol ensures the server only sees the averaged global MLP from which the rendering of users' personal content is minimal. Over federation rounds, global and personal MLPs separate content through learned weighted averaging, enabling high-fidelity rendering from the server's global MLP.
  • Figure 3: DecentNeRF architecture: On user devices, we consider NeRFs with the following architecture, where the personal MLP always stays local to the user and the weights of the global MLP are securely aggregated at the server. We also highlight what DecentNeRF learns to represent in its global and personal MLPs.
  • Figure 4: Ablation on Learned Federation: We demonstrate that with Learned Federation of Global MLPs, our approach (DecentNeRF) learns over 30 rounds of training to assign higher weight to clients with less occlusion. This leads to better overall reconstruction quality than the variants that use the FedAvg aggregation scheme, namely DecentNeRF(-L) and FedNeRF.
  • Figure 5: Qualitative results on Novel Blender dataset: DecentNeRF simultaneously removes unwanted personal content ('lego persons') while reconstructing fine details of global content ('lego excavator' and 'ship'). In contrast, FedNeRF (Holden et al., 2023) hallucinates 'blobs' where personal content exists in user views.
  • ...and 3 more figures
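
The Figure 2 caption states that the server only sees the aggregated global MLP, not individual users' models. One common way to obtain that property is pairwise additive masking, sketched below. This is an illustrative stand-in, not the paper's protocol: the source does not specify which secure MPC scheme is used. The names `encode`, `decode`, `mask_updates`, and `server_sum`, the fixed-point scale, and the modulus are all assumptions, and a single seeded generator stands in for the pairwise secrets that users would share in a real deployment.

```python
import random
from typing import List

MODULUS = 2 ** 32  # assumption: all masked arithmetic is done modulo 2^32
SCALE = 10 ** 4    # assumption: fixed-point encoding keeps 4 decimal digits


def encode(update: List[float]) -> List[int]:
    """Map floats to fixed-point integers modulo MODULUS."""
    return [round(v * SCALE) % MODULUS for v in update]


def decode(values: List[int]) -> List[float]:
    """Map residues back to signed floats (upper half means negative)."""
    half = MODULUS // 2
    return [((v - MODULUS) if v >= half else v) / SCALE for v in values]


def mask_updates(updates: List[List[float]], seed: int = 0) -> List[List[int]]:
    """Add pairwise masks that cancel in the sum across all users.

    For each pair (i, j), user i adds a random value and user j subtracts
    the same value, so any single masked vector looks random on its own.
    """
    rng = random.Random(seed)  # stand-in for pairwise shared secrets
    masked = [encode(u) for u in updates]
    dim = len(updates[0])
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            for k in range(dim):
                r = rng.randrange(MODULUS)
                masked[i][k] = (masked[i][k] + r) % MODULUS
                masked[j][k] = (masked[j][k] - r) % MODULUS
    return masked


def server_sum(masked: List[List[int]]) -> List[float]:
    """Sum the masked vectors; the masks cancel, leaving the true sum."""
    dim = len(masked[0])
    totals = [sum(m[k] for m in masked) % MODULUS for k in range(dim)]
    return decode(totals)
```

Dividing the result of `server_sum` by the number of users gives the average. The sketch omits dropout handling and key agreement, which a practical secure aggregation protocol would need.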