MC-NeRF: Multi-Camera Neural Radiance Fields for Multi-Camera Image Acquisition Systems
Yu Gao, Lutong Su, Hao Liang, Yufeng Yue, Yi Yang, Mengyin Fu
TL;DR
MC-NeRF introduces a joint optimization framework for intrinsic and extrinsic camera parameters within Neural Radiance Fields in multi-camera systems, removing the assumption of a single camera and eliminating the need for initial parameter estimates. It employs an auxiliary calibration scheme with Pack1 and Pack2 images to decouple parameters via reprojection constraints and bundle adjustment, enabling accurate per-camera intrinsic/extrinsic recovery and real-world scale. The method is validated on a newly built 88-camera system with both synthetic and real-world datasets, showing competitive or superior camera parameter estimation and rendering quality compared with existing NeRF and 3D Gaussian baselines. This work lowers barriers to deploying NeRF in complex multi-camera setups and provides datasets and code to support reproducible multi-camera 3D reconstruction at real-world scale.
Abstract
Neural Radiance Fields (NeRF) use multi-view images for 3D scene representation, demonstrating remarkable performance. As one of the primary sources of multi-view images, multi-camera systems encounter challenges such as varying intrinsic parameters and frequent pose changes. Most previous NeRF-based methods assume a single camera and rarely consider multi-camera scenarios. Moreover, NeRF methods that can optimize intrinsic and extrinsic parameters remain susceptible to suboptimal solutions when these parameters are poorly initialized. In this paper, we propose MC-NeRF, a method that enables joint optimization of both intrinsic and extrinsic parameters alongside NeRF. The method also allows each image to have its own independent camera parameters. First, we tackle the coupling issue and the degenerate case that arise from the joint optimization of intrinsic and extrinsic parameters. Second, based on the proposed solutions, we introduce an efficient calibration image acquisition scheme for multi-camera systems, including the design of the calibration object. Finally, we present an end-to-end network with a training sequence that enables the estimation of intrinsic and extrinsic parameters, along with the rendering network. Furthermore, recognizing that most existing datasets are designed for a single camera, we construct a real multi-camera image acquisition system and create a corresponding new dataset, which includes both simulated data and real-world captured images. Experiments confirm the effectiveness of our method when each image corresponds to different camera parameters. Specifically, we use multiple cameras, each with different intrinsic and extrinsic parameters, in a real-world system to achieve 3D scene representation without providing initial poses.
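To make the notion of per-camera intrinsic and extrinsic parameters concrete, the following is a minimal NumPy sketch of a pinhole reprojection error evaluated separately for each camera. It is an illustration only, not the MC-NeRF implementation: the function names (`project`, `reprojection_rmse`, `make_K`, `rot_y`), the calibration points, and all numeric values are assumptions chosen for the example.

```python
import numpy as np

def make_K(f, cx, cy):
    """Build a simple pinhole intrinsic matrix (hypothetical helper)."""
    return np.array([[f, 0.0, cx],
                     [0.0, f, cy],
                     [0.0, 0.0, 1.0]])

def rot_y(angle):
    """Rotation matrix about the y axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(points_w, K, R, t):
    """Project Nx3 world points to Nx2 pixel coordinates."""
    points_c = points_w @ R.T + t        # world frame -> camera frame
    uv_h = points_c @ K.T                # apply intrinsics
    return uv_h[:, :2] / uv_h[:, 2:3]    # perspective divide

def reprojection_rmse(points_w, pixels, K, R, t):
    """Root-mean-square pixel distance between projections and observations."""
    residual = project(points_w, K, R, t) - pixels
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))

# Assumed 3D points on a calibration object (metres, made up for this sketch).
points_w = np.array([[0.0, 0.0, 0.0],
                     [0.1, 0.0, 0.0],
                     [0.0, 0.1, 0.0],
                     [0.1, 0.1, 0.05]])

# Two cameras with different intrinsics AND extrinsics (all values assumed).
cameras = [
    {"K": make_K(800.0, 320.0, 240.0), "R": np.eye(3),
     "t": np.array([0.0, 0.0, 2.0])},
    {"K": make_K(1200.0, 640.0, 360.0), "R": rot_y(0.3),
     "t": np.array([0.1, 0.0, 2.5])},
]

# Synthetic observations generated from the true parameters of each camera.
observations = [project(points_w, c["K"], c["R"], c["t"]) for c in cameras]

# Error with the true parameters, and with a 10% focal-length error.
errors_true = [reprojection_rmse(points_w, obs, c["K"], c["R"], c["t"])
               for c, obs in zip(cameras, observations)]
errors_bad_focal = [
    reprojection_rmse(points_w, obs,
                      make_K(c["K"][0, 0] * 1.1, c["K"][0, 2], c["K"][1, 2]),
                      c["R"], c["t"])
    for c, obs in zip(cameras, observations)
]
```

In a joint optimization, a residual of this kind would be minimized over `K`, `R`, and `t` for every camera at once. Because a focal-length change and a translation along the optical axis can produce similar image-space effects, such a residual alone may not separate the two, which is the kind of coupling the abstract refers to.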
