
Dual-Camera Smooth Zoom on Mobile Phones

Renlong Wu, Zhilu Zhang, Yu Yang, Wangmeng Zuo

TL;DR

The paper tackles abrupt preview jumps when zooming between dual cameras on mobile phones by defining the dual-camera smooth zoom (DCSZ) task and proposing a data factory built around ZoomGS that renders continuous virtual camera sequences. ZoomGS creates camera-specific 3D Gaussian Splatting models and generates synthetic DCSZ data by interpolating extrinsic/intrinsic parameters and camera-dependent encodings; this data is then used to fine-tune frame interpolation (FI) models. Across six state-of-the-art FI methods, fine-tuning on the synthetic DCSZ data yields consistent improvements on both synthetic and real-world datasets, with qualitative results showing reduced artifacts and more plausible intermediate content. The approach demonstrates practical potential for improving the smooth-zoom user experience on mobile devices, and the authors provide code, data, and pretrained models for public use.

Abstract

When zooming between dual cameras on a mobile phone, noticeable jumps in geometric content and image color occur in the preview, inevitably affecting the user's zoom experience. In this work, we introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview. The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection. To address the issue, we suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene. In particular, we propose a novel dual-camera smooth zoom Gaussian Splatting (ZoomGS), where a camera-specific encoding is introduced to construct a specific 3D model for each virtual camera. With the proposed data factory, we construct a synthetic dataset for DCSZ, and we utilize it to fine-tune FI models. In addition, we collect real-world dual-zoom images without ground-truth for evaluation. Extensive experiments are conducted with multiple FI methods. The results show that the fine-tuned FI models achieve a significant performance improvement over the original ones on the DCSZ task. The datasets, codes, and pre-trained models are available at https://github.com/ZcsrenlongZ/ZoomGS.
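
To make the idea of "continuous virtual cameras" concrete, the following is a minimal sketch of how a sequence of virtual cameras could be assembled between the ultra-wide (UW) and wide (W) cameras. It is not the authors' implementation. The `Camera` fields, the scalar `encoding` (0 for UW, 1 for W), the linear interpolation of focal length and position, and the spherical interpolation of orientation are all assumptions chosen for illustration; the paper's exact parameterization may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical container for one camera's parameters."""
    focal: float      # focal length in pixels (intrinsic)
    position: tuple   # camera centre (x, y, z) (extrinsic)
    quat: tuple       # unit quaternion (w, x, y, z) for orientation (extrinsic)
    encoding: float   # assumed scalar camera encoding: 0.0 = UW, 1.0 = W


def lerp(a, b, t):
    """Linear interpolation between scalars a and b."""
    return a + (b - a) * t


def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly identical rotations: fall back to lerp
        q = tuple(lerp(a, b, t) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def virtual_camera(uw, w, t):
    """Build a virtual camera at fraction t in [0, 1] between UW and W."""
    return Camera(
        focal=lerp(uw.focal, w.focal, t),
        position=tuple(lerp(a, b, t) for a, b in zip(uw.position, w.position)),
        quat=slerp(uw.quat, w.quat, t),
        encoding=lerp(uw.encoding, w.encoding, t),
    )


def zoom_sequence(uw, w, num_frames):
    """Return num_frames (>= 2) cameras from UW (first) to W (last)."""
    return [virtual_camera(uw, w, i / (num_frames - 1)) for i in range(num_frames)]


# Illustrative values only; real parameters come from calibration.
uw_cam = Camera(600.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), 0.0)
w_cam = Camera(1000.0, (0.01, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), 1.0)
sequence = zoom_sequence(uw_cam, w_cam, 5)
```

In a data factory of this kind, each camera in `sequence` would be passed to the renderer (ZoomGS in the paper) to produce one frame of a zoom sequence, and those frames would serve as training targets for an FI model.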

Paper Structure

This paper contains 25 sections, 19 equations, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: (a) Existing dual-camera zoom. For zooming between UW and W cameras (i.e., from $\times 0.6$ to $\times 1.0$), smartphones (e.g., Xiaomi, OPPO, and vivo) generally crop out the specific area from the UW image and scale it up to the dimensions of the original. When the zoom factor changes from $\times 0.9$ to $\times 1.0$, the lens has to switch from UW to W, and notable jumps in geometric content and image color occur in the preview. (b) Our proposed dual-camera smooth zoom (DCSZ). (c) The geometric content and image color jumps in (a) existing dual-camera zoom. Some examples can be found at https://dualcamerasmoothzoom.github.io.
  • Figure 2: Overview of the proposed method. (a) Data preparation for data factory. We collect multi-view dual-camera images and calibrate their camera extrinsic and intrinsic parameters. (b) Construction of ZoomGS in data factory. ZoomGS employs a camera transition (CamTrans) module to transform the base (i.e., UW camera) Gaussians to the specific camera Gaussians according to the camera encoding. (c) Data generation from data factory. The virtual (V) camera parameters are constructed by interpolating the dual-camera ones, and are then input into ZoomGS to generate zoom sequences. (d) Fine-tuning a frame interpolation (FI) model with the constructed zoom sequences.
  • Figure 3: Visual comparisons on the synthetic dataset. The FI models synthesize the intermediate geometry content between dual cameras, as indicated with yellow arrows. The fine-tuned FI models generate more photo-realistic details.
  • Figure 4: Visual comparisons on the real-world dataset. The FI models also synthesize the intermediate geometry content in the real world, as indicated with yellow arrows. Besides, the fine-tuned FI models generate fewer visual artifacts.
  • Figure 5: Effect of Lipschitz regularization for ZoomGS. Without $\mathcal{L}_{lipschitz}$, ZoomGS easily produces visual artifacts, as indicated with black arrows in (b).
  • ...and 6 more figures
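
The crop-and-scale digital zoom described in the Figure 1(a) caption can be sketched as follows. This is an illustration under simplifying assumptions, not any vendor's pipeline: the crop is taken from the image centre, the UW camera is assumed to correspond to a zoom factor of 0.6, and nearest-neighbour sampling stands in for whatever resampling a real device uses. The function names `crop_box` and `digital_zoom` are hypothetical.

```python
def crop_box(width, height, zoom, base_zoom=0.6):
    """Return (left, top, crop_width, crop_height) of the centred crop.

    A zoom factor equal to base_zoom keeps the full UW image; larger zoom
    factors keep a proportionally smaller central region.
    """
    if zoom < base_zoom:
        raise ValueError("zoom must be at least base_zoom")
    frac = base_zoom / zoom
    crop_w = max(1, round(width * frac))
    crop_h = max(1, round(height * frac))
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, crop_w, crop_h


def digital_zoom(image, zoom, base_zoom=0.6):
    """Crop the centre of a row-major image and scale it back to full size.

    `image` is a list of rows; nearest-neighbour sampling is used.
    """
    height, width = len(image), len(image[0])
    left, top, crop_w, crop_h = crop_box(width, height, zoom, base_zoom)
    out = []
    for y in range(height):
        src_y = top + min(crop_h - 1, y * crop_h // height)
        row = [
            image[src_y][left + min(crop_w - 1, x * crop_w // width)]
            for x in range(width)
        ]
        out.append(row)
    return out


# Toy 10x10 "image" whose pixel value encodes its (row, column) position.
uw_image = [[r * 10 + c for c in range(10)] for r in range(10)]
preview = digital_zoom(uw_image, zoom=1.0)
```

Because this preview is derived entirely from the UW image, it cannot match the W camera's viewpoint or color response, which is why switching lenses at $\times 1.0$ produces the jump shown in Figure 1(c).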