Diversity-Driven View Subset Selection for Indoor Novel View Synthesis

Zehao Wang, Han Zhou, Matthew B. Blaschko, Tinne Tuytelaars, Minye Wu

TL;DR

This work tackles the inefficiency of indoor monocular novel view synthesis by formulating frame subset selection as maximizing a diversity-aware utility $z(\mathbb{S})$ under a size constraint. It introduces a multifactor diversity distance across 3D, angular, and semantic spaces and evaluates three utility functions—log-determinant (DPP), Max-Min Distance, and Uniform Coverage—via greedy optimization. A new IndoorTraj dataset with complex human-like trajectories enables realistic evaluation, showing that selecting 5–20% of frames can outperform or match full-data baselines under equal compute, highlighting substantial gains in efficiency. The approach offers practical pathways to scalable indoor neural rendering without sacrificing rendering quality, and provides theoretical and empirical guidance through ablations and extensive experiments.
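
The greedy selection step described above can be sketched for the Max-Min Distance case. This is a minimal illustration, not the authors' implementation: it assumes a precomputed symmetric distance matrix `dist` (a plain 1D position distance here, standing in for the paper's multifactor distance), and the function name `greedy_max_min`, the `first` seed index, and the toy data are all hypothetical.

```python
import numpy as np

def greedy_max_min(dist, k, first=0):
    """Greedily pick k indices; each new pick maximizes its minimum
    distance to the already selected set (farthest-point heuristic)."""
    n = dist.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of views")
    selected = [first]
    # min_dist[i] = distance from candidate i to its nearest selected view
    min_dist = dist[first].copy()
    while len(selected) < k:
        min_dist[selected] = -np.inf  # never re-pick a selected view
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return selected

# Toy data (assumed): camera centers on a line with two near-duplicate pairs.
positions = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
dist = np.abs(positions - positions.T)  # symmetric pairwise distances
subset = greedy_max_min(dist, k=3)
print(subset)  # [0, 4, 2]
```

With a budget of three views, the sketch skips the near-duplicate frames at 0.1 and 5.1, which is the kind of redundancy the size-constrained selection is meant to remove.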

Abstract

Novel view synthesis of indoor scenes can be achieved by capturing a monocular video sequence of the environment. However, redundant information caused by artificial movements in the input video data reduces the efficiency of scene modeling. To address this, we formulate the problem as a combinatorial optimization task for view subset selection. In this work, we propose a novel subset selection framework that integrates a comprehensive diversity-based measurement with well-designed utility functions. We provide a theoretical analysis of these utility functions and validate their effectiveness through extensive experiments. Furthermore, we introduce IndoorTraj, a novel dataset designed for indoor novel view synthesis, featuring complex and extended trajectories that simulate intricate human behaviors. Experiments on IndoorTraj show that our framework consistently outperforms baseline strategies while using only 5-20% of the data, highlighting its remarkable efficiency and effectiveness. The code is available at: https://github.com/zehao-wang/IndoorTraj

Paper Structure

This paper contains 23 sections, 2 theorems, 13 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

The log determinant function is submodular.
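
For reference, the statement can be written out using the standard diminishing-returns definition of submodularity. The utility symbol $z(\mathbb{S})$ follows the TL;DR; the ground set $\mathbb{V}$ and the positive semidefinite kernel matrix $\mathbf{L}$ are assumed notation and may differ from the paper's.

```latex
% Submodularity (diminishing returns): adding x to a smaller set helps at least as much
z(\mathbb{A} \cup \{x\}) - z(\mathbb{A}) \;\ge\; z(\mathbb{B} \cup \{x\}) - z(\mathbb{B}),
\qquad \forall\, \mathbb{A} \subseteq \mathbb{B} \subseteq \mathbb{V},\; x \in \mathbb{V} \setminus \mathbb{B}

% Log-determinant utility: determinant of the kernel submatrix indexed by the subset
z(\mathbb{S}) = \log\det\left(\mathbf{L}_{\mathbb{S}}\right)
```

Submodularity matters for greedy optimization because, for functions that are also monotone and non-negative, greedy selection under a size constraint is known to achieve at least a $(1 - 1/e)$ fraction of the optimal utility. The log-determinant is not monotone in general, so that guarantee does not transfer automatically.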

Figures (6)

  • Figure 1: Data flow illustrating our subset selection framework. The depicted examples highlight how different factors in the distance measures influence the selection process when choosing two out of three cameras. The distance matrix is symmetric and integrates multiple factors, as defined in Table \ref{tab: similarity}.
  • Figure 2: Qualitative results on scene openplan-1 with a camera selection ratio of 5%: The first column presents a top-down heatmap at the same scale, showing the density of selected camera positions. For clearer demonstration, a 30% subset of the selected cameras from each method is drawn on the heatmap. The camera poses of the testing views marked in red are labeled as $P_1$, $P_2$, and $P_3$.
  • Figure 3: Performance across different sample ratios. The experiments are conducted on the IndoorTraj dataset. Results are based on 30k training iterations with the Gaussian Splatting backbone across different sampling strategies.
  • Figure 4: Visual characteristics in the IndoorTraj dataset, featuring object arrangement complexity, transparent objects, highly reflective surfaces, and areas with intricate textures.
  • Figure 5: IndoorTraj dataset: Training and testing trajectories and example views
  • ...and 1 more figure

Theorems & Definitions (8)

  • Definition 1: Submodularity
  • Definition 2: Monotonicity
  • Proposition 1
  • Definition 3: Domain of Utility Functions Defined by the Marginal Gain
  • Definition 4: Max-Min Distance Function
  • Proposition 2
  • Proof
  • Definition 5: Uniform Coverage Function
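
The submodularity property named in Definition 1 and claimed for the log-determinant in Proposition 1 can be checked numerically on a small example. The sketch below is an illustration under assumptions, not the paper's proof: the kernel is a random positive definite Gram matrix, and the helper names `log_det_utility`, `marginal_gain`, and `is_submodular` are hypothetical.

```python
import itertools
import numpy as np

def log_det_utility(kernel, subset):
    """f(S) = log det(K_S), with f(empty set) defined as 0."""
    idx = list(subset)
    if not idx:
        return 0.0
    _, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
    return float(logdet)

def marginal_gain(kernel, subset, x):
    """Utility gained by adding element x to subset."""
    bigger = set(subset) | {x}
    return log_det_utility(kernel, bigger) - log_det_utility(kernel, subset)

def is_submodular(kernel, tol=1e-9):
    """Brute-force check of diminishing returns over all A subset of B, x not in B."""
    n = kernel.shape[0]
    for size_b in range(n):
        for b in itertools.combinations(range(n), size_b):
            for size_a in range(size_b + 1):
                for a in itertools.combinations(b, size_a):
                    for x in range(n):
                        if x in b:
                            continue
                        gain_a = marginal_gain(kernel, a, x)
                        gain_b = marginal_gain(kernel, b, x)
                        if gain_a < gain_b - tol:
                            return False
    return True

# Assumed toy kernel: Gram matrix of random features plus a small ridge,
# which makes it positive definite.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))
kernel = features @ features.T + 1e-3 * np.eye(5)
print(is_submodular(kernel))
```

The exhaustive check is only feasible for a handful of elements; it is meant as a sanity test of the property, while the general claim rests on the proposition itself.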