Diversity-Driven View Subset Selection for Indoor Novel View Synthesis

Zehao Wang; Han Zhou; Matthew B. Blaschko; Tinne Tuytelaars; Minye Wu

TL;DR

This work tackles the inefficiency of indoor monocular novel view synthesis by formulating frame subset selection as maximizing a diversity-aware utility $z(\mathbb{S})$ under a size constraint. It introduces a multifactor diversity distance spanning 3D, angular, and semantic spaces and evaluates three utility functions, each optimized greedily: log-determinant (DPP), Max-Min Distance, and Uniform Coverage. A new IndoorTraj dataset with complex, human-like trajectories enables realistic evaluation and shows that selecting 5–20% of frames can match or outperform full-data baselines under equal compute, a substantial gain in efficiency. The approach offers a practical path toward scalable indoor neural rendering without sacrificing rendering quality, and it is supported by theoretical analysis, ablations, and extensive experiments.
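
The multifactor diversity distance can be illustrated with a minimal sketch. It assumes each frame comes with a camera center, a viewing direction, and a semantic embedding (for example from a pretrained image encoder), and combines Euclidean, angular, and cosine distances as a weighted sum of max-normalized terms. The per-factor metrics, the normalization, the function name `multifactor_distance`, and its `weights` parameter are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def multifactor_distance(positions, directions, features, weights=(1.0, 1.0, 1.0)):
    """Hypothetical multifactor distance: weighted sum of normalized 3D, angular,
    and semantic distances between every pair of frames.

    positions:  (N, 3) camera centers
    directions: (N, 3) viewing directions (normalized here)
    features:   (N, D) per-view semantic embeddings (normalized here)
    weights:    assumed per-factor weights (3D, angular, semantic)
    """
    pos = np.asarray(positions, dtype=float)
    dirs = np.asarray(directions, dtype=float)
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    feats = np.asarray(features, dtype=float)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)

    # 3D factor: Euclidean distance between camera centers.
    d_pos = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    # Angular factor: angle (radians) between viewing directions.
    d_ang = np.arccos(np.clip(dirs @ dirs.T, -1.0, 1.0))
    # Semantic factor: cosine distance between embeddings.
    d_sem = 1.0 - np.clip(feats @ feats.T, -1.0, 1.0)

    total = np.zeros_like(d_pos)
    for w, d in zip(weights, (d_pos, d_ang, d_sem)):
        scale = d.max()
        if scale > 0:  # rescale each factor to [0, 1] before weighting
            total += w * d / scale
    np.fill_diagonal(total, 0.0)  # a frame is at distance 0 from itself
    return total

# Toy example with three frames (hypothetical values).
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
directions = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
D = multifactor_distance(positions, directions, features)
print(np.round(D, 3))
```

Rescaling each factor to [0, 1] keeps one factor (for example, positions measured in meters) from dominating the others; the weights then express how much each kind of diversity should count.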

Abstract

Novel view synthesis of indoor scenes can be achieved by capturing a monocular video sequence of the environment. However, redundant information caused by artificial movements in the input video data reduces the efficiency of scene modeling. To address this, we formulate the problem as a combinatorial optimization task for view subset selection. In this work, we propose a novel subset selection framework that integrates a comprehensive diversity-based measurement with well-designed utility functions. We provide a theoretical analysis of these utility functions and validate their effectiveness through extensive experiments. Furthermore, we introduce IndoorTraj, a novel dataset designed for indoor novel view synthesis, featuring complex and extended trajectories that simulate intricate human behaviors. Experiments on IndoorTraj show that our framework consistently outperforms baseline strategies while using only 5–20% of the data, highlighting its remarkable efficiency and effectiveness. The code is available at: https://github.com/zehao-wang/IndoorTraj
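
Viewed as combinatorial optimization, selection picks at most $k$ frames to maximize $z(\mathbb{S})$. The sketch below shows generic greedy maximization with a pluggable utility, instantiated with a Max-Min Distance utility read as the smallest pairwise distance within the subset. The fixed starting frame, the names `greedy_select` and `max_min_utility`, and this reading of the utility are assumptions for illustration; the paper's initialization and exact definitions may differ.

```python
import itertools
import numpy as np

def max_min_utility(D):
    """Hypothetical Max-Min Distance utility: smallest pairwise distance in S (needs |S| >= 2)."""
    def z(S):
        return min(D[i, j] for i, j in itertools.combinations(S, 2))
    return z

def greedy_select(n, k, utility, first=0):
    """Greedily grow a subset of at most k of n frames, starting from an assumed first frame."""
    selected = [first]
    remaining = set(range(n)) - {first}
    while len(selected) < k and remaining:
        # z(S) is fixed within a step, so maximizing z(S + [j]) maximizes the marginal gain.
        best = max(sorted(remaining), key=lambda j: utility(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: frames placed on a line at x = 0, 1, 2, 10.
xs = np.array([0.0, 1.0, 2.0, 10.0])
D = np.abs(xs[:, None] - xs[None, :])
print(greedy_select(len(xs), 3, max_min_utility(D)))  # [0, 3, 2]
```

With this utility, each greedy step favors the frame whose nearest already-selected frame is farthest away, which matches farthest-point sampling up to tie-breaking.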

Paper Structure (23 sections, 2 theorems, 13 equations, 6 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Neural Rendering in Indoor Scenes
Sampling in Neural Rendering Methods
Preliminary
Distance Measures
Utility Function Design for Subset Selection
Log Determinant Function
Max-Min Distance Function
Uniform Coverage Function
Experiments
Dataset
Baselines
Implementation Details
Main Results
...and 8 more sections

Key Result

Proposition 1

The log-determinant function is submodular.
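
Proposition 1 can be checked numerically on a small example. The sketch evaluates $f(\mathbb{S}) = \log\det(L_{\mathbb{S}})$ on principal submatrices of a random positive definite kernel $L$ and verifies the diminishing-returns inequality of Definition 1 for every $A \subseteq B$ and $v \notin B$. The kernel construction and the helper names are assumptions for illustration; how the paper builds its kernel from the diversity distance is not reproduced here.

```python
import itertools
import numpy as np

def log_det(L, S):
    """f(S) = log det(L_S) on the principal submatrix indexed by S; f(empty set) = 0."""
    S = list(S)
    if not S:
        return 0.0
    sign, value = np.linalg.slogdet(L[np.ix_(S, S)])
    if sign <= 0:
        raise ValueError("L_S must be positive definite")
    return float(value)

def check_diminishing_returns(L, tol=1e-9):
    """Check f(A + v) - f(A) >= f(B + v) - f(B) for all A subset of B and v not in B."""
    n = L.shape[0]
    for size_b in range(n):  # B must leave at least one element v outside it
        for B in itertools.combinations(range(n), size_b):
            gain_b = {v: log_det(L, B + (v,)) - log_det(L, B) for v in range(n) if v not in B}
            for size_a in range(size_b + 1):
                for A in itertools.combinations(B, size_a):
                    for v, gb in gain_b.items():
                        gain_a = log_det(L, A + (v,)) - log_det(L, A)
                        if gain_a < gb - tol:
                            return False
    return True

# Hypothetical kernel: random Gram matrix plus a ridge to make it positive definite.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
L = X @ X.T + 0.1 * np.eye(5)
print(check_diminishing_returns(L))  # True
```

The inequality holds because the marginal gain $f(\mathbb{S}\cup\{v\}) - f(\mathbb{S})$ equals the log of the Schur complement $L_{vv} - L_{v\mathbb{S}} L_{\mathbb{S}}^{-1} L_{\mathbb{S}v}$, a conditional variance that can only shrink as $\mathbb{S}$ grows. The same expression shows that log det is not monotone in general: the gain is negative whenever that conditional variance drops below 1.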

Figures (6)

Figure 1: Data flow illustrating our subset selection framework. The depicted examples highlight how different factors in the distance measures influence the selection process when choosing two out of three cameras. The distance matrix is symmetric and integrates multiple factors, as defined in Table \ref{tab: similarity}.
Figure 2: Qualitative results on scene openplan-1 with a camera selection ratio of 5%. The first column presents a top-down heatmap, drawn at a common scale, showing the density of selected camera positions. For clarity, only a 30% subset of each method's selected cameras is drawn on the heatmap. The testing-view camera poses, marked in red, are labeled $P_1$, $P_2$, and $P_3$.
Figure 3: Performance across different sample ratios on the IndoorTraj dataset. Results are based on 30k training iterations with the Gaussian Splatting backbone for each sampling strategy.
Figure 4: Visual characteristics in the IndoorTraj dataset, featuring object arrangement complexity, transparent objects, highly reflective surfaces, and areas with intricate textures.
Figure 5: IndoorTraj dataset: training and testing trajectories, with example views.
...and 1 more figure
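
Figure 1's two-of-three example can be reproduced in miniature. In the hypothetical setup below, cameras 0 and 1 stand close together but face opposite ways, while camera 2 stands farther away and faces sideways; an exhaustive search for the most distant pair picks a different pair depending on whether only the 3D factor or only the angular factor is used. The camera layout and the helper `best_pair` are illustrative assumptions, not taken from the paper's figure.

```python
import itertools
import numpy as np

def best_pair(D):
    """Return the pair (i, j), i < j, with the largest entry in distance matrix D."""
    pairs = itertools.combinations(range(D.shape[0]), 2)
    return max(pairs, key=lambda p: D[p])

# Hypothetical layout: cameras 0 and 1 are close but face opposite ways,
# camera 2 is farther away and faces sideways.
positions = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [3.0, 0.0, 0.0]])
directions = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

# 3D factor only: Euclidean distance between camera centers.
D_pos = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
# Angular factor only: angle between (unit) viewing directions.
D_ang = np.arccos(np.clip(directions @ directions.T, -1.0, 1.0))

print(best_pair(D_pos))  # (0, 2): farthest apart in space
print(best_pair(D_ang))  # (0, 1): most different viewing directions
```

Combining the factors into one matrix, as the caption describes, lets a single selection rule trade off these competing notions of diversity.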

Theorems & Definitions (8)

Definition 1: Submodularity
Definition 2: Monotonicity
Proposition 1
Definition 3: Domain of Utility Functions Defined by the Marginal Gain
Definition 4: Max-Min Distance Function
Proposition 2
Proof
Definition 5: Uniform Coverage Function
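
For reference, the standard forms of Definitions 1 and 2 are written out below using the TL;DR's utility $z$ and subsets $\mathbb{S}$. The ground-set symbol $\mathbb{V}$ and the subset names $\mathbb{A}$, $\mathbb{B}$ are labels chosen here, and the paper's own statements may differ in form. The last line is the classical guarantee of Nemhauser, Wolsey, and Fisher for greedy maximization under a cardinality constraint.

```latex
% Standard definitions; \mathbb{V} denotes the ground set of candidate frames.
\textbf{Submodularity.}\quad
z(\mathbb{A}\cup\{v\}) - z(\mathbb{A}) \;\ge\; z(\mathbb{B}\cup\{v\}) - z(\mathbb{B})
\quad \forall\, \mathbb{A}\subseteq\mathbb{B}\subseteq\mathbb{V},\; v\in\mathbb{V}\setminus\mathbb{B}.

\textbf{Monotonicity.}\quad
\mathbb{A}\subseteq\mathbb{B} \;\Rightarrow\; z(\mathbb{A}) \le z(\mathbb{B}).

\textbf{Greedy guarantee.}\quad
z \text{ monotone submodular},\; z(\emptyset)=0
\;\Rightarrow\;
z(\mathbb{S}_{\text{greedy}}) \ge \left(1-\tfrac{1}{e}\right)\max_{|\mathbb{S}|\le k} z(\mathbb{S}).
```

Because the log-determinant is submodular but not monotone in general, this guarantee does not carry over to it automatically.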
