PanoTree: Autonomous Photo-Spot Explorer in Virtual Reality Scenes

Tomohiro Hayase, Sacha Braun, Hikari Yanagawa, Itsuki Orito, Yuichi Hiroi

TL;DR

PanoTree tackles two key challenges in social VR: identifying photo-worthy viewpoints and efficiently exploring large 3D VR scenes. It introduces a deep photo-scoring network trained on millions of VR photographs and a Hierarchical Optimistic Optimization (HOO)-based explorer that splits the space into regions and biases exploration toward high-scoring ones. The scoring network is competitive with human judges, the explorer outperforms random search in locating photo spots, and parallel rendering speeds up exploration several-fold. Practical applications include automatic thumbnail generation, VR world visualization, and visitor circulation planning, illustrating the value of large-scale VR behavioral data for scene design and user experience. The work also discusses limitations that stem from non-differentiable rendering and points to differentiable rendering and learned policies as directions for improving efficiency and quality.

Abstract

Social VR platforms enable social, economic, and creative activities by allowing users to create and share their own virtual spaces. In social VR, photography within a VR scene is an important indicator of visitors' activities. Although automatic identification of photo spots within a VR scene can facilitate the process of creating a VR scene and enhance the visitor experience, there are challenges in quantitatively evaluating photos taken in the VR scene and efficiently exploring the large VR scene. We propose PanoTree, an automated photo-spot explorer in VR scenes. To assess the aesthetics of images captured in VR scenes, a deep scoring network is trained on a large dataset of photos collected by a social VR platform to determine whether humans are likely to take similar photos. Furthermore, we propose a Hierarchical Optimistic Optimization (HOO)-based search algorithm to efficiently explore 3D VR spaces with the reward from the scoring network. Our user study shows that the scoring network achieves human-level performance in distinguishing randomly taken images from those taken by humans. In addition, we show applications using the explored photo spots, such as automatic thumbnail generation, support for VR world creation, and visitor flow planning within a VR scene.
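
The scoring idea in the abstract (label human-taken photos 1 and randomly taken photos 0, then read the network output as "how likely a human would take this photo") can be illustrated with a minimal sketch. It assumes a binary cross-entropy objective, which the text here does not state; `sigmoid` and `bce_loss` are hypothetical helper names, and the raw network outputs (`logits`) are made-up toy values rather than results from the paper.

```python
import math


def sigmoid(z):
    """Map a raw network output (logit) to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(logits, labels):
    """Mean binary cross-entropy; label 1 = human-captured, 0 = randomly captured.

    Assumption: the source does not name the training loss; BCE is a common
    choice for this kind of two-class setup and is used here for illustration.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


# Toy values: a confident, correct scorer gets a lower loss than a wrong one.
labels = [1, 0, 1, 0]
good = bce_loss([3.0, -3.0, 2.0, -2.0], labels)
bad = bce_loss([-3.0, 3.0, -2.0, 2.0], labels)
print(good < bad)
```

At search time only the score `sigmoid(z)` would be needed, since it serves as the reward for the explorer.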

Paper Structure (46 sections, 11 equations, 11 figures, 7 tables, 1 algorithm)

Figures (11)

  • Figure 1: Examples of photos taken by humans in various VR worlds on Cluster, a social VR platform.
  • Figure 2: Pairs of negative samples (0) and positive samples (1) from ten test scenes in WI-89 (top) and WI-245 (bottom).
  • Figure 3: Scoring network for photos in VR space. The network learns to classify each input image as human-captured or randomly captured, and in doing so learns features of images that a human is likely to capture. The input images are labeled 1 for human-captured and 0 for randomly captured. During evaluation, the network outputs, for each input image, a score indicating how likely the image is to have been captured by a human.
  • Figure 4: Overview of the truncated HOO. The number on each node indicates the number of visits. (a) At the beginning of each iteration, starting from the root node, the explorer traverses the tree by selecting the child with the highest $B$-value until it arrives at a leaf (red arrows). (b) After the reached leaf is added to the tree, the space it covers is split, and two child nodes (red nodes) are added. (c) Update of the $B$-value. Starting from the leaf node marked by a red circle, the $B$-values and numbers of visits are updated up to the root node (red arrows). Once the update is complete, the explorer moves on to the next iteration. In this example, the explorer will arrive at node (1, 1) in the next iteration. (d) Results of the truncated HOO with $f:\mathbb{R}\supset\mathcal{X}=[-10,10]\rightarrow\mathbb{R}$ as an example (top) and its tree (bottom).
  • Figure 5: (a) 3D spatial division and tree structure of the PanoTree. Each scene is defined as a cuboid and is split so that its longest edge is the most likely to be split. (b) Sampled directional vectors $\boldsymbol{\delta}_{k}$ (blue arrows) for $N_\mathrm{dir}=15$. Each $\boldsymbol{\delta}_{k}$ is a unit vector whose endpoint (red) is distributed over the unit sphere (green).
  • ...and 6 more figures
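
The traverse, split, and update steps described in the Figure 4 caption can be sketched in one dimension as follows. This is a simplified illustration of the generic HOO recipe, not necessarily the paper's truncated variant: the names `Node` and `hoo_search`, the parameters `nu` and `rho`, the uniform sampling inside a leaf, and the test function `bump` are all assumptions made for the example. The $B$-value uses the standard HOO form (empirical mean plus a confidence term plus a depth-dependent term, capped by the best child), and it is refreshed only along the visited path, as in the caption, whereas textbook HOO recomputes it for the whole tree each round.

```python
import math
import random


class Node:
    """One interval [lo, hi] of the search space at a given tree depth."""

    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.visits = 0
        self.mean = 0.0
        self.b_value = math.inf  # unvisited nodes are maximally optimistic
        self.children = []


def hoo_search(f, lo, hi, n_iters=300, nu=1.0, rho=0.5, seed=0):
    """Maximize f on [lo, hi]; returns (best_x, best_y, root)."""
    rng = random.Random(seed)
    root = Node(lo, hi, 0)
    best_x, best_y = None, -math.inf
    for t in range(1, n_iters + 1):
        # (a) Descend from the root, always taking the child with the highest B-value.
        node, path = root, [root]
        while node.children:
            node = max(node.children, key=lambda c: c.b_value)
            path.append(node)
        # Evaluate the reward at a random point inside the reached leaf.
        x = rng.uniform(node.lo, node.hi)
        y = f(x)
        if y > best_y:
            best_x, best_y = x, y
        # (b) Split the leaf's interval in half and attach two children.
        mid = 0.5 * (node.lo + node.hi)
        node.children = [Node(node.lo, mid, node.depth + 1),
                         Node(mid, node.hi, node.depth + 1)]
        # (c) Update visit counts, means, and B-values from the leaf up to the root.
        for n in reversed(path):
            n.visits += 1
            n.mean += (y - n.mean) / n.visits
            u = n.mean + math.sqrt(2.0 * math.log(t) / n.visits) + nu * rho ** n.depth
            n.b_value = min(u, max(c.b_value for c in n.children))
    return best_x, best_y, root


def bump(x):
    """Hypothetical reward in [0, 1] with a single peak at x = 3."""
    return math.exp(-0.5 * (x - 3.0) ** 2)


best_x, best_y, root = hoo_search(bump, -10.0, 10.0)
print(f"best x = {best_x:.3f}, reward = {best_y:.3f}, root visits = {root.visits}")
```

In the 3D setting of Figure 5, each node would cover a cuboid rather than an interval, and the reward would come from the scoring network applied to rendered views; those parts are omitted here.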