PanoTree: Autonomous Photo-Spot Explorer in Virtual Reality Scenes
Tomohiro Hayase, Sacha Braun, Hikari Yanagawa, Itsuki Orito, Yuichi Hiroi
TL;DR
PanoTree tackles two key challenges in social VR: identifying photo-worthy viewpoints and efficiently exploring expansive 3D VR scenes. It introduces a deep photo-scoring network trained on millions of VR photographs and a Hierarchical Optimistic Optimization (HOO)-based explorer that splits the space into regions and biases exploration toward high-scoring areas. The approach achieves human-competitive scoring accuracy and outperforms random search in locating photo spots, while parallel rendering speeds up exploration severalfold. Practical applications include automatic thumbnail generation, VR world visualization, and visitor circulation planning, illustrating the value of large-scale VR behavioral data for scene design and user experience. The work also notes that the search is limited by non-differentiable rendering, and it points to differentiable rendering and learned policies as ways to further improve efficiency and quality.
Abstract
Social VR platforms enable social, economic, and creative activities by allowing users to create and share their own virtual spaces. In social VR, photography within a VR scene is an important indicator of visitors' activities. Automatic identification of photo spots within a VR scene can facilitate scene creation and enhance the visitor experience, but two challenges remain: quantitatively evaluating photos taken in a VR scene and efficiently exploring a large scene. We propose PanoTree, an automated photo-spot explorer for VR scenes. To assess the aesthetics of images captured in VR scenes, a deep scoring network is trained on a large dataset of photos collected by a social VR platform to determine whether humans are likely to take similar photos. Furthermore, we propose a Hierarchical Optimistic Optimization (HOO)-based search algorithm that efficiently explores 3D VR spaces using the scoring network's output as the reward. Our user study shows that the scoring network achieves human-level performance in distinguishing randomly taken images from those taken by humans. In addition, we show applications of the explored photo spots, such as automatic thumbnail generation, support for VR world creation, and visitor flow planning within a VR scene.
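To make the HOO-based search concrete, here is a minimal Python sketch of generic HOO over a 3D box. It is an illustration under stated assumptions, not the authors' implementation. The search space is reduced to a camera position in a unit cube, `toy_score` is a hypothetical stand-in for the scoring network applied to a rendered view, and the smoothness parameters `nu` and `rho`, the uniform sampling inside a cell, and the choice to return the best observed sample are all illustrative.

```python
import math
import random


class Cell:
    """Axis-aligned box in the search space, one node of the HOO tree."""

    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0        # number of samples drawn inside this cell
        self.mean = 0.0       # running mean of their rewards
        self.b = math.inf     # optimistic bound; unvisited cells stay at +inf
        self.children = None  # two sub-cells, created lazily

    def split(self):
        # Bisect the longest axis so cells shrink in every direction.
        axis = max(range(len(self.lo)), key=lambda d: self.hi[d] - self.lo[d])
        mid = 0.5 * (self.lo[axis] + self.hi[axis])
        left_hi = self.hi[:axis] + (mid,) + self.hi[axis + 1:]
        right_lo = self.lo[:axis] + (mid,) + self.lo[axis + 1:]
        return (Cell(self.lo, left_hi, self.depth + 1),
                Cell(right_lo, self.hi, self.depth + 1))


def toy_score(p):
    # Hypothetical reward: a smooth bump with values in (0, 1].
    peak = (0.7, 0.2, 0.6)
    d2 = sum((a - b) ** 2 for a, b in zip(p, peak))
    return math.exp(-d2 / (2 * 0.25 ** 2))


def hoo_search(score, lo, hi, rounds=300, nu=1.0, rho=0.5, seed=0):
    rng = random.Random(seed)
    root = Cell(tuple(lo), tuple(hi), 0)
    cells = [root]  # creation order, so parents always precede children
    best_point, best_score = None, -math.inf
    for n in range(1, rounds + 1):
        # 1. Descend along the largest B-value to the first unvisited cell.
        path, node = [root], root
        while node.count > 0:
            if node.children is None:
                node.children = node.split()
                cells.extend(node.children)
            # Ties (e.g. two unvisited children) are broken at random.
            node = max(node.children, key=lambda c: (c.b, rng.random()))
            path.append(node)
        # 2. Sample a point in that cell and score it.
        point = tuple(rng.uniform(a, b) for a, b in zip(node.lo, node.hi))
        reward = score(point)
        if reward > best_score:
            best_point, best_score = point, reward
        # 3. Update counts and running means along the path.
        for c in path:
            c.count += 1
            c.mean += (reward - c.mean) / c.count
        # 4. Recompute B-values bottom-up for every visited cell.
        for c in reversed(cells):
            if c.count == 0:
                continue
            u = (c.mean + math.sqrt(2 * math.log(n) / c.count)
                 + nu * rho ** c.depth)
            kids = math.inf if c.children is None else max(k.b for k in c.children)
            c.b = min(u, kids)
    return best_point, best_score


point, value = hoo_search(toy_score, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(point, value)
```

Each round costs one score evaluation, which in a real scene means one render plus one network forward pass. Step 4 refreshes every visited cell each round, which keeps the sketch short but is quadratic in the number of rounds.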
