GSN: Generalisable Segmentation in Neural Radiance Field
Vinayak Gupta, Rahul Goel, Sirikonda Dhawal, P. J. Narayanan
TL;DR
Traditional radiance-field methods overfit to a single scene, and many generalised methods struggle to provide consistent semantic labels across unseen scenes. The paper introduces GSN, which builds on the Generalizable NeRF Transformer (GNT) and distils multiple semantic feature fields into a single generalisable representation, enabling on-the-fly novel-view rendering with per-pixel semantic features. Training has two stages: Stage I learns RGB view synthesis across scenes, and Stage II distils features from a teacher, such as DINO, into a student head that drives segmentation. The approach achieves segmentation performance on par with scene-specific methods on LLFF data, demonstrates multi-view consistency, and supports integrating diverse semantic fields, offering a practical path toward scalable, semantically rich generalisable radiance fields for downstream tasks.
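The two training stages can be illustrated with a minimal sketch of their objectives. It assumes a mean-squared-error loss for both the RGB and the feature terms, which this summary does not confirm. The function names `mse`, `stage1_loss`, and `stage2_loss` are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(pred: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Mean squared error over all elements of two equally shaped pixel batches."""
    if len(pred) != len(target):
        raise ValueError("batches must contain the same number of pixels")
    total, count = 0.0, 0
    for p, t in zip(pred, target):
        if len(p) != len(t):
            raise ValueError("per-pixel vectors must have the same length")
        total += sum((a - b) ** 2 for a, b in zip(p, t))
        count += len(p)
    if count == 0:
        raise ValueError("batches must not be empty")
    return total / count


def stage1_loss(rendered_rgb: Sequence[Vector], target_rgb: Sequence[Vector]) -> float:
    """Stage I (assumed form): photometric error between rendered and true colours."""
    return mse(rendered_rgb, target_rgb)


def stage2_loss(student_features: Sequence[Vector],
                teacher_features: Sequence[Vector]) -> float:
    """Stage II (assumed form): error between student-rendered features and
    teacher features, for example DINO features of the same pixels."""
    return mse(student_features, teacher_features)
```

In this sketch, each batch is a list of per-pixel vectors: three RGB channels in Stage I, and a feature vector of the teacher's dimensionality in Stage II.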
Abstract
Traditional Radiance Field (RF) representations capture details of a specific scene and must be trained afresh on each scene. Semantic feature fields have been added to RFs to facilitate several segmentation tasks. Generalised RF representations learn the principles of view interpolation. A generalised RF can render new views of an unknown and untrained scene, given a few views. We present a way to distil feature fields into the generalised GNT representation. Our GSN representation generates new views of unseen scenes on the fly along with consistent, per-pixel semantic features. This enables multi-view segmentation of arbitrary new scenes. We show different semantic features being distilled into generalised RFs. Our multi-view segmentation results are on par with methods that use traditional RFs. GSN closes the gap between standard and generalisable RF methods significantly. Project Page: https://vinayak-vg.github.io/GSN/
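As an illustration of how per-pixel semantic features can drive segmentation, the sketch below marks every pixel whose feature is close to a query feature. It assumes cosine similarity with a fixed threshold, which may differ from the procedure actually used by GSN. The names `cosine_similarity` and `segment_by_similarity` and the default threshold are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_by_similarity(features: Sequence[Sequence[float]],
                          query: Sequence[float],
                          threshold: float = 0.8) -> List[bool]:
    """Return a per-pixel mask: True where the rendered feature matches the query.

    `features` holds one rendered feature vector per pixel; `query` could be,
    for example, the mean feature of a few user-selected pixels (an assumption).
    """
    return [cosine_similarity(f, query) >= threshold for f in features]


# Example: three pixels with 2-D features, queried with a feature along the first axis.
mask = segment_by_similarity([[1.0, 0.0], [0.0, 1.0], [2.0, 0.1]], [1.0, 0.0])
```

Because the features are rendered consistently across views, the same query could in principle be reused for every novel view of the scene.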
