STRinGS: Selective Text Refinement in Gaussian Splatting
Abhinav Raundhal, Gaurav Behera, P J Narayanan, Ravi Kiran Sarvadevabhatla, Makarand Tapaswi
TL;DR
3D Gaussian Splatting often loses fine text details, hindering text-rich scene understanding. STRinGS introduces a text-aware, two-phase refinement that isolates and densifies text Gaussians before full-scene optimization, yielding sharper, more readable text early in training. The approach is validated with OCR-CER improvements across multiple datasets and introduces STRinGS-360 to benchmark text readability in 3D reconstructions. The work demonstrates that targeted text refinement can achieve high semantic fidelity without sacrificing global visual quality, enabling time-sensitive, text-rich 3D scene understanding.
Abstract
Text in the form of signs, labels, or instructions is a critical element of real-world scenes because it conveys important contextual information. 3D representations such as 3D Gaussian Splatting (3DGS) achieve high visual fidelity but struggle to preserve fine-grained text details. Small errors in reconstructing textual elements can lead to significant semantic loss. We propose STRinGS, a text-aware, selective refinement framework that addresses this issue for 3DGS reconstruction. Our method treats text and non-text regions separately, refining text regions first and merging them with non-text regions later for full-scene optimization. STRinGS produces sharp, readable text even in challenging configurations. We introduce a text readability measure, OCR Character Error Rate (CER), to evaluate reconstruction quality on text regions. STRinGS achieves a 63.6% relative improvement over 3DGS at just 7K iterations. We also introduce a curated dataset, STRinGS-360, with diverse text scenarios to evaluate text readability in 3D reconstruction. Together, our method and dataset push the boundaries of 3D scene understanding in text-rich environments, paving the way for more robust text-aware reconstruction methods.
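To make the readability measure concrete, the following is a minimal sketch of how a character error rate and a relative improvement could be computed. It assumes the standard definition of CER (Levenshtein edit distance between the OCR output and the ground-truth string, divided by the ground-truth length) and assumes the reported relative improvement is measured on CER, where lower is better. The function names are hypothetical, and the OCR engine, text normalization, and aggregation across views used in the paper are not stated in this abstract and are not modeled here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(reference: str, ocr_output: str) -> float:
    """Character error rate of OCR output against a ground-truth string."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, ocr_output) / len(reference)


def relative_improvement(baseline_cer: float, method_cer: float) -> float:
    """Fractional reduction in CER relative to a baseline (assumed definition)."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - method_cer) / baseline_cer


# Illustrative values only, not results from the paper.
print(cer("EXIT", "EX1T"))              # one substitution in four characters
print(relative_improvement(0.5, 0.25))  # halving the CER
```

Under this definition, CER can exceed 1.0 when the OCR output contains many spurious characters, so evaluation pipelines sometimes clip it; whether the paper does so is not stated here.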
