SSR: A Training-Free Approach for Streaming 3D Reconstruction

Hui Deng; Yuxin Mao; Yuxin He; Yuchao Dai

Abstract

Streaming 3D reconstruction demands long-horizon state updates under strict latency constraints, yet stateful recurrent models often suffer from geometric drift as errors accumulate over time. We revisit this problem from a Grassmannian manifold perspective: the latent persistent state can be viewed as a subspace representation, i.e., a point evolving on a Grassmannian manifold, where temporal coherence implies the state trajectory should remain on (or near) this manifold. Based on this view, we propose Self-expressive Sequence Regularization (SSR), a plug-and-play, training-free operator that enforces Grassmannian sequence regularity during inference. Given a window of historical states, SSR computes an analytical affinity matrix via the self-expressive property and uses it to regularize the current update, effectively pulling noisy predictions back toward the manifold-consistent trajectory with minimal overhead. Experiments on long-sequence benchmarks demonstrate that SSR consistently reduces drift and improves reconstruction quality across multiple streaming 3D reconstruction tasks.
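The window-based regularization described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes states are flattened into vectors, substitutes a ridge-regularized least-squares fit for the paper's analytical affinity, and blends the result with the raw prediction. The function names and the parameters `lam` and `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def self_expressive_coeffs(history, current, lam=1e-2):
    """Ridge-regularized weights expressing `current` as a combination of history rows.

    history: (T, d) array of past states (assumed flattened to vectors).
    current: (d,) array, the newly predicted state.
    Solves min_c ||current - history.T @ c||^2 + lam * ||c||^2 in closed form.
    """
    gram = history @ history.T          # (T, T) similarities between past states
    rhs = history @ current             # (T,) similarity of each past state to the new one
    return np.linalg.solve(gram + lam * np.eye(len(history)), rhs)


def regularize_state(history, current, lam=1e-2, alpha=0.5):
    """Blend the raw prediction with its reconstruction from the history window.

    alpha = 0 returns the raw prediction; alpha = 1 returns the pure reconstruction.
    Both `lam` and `alpha` are illustrative knobs, not values from the paper.
    """
    coeffs = self_expressive_coeffs(history, current, lam)
    reconstruction = history.T @ coeffs  # lies in the span of the past states
    return (1.0 - alpha) * current + alpha * reconstruction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: past states lie in a 2-D subspace of an 8-D state space.
    basis, _ = np.linalg.qr(rng.standard_normal((8, 2)))
    history = rng.standard_normal((5, 2)) @ basis.T
    # The new prediction is an in-subspace point plus off-subspace noise.
    noise = rng.standard_normal(8)
    noise -= basis @ (basis.T @ noise)
    current = basis @ np.array([1.0, -0.5]) + 0.3 * noise
    refined = regularize_state(history, current, lam=1e-3, alpha=0.5)

    def off_subspace(v):
        return np.linalg.norm(v - basis @ (basis.T @ v))

    print("off-subspace error before:", off_subspace(current))
    print("off-subspace error after: ", off_subspace(refined))
```

In this toy setting the reconstruction lies in the span of the past states, so blending shrinks the off-subspace component of the prediction by a factor of `1 - alpha`. Because the solve involves only a T-by-T system for a window of T states, the added cost stays small when the window is short.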

Paper Structure (17 sections, 11 equations, 5 figures, 5 tables, 1 algorithm)

Introduction
Related Works
Methodology
Grassmannian Manifold View
Recurrent Streaming Reconstruction
Revisit Self-Expressive Property
Self-expressive Sequence Regularization
Experiments
Video Depth Estimation
Pose Estimation
3D Reconstruction
Analysis
Conclusion
Impact Statement
Context Forgetting
...and 2 more sections

Figures (5)

Figure 1: Illustration of Self-expressive Sequence Regularization (SSR). We introduce a training-free regularization scheme for CUT3R. Specifically, we refine the frame-wise states computed by foundation models through a sliding-window reconstruction process. By leveraging the affinity between temporal states, our method achieves sequence regularization without introducing learnable parameters, making it directly applicable to off-the-shelf pre-trained foundation models.
Figure 2: Qualitative comparison. On long trajectories prone to cumulative drift and on loop trajectories, our method achieves loop closure more effectively than CUT3R and TTT3R, yielding better reconstruction results.
Figure 3: Affinity matrix visualization. The off-diagonal activations in the affinity matrix indicate that our method maintains strong contextual relationships with early frames even as the sequence progresses, providing a robust context constraint for the current estimate.
