NoKSR: Kernel-Free Neural Surface Reconstruction via Point Cloud Serialization

Zhen Li, Weiwei Sun, Shrisudhan Govindarajan, Shaobo Xia, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi

TL;DR

NoKSR tackles large-scale surface reconstruction from irregular point clouds by predicting a neural signed distance field without kernel-based voxelization. It capitalizes on locality-preserving point cloud serialization and a lightweight multi-scale aggregation built on PointTransformerV3 to efficiently gather local context across several serialization levels, then decodes to $D(\mathbf{q})$ via a simple MLP. The method introduces a Hilbert-curve based neighborhood function to retrieve approximate neighbors with minimal information loss, and it uses a triad of losses $\mathcal{L}_{\text{SDF}}$, $\mathcal{L}_{\text{Eikonal}}$, and $\mathcal{L}_{\text{mask}}$ to ensure accurate, distance-field-like surfaces. Empirically, NoKSR delivers state-of-the-art accuracy and substantially lower latency on outdoor datasets, while maintaining strong indoor performance, and it demonstrates robust cross-domain generalization and informative ablations. This point-based, kernel-free approach mitigates the information loss of voxelization and shows practical potential for scalable, high-fidelity surface reconstruction in real-world scenarios.
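As a rough illustration of how such a triad of losses is typically combined: the exact definitions and weights are not given in this summary, so the weights $\lambda$ and the squared form of the Eikonal penalty below are assumptions rather than the authors' stated formulation.

$$
\mathcal{L} = \lambda_{\text{SDF}}\,\mathcal{L}_{\text{SDF}} + \lambda_{\text{Eikonal}}\,\mathcal{L}_{\text{Eikonal}} + \lambda_{\text{mask}}\,\mathcal{L}_{\text{mask}},
\qquad
\mathcal{L}_{\text{Eikonal}} = \mathbb{E}_{\mathbf{q}}\Big[\big(\lVert \nabla_{\mathbf{q}} D(\mathbf{q}) \rVert_2 - 1\big)^2\Big].
$$

The Eikonal term penalizes deviations of the gradient norm from 1, which is the property that makes $D$ behave like a distance field rather than an arbitrary implicit function.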

Abstract

We present a novel approach to large-scale point cloud surface reconstruction by developing an efficient framework that converts an irregular point cloud into a signed distance field (SDF). Our backbone builds upon recent transformer-based architectures (i.e., PointTransformerV3), which serialize the point cloud into a locality-preserving sequence of tokens. We efficiently predict the SDF value at a point by aggregating nearby tokens, where fast approximate neighbors can be retrieved thanks to the serialization. We serialize the point cloud at different levels/scales, and non-linearly aggregate a feature to predict the SDF value. We show that aggregating across multiple scales is critical to overcome the approximations introduced by the serialization (i.e., false negatives in the neighborhood). Our framework sets a new state of the art in terms of accuracy and efficiency (better or similar performance with half the latency of the best prior method, coupled with a simpler implementation), particularly on outdoor datasets where sparse-grid methods have shown limited performance.
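The idea of retrieving approximate neighbors from a serialized point cloud can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a Morton (Z-order) code as a simpler stand-in for the Hilbert curve, and the function names, the `cell`, `window`, and `radius` parameters, and the fixed-size window lookup are all assumptions made for the example.

```python
import numpy as np

def morton_codes(points, origin, cell, bits=10):
    """Quantize points to a grid and interleave the bits of (x, y, z).

    Morton order is used here only as a stand-in for a Hilbert curve.
    """
    cells = np.floor((points - origin) / cell).astype(np.int64)
    cells = np.clip(cells, 0, (1 << bits) - 1)
    codes = np.zeros(len(cells), dtype=np.int64)
    for b in range(bits):
        for axis in range(3):
            codes |= ((cells[:, axis] >> b) & 1) << (3 * b + axis)
    return codes

def serialized_neighbors(points, queries, cell, window=8, radius=None):
    """Return (indices, mask) of approximate neighbors for each query.

    Neighbors are a fixed-size window around the query's position in the
    1-D sorted code list; `mask` drops candidates farther than `radius`
    (false positives). True neighbors outside the window are missed
    (false negatives), which is the approximation the abstract refers to.
    """
    origin = points.min(axis=0)
    codes = morton_codes(points, origin, cell)
    order = np.argsort(codes, kind="stable")      # serialization order
    sorted_codes = codes[order]
    q_codes = morton_codes(queries, origin, cell)
    pos = np.searchsorted(sorted_codes, q_codes)  # query slot in the 1-D list
    k = min(window, len(points))
    start = np.clip(pos - k // 2, 0, len(points) - k)
    idx = start[:, None] + np.arange(k)[None, :]
    neighbors = order[idx]                        # (Q, k) point indices
    dist = np.linalg.norm(points[neighbors] - queries[:, None, :], axis=-1)
    mask = np.ones_like(dist, dtype=bool) if radius is None else dist <= radius
    return neighbors, mask
```

Calling `serialized_neighbors` with several values of `cell` gives one candidate set per scale, which is one way to picture why combining multiple serialization levels can recover neighbors that a single ordering misses.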

Paper Structure

This paper contains 40 sections, 10 equations, 10 figures, 15 tables.

Figures (10)

  • Figure 1: To locally predict the SDF value that (implicitly) reconstructs the surface, the pivotal operation is to aggregate the information (i.e., features) of nearby points. (left) Working on the point cloud directly is difficult, as there is no simple way to implement multi-scale architectures suitable for large-scale point cloud processing. (middle) State-of-the-art methods therefore quantize the input point cloud to a voxel grid and employ established sparse CNN backbones, but quantization leads to information loss. (right) By fetching approximate neighbors via serialization, we can retrieve the local context efficiently and avoid information loss. We summarize the performance of representative works on a large-scale outdoor dataset (CARLA [dosovitskiy2017carla]) and show that our method achieves the best performance in both time efficiency (latency) and accuracy (CD and F-score); for additional details see \ref{sec:results}.
  • Figure 2: Overview: We map a sparse input point cloud with a point cloud backbone [wu2024point] into a point feature hierarchy, from which we compute the signed distance of a query. At each level, we use the efficient procedure defined in \ref{sec:neighbor_func} to retrieve local neighborhoods of the query. We then compute per-level features with the aggregation module defined in \ref{sec:aggregation}. Finally, we sum the per-level features and convert the result into the signed distance with an MLP.
  • Figure 3: Neighborhood function -- (left) Retrieving a local neighborhood with K-nearest neighbor (KNN) or ball-query methods is challenging to implement efficiently on GPU hardware. (right) We propose to retrieve a neighborhood from a 1-D ordered list by serializing points along a Hilbert curve [hilbert1935stetige] and excluding the impact of points distant from the query (i.e., removing false positives).
  • Figure 4: Qualitative results on CARLA [dosovitskiy2017carla] and SyntheticRoom [peng2020convoccnet] -- our method achieves high-quality surface reconstructions that preserve more detail than NKSR [huang2023neural], which loses information due to quantization on large, non-uniformly sampled datasets like CARLA.
  • Figure 5: Qualitative results on ScanNet [dai2017scannet]: We compare our method with the prior SOTA [huang2023neural] and with Ours (Minkowski) [choy20194d], which is more directly comparable as it differs from ours only in the backbone. Our method achieves reconstruction quality similar to the SOTA. It also significantly outperforms Ours (Minkowski), highlighting the importance of point-based methods.
  • ...and 5 more figures
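
The per-level aggregation and final decoding described in the Figure 2 caption can be pictured with a toy sketch. This is not the paper's aggregation module: the masked mean pooling, the single linear layer standing in for each MLP, and the names `aggregate_level` and `predict_sdf` are all assumptions chosen to keep the example short.

```python
import numpy as np

def aggregate_level(query, nbr_xyz, nbr_feat, mask, w):
    """Masked mean of neighbor features conditioned on the offset to the query.

    query: (Q, 3), nbr_xyz: (Q, K, 3), nbr_feat: (Q, K, F),
    mask: (Q, K) bool, w: (F + 3, C) hypothetical linear weights.
    """
    rel = nbr_xyz - query[:, None, :]                  # offsets to the query
    h = np.concatenate([nbr_feat, rel], axis=-1) @ w   # (Q, K, C)
    h = np.maximum(h, 0.0) * mask[..., None]           # ReLU, drop masked neighbors
    count = np.maximum(mask.sum(axis=1, keepdims=True), 1)
    return h.sum(axis=1) / count                       # (Q, C)

def predict_sdf(query, levels, weights, w_out):
    """Sum per-level features, then decode to one signed distance per query.

    levels: list of (nbr_xyz, nbr_feat, mask) tuples, one per level.
    w_out: (C,) linear decoder used here in place of an MLP.
    """
    fused = sum(aggregate_level(query, *lvl, w) for lvl, w in zip(levels, weights))
    return fused @ w_out                               # (Q,)
```

Each level contributes a feature even when some of its retrieved neighbors are masked out, so a neighbor missed at one level can still influence the prediction through another.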