FoV-Net: Rotation-Invariant CAD B-rep Learning via Field-of-View Ray Casting

Matteo Ballegeer, Dries F. Benoit

TL;DR

FoV-Net is the first B-rep learning framework that captures both local surface geometry and global structural context in a rotation-invariant manner. It achieves state-of-the-art performance on B-rep classification and segmentation benchmarks, remains robust to arbitrary rotations, and requires less training data to reach strong results.

Abstract

Learning directly from boundary representations (B-reps) has significantly advanced 3D CAD analysis. However, state-of-the-art B-rep learning methods rely on absolute coordinates and normals to encode global context, making them highly sensitive to rotations. Our experiments reveal that models achieving over 95% accuracy on aligned benchmarks can collapse to as low as 10% under arbitrary $\mathbf{SO}(3)$ rotations. To address this, we introduce FoV-Net, the first B-rep learning framework that captures both local surface geometry and global structural context in a rotation-invariant manner. Each face is represented by a Local Reference Frame (LRF) UV-grid that encodes its local surface geometry, and by Field-of-View (FoV) grids that capture the surrounding 3D context by casting rays and recording intersections with neighboring faces. Lightweight CNNs extract per-face features, which are propagated over the B-rep graph using a graph attention network. FoV-Net achieves state-of-the-art performance on B-rep classification and segmentation benchmarks, demonstrating robustness to arbitrary rotations while also requiring less training data to achieve strong results.
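The contrast between absolute coordinates and a Local Reference Frame can be illustrated with a small sketch. It assumes the frame is already given as three orthonormal axes and an origin; how FoV-Net actually constructs its LRF is not stated in this summary, and the helper names `rotate` and `to_local` are hypothetical. The sketch only shows why coordinates expressed relative to a frame that moves with the part are unchanged by a global rotation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rotate(axis, angle, v):
    """Rotate v about a unit axis by angle (Rodrigues' rotation formula)."""
    c, s = math.cos(angle), math.sin(angle)
    cross = [axis[1] * v[2] - axis[2] * v[1],
             axis[2] * v[0] - axis[0] * v[2],
             axis[0] * v[1] - axis[1] * v[0]]
    k_dot_v = dot(axis, v)
    return [v[i] * c + cross[i] * s + axis[i] * k_dot_v * (1 - c)
            for i in range(3)]

def to_local(point, origin, frame):
    """Express a point in the frame (three orthonormal axes) centred at origin."""
    d = [p - o for p, o in zip(point, origin)]
    return [dot(axis, d) for axis in frame]

# Assumed example data: one surface sample and a frame attached to its face.
origin = [1.0, 2.0, 3.0]
frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point = [1.5, 2.25, 4.0]
local_before = to_local(point, origin, frame)

# Apply the same arbitrary rotation to the whole part (points and frame).
k = [1 / math.sqrt(3)] * 3
angle = 1.1
point_rot = rotate(k, angle, point)
origin_rot = rotate(k, angle, origin)
frame_rot = [rotate(k, angle, axis) for axis in frame]
local_after = to_local(point_rot, origin_rot, frame_rot)

print(point, "->", point_rot)             # global coordinates change
print(local_before, "->", local_after)    # local coordinates agree up to rounding
```

A model fed `point_rot` sees a different input for every pose, whereas one fed `local_after` sees the same descriptor, which is the property the abstract attributes to the LRF UV-grid.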

Paper Structure (14 sections, 2 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: Field-of-View ray casting. Rays are cast from each face center into both the inward and outward hemispheres, recording first-hit statistics to capture the surrounding 3D structure from a face-centric viewpoint. Only ray hits are visualized for clarity.
  • Figure 2: UV-mapping. A 3D surface (left) is parameterized by a 2D UV domain (right), where each $(u, v)$ coordinate maps uniquely to a point on the surface. Rasterizing the UV domain produces a fixed-resolution grid of surface samples.
  • Figure 3: LRF UV construction. (a) Standard UV-grids are defined in the global frame (XYZ). (b) LRF UV-grids are defined in the $\mathbf{R}_f$ frame (UVN) and relative to $\mathbf{o}$, so that identical faces (shown in the same color) yield identical descriptors regardless of pose.
  • Figure 4: Field-of-view sampling. (a) Rays are cast from the face center $\mathbf{o}$ over a hemisphere oriented along the normal $\mathbf{N}$. Azimuth ($0^\circ$) is aligned with the $\mathbf{U}$ direction. (b) The hemisphere is discretized into an elevation $\times$ azimuth grid, forming a 2D descriptor suitable for CNN processing.
  • Figure 5: Outward vs. inward vision. (a) Outward rays (around $\mathbf{N}$) probe the external environment, often resulting in faces that see empty space. (b) Inward rays (around $-\mathbf{N}$) probe the interior of the solid, typically yielding dense intersections. Only intersection hits are displayed.
  • ...and 4 more figures
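
The hemisphere discretization described in the captions for Figures 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fov_directions`, the grid resolution, the use of cell-centre elevations, and the convention that elevation is measured from the tangent plane toward the normal are all hypothetical choices. Only the alignment of azimuth $0^\circ$ with $\mathbf{U}$ and the use of $-\mathbf{N}$ for inward rays come from the captions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fov_directions(U, V, N, n_elev=4, n_azim=8, inward=False):
    """Return an n_elev x n_azim grid of unit ray directions over a hemisphere.

    U, V, N are assumed to be orthonormal axes of the face's local frame.
    Azimuth 0 points along U; elevation runs from the tangent plane toward
    N (outward) or toward -N (inward). Resolution is an illustrative choice.
    """
    sign = -1.0 if inward else 1.0
    grid = []
    for i in range(n_elev):
        # Cell-centre elevation in (0, pi/2): an assumption, not from the paper.
        elev = (i + 0.5) * (math.pi / 2) / n_elev
        ce, se = math.cos(elev), math.sin(elev)
        row = []
        for j in range(n_azim):
            azim = j * 2 * math.pi / n_azim
            ca, sa = math.cos(azim), math.sin(azim)
            row.append([ce * (ca * U[k] + sa * V[k]) + sign * se * N[k]
                        for k in range(3)])
        grid.append(row)
    return grid

# Assumed example frame; in practice each face would supply its own U, V, N.
U, V, N = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
outward = fov_directions(U, V, N)
inward = fov_directions(U, V, N, inward=True)
print(len(outward), len(outward[0]))  # grid shape: elevation x azimuth
```

Because every direction is a combination of the face's own axes, rotating the part rotates the whole ray bundle with it, so whatever is recorded per cell (for example a first-hit statistic) lands in the same grid position regardless of pose. Each ray would then start at the face center $\mathbf{o}$ and be intersected with the other faces of the solid, a step omitted here.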