AtomSurf : Surface Representation for Learning on Protein Structures

Vincent Mallet, Souhaib Attaiki, Yangyang Miao, Bruno Correia, Maks Ovsjanikov

TL;DR

This work systematically investigates surface-based learning for protein structures and shows that pure surface encoders, although competitive, fall short of state-of-the-art graph-based methods on benchmark tasks. The authors adapt DiffusionNet to proteins and introduce AtomSurf, a hybrid architecture that shares features between surface vertices and graph nodes through a bipartite graph, achieving state-of-the-art results on the Atom3D benchmark and strong performance on binding-site tasks. Key innovations include scale-aware diffusion, coarsened meshes for efficiency, and comprehensive ablations showing that integrated representations outperform single modalities across diverse tasks. The approach combines the complementary priors of the two representations and makes the trade-off between accuracy and memory usage explicit.

Abstract

While there has been significant progress in evaluating and comparing different representations for learning on protein data, the role of surface-based learning approaches remains poorly understood. In particular, there is a lack of direct and fair benchmark comparison of the best available surface-based learning methods against alternative representations such as graphs. Moreover, the few existing surface-based approaches either use surface information in isolation or, at best, perform global pooling between surface and graph-based architectures. In this work, we fill this gap by first adapting a state-of-the-art surface encoder for protein learning tasks. We then perform a direct and fair comparison of the resulting method against alternative approaches within the Atom3D benchmark, highlighting the limitations of pure surface-based learning. Finally, we propose an integrated approach, which allows learned feature sharing between graph and surface representations at the level of nodes and vertices across all layers. We demonstrate that the resulting architecture achieves state-of-the-art results on all tasks in the Atom3D benchmark, while adhering to the strict benchmark protocol, as well as more broadly on binding site identification and binding pocket classification. Furthermore, we use coarsened surfaces and optimize our approach for efficiency, making our tool competitive in training and inference time with existing techniques. Code can be found online: https://github.com/Vincentx15/atomsurf
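
To make the idea of node-level and vertex-level feature sharing concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a graph node and a surface vertex are linked whenever they lie within a distance cutoff, and that features are shared by plain mean aggregation; the function names, the `cutoff` parameter and its default value are hypothetical choices made for illustration only.

```python
import numpy as np

def bipartite_edges(node_xyz, vertex_xyz, cutoff=4.0):
    """Link each graph node to every surface vertex closer than `cutoff`.

    `cutoff` is an illustrative assumption, not a value from the paper.
    Returns an array of shape (2, n_edges): row 0 holds node indices,
    row 1 holds vertex indices.
    """
    # Pairwise distances between all nodes and all vertices (dense, for clarity).
    diff = node_xyz[:, None, :] - vertex_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    node_idx, vertex_idx = np.nonzero(dist < cutoff)
    return np.stack([node_idx, vertex_idx])

def nodes_to_vertices(node_feats, edges, n_vertices):
    """Mean-aggregate node features onto the vertices they are linked to.

    Vertices with no linked node receive a zero feature vector.
    """
    node_idx, vertex_idx = edges
    out = np.zeros((n_vertices, node_feats.shape[1]))
    np.add.at(out, vertex_idx, node_feats[node_idx])  # sum per vertex
    counts = np.bincount(vertex_idx, minlength=n_vertices)
    return out / np.maximum(counts, 1)[:, None]

# Toy example: two graph nodes, three surface vertices on a line.
nodes = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
vertices = np.array([[1.0, 0.0, 0.0], [9.0, 0.0, 0.0], [50.0, 0.0, 0.0]])
edges = bipartite_edges(nodes, vertices, cutoff=2.0)
vertex_feats = nodes_to_vertices(np.array([[1.0], [3.0]]), edges, len(vertices))
```

Swapping the roles of the two index rows gives the opposite direction, from vertices to nodes. A learned model would presumably replace the mean with trainable message passing applied at every layer.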

Paper Structure (48 sections, 1 theorem, 7 equations, 12 figures, 8 tables)

This paper contains 48 sections, 1 theorem, 7 equations, 12 figures, 8 tables.

Key Result

Proposition 3.1

Let $X$ be a shape and $Y = \alpha X$ its scaled version by a factor $\alpha>0$. Denoting by $E_{\cdot}(t,x)$ the expected geodesic distance for a Brownian motion starting from point $x$ after time $t$, it holds that: $E_{Y}(t,x) = \alpha E_{X}\left(\frac{t}{\alpha^2},x\right).$
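
The scaling relation can be checked numerically on a simple shape. The sketch below is an illustration under stated assumptions rather than part of the paper: it takes $X$ to be a circle of radius `r`, uses the convention that Brownian motion has variance $t$ after time $t$, and estimates the expected geodesic distance by Monte Carlo. The function name `expected_geodesic_distance` and all parameter values are hypothetical.

```python
import numpy as np

def expected_geodesic_distance(radius, t, z):
    """Monte Carlo estimate of E(t, x) on a circle of the given radius.

    `z` holds standard normal samples. By symmetry of the circle the
    result does not depend on the starting point x.
    """
    # Arc length travelled by the Brownian motion, before wrapping.
    arc = np.sqrt(t) * z
    # Convert to an angle and wrap it into [-pi, pi).
    angle = arc / radius
    wrapped = (angle + np.pi) % (2 * np.pi) - np.pi
    # Geodesic distance on the circle is radius times the absolute wrapped angle.
    return radius * np.mean(np.abs(wrapped))

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)

r, alpha, t = 1.0, 2.5, 3.0
lhs = expected_geodesic_distance(alpha * r, t, z)              # E_Y(t, x)
rhs = alpha * expected_geodesic_distance(r, t / alpha**2, z)   # alpha * E_X(t / alpha^2, x)
print(lhs, rhs)
```

Because both sides reuse the same samples `z`, they agree up to floating-point rounding rather than only up to Monte Carlo noise. This mirrors the argument behind the proposition: running a Brownian motion on $\alpha X$ for time $t$ is the same as running it on $X$ for time $t/\alpha^2$ and scaling all distances by $\alpha$.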

Figures (12)

  • Figure 1: Illustration of our approach integrating surface and graph information. We ensure joint learning across the two representations and enable information propagation across all layers of the network. Our information sharing is based on the spatial proximity relations between individual graph nodes and surface vertices (not shown here).
  • Figure 1: Diverse mathematical objects used to represent a protein structure: sequences, molecular surfaces (blue), atom-level and residue-level point clouds (red) and graphs (green). Effective machine learning for protein structures hinges on selecting the appropriate mathematical representation along with a compatible machine-learning technique.
  • Figure 2: Learning curves on the RNA segmentation task for the original and our enhanced DiffusionNet models.
  • Figure 2: Visualization of a Dirac delta function $\delta_x$ at a point, diffused for several diffusion times.
  • Figure 3: Histogram of the diffusion times obtained after training.
  • ...and 7 more figures

Theorems & Definitions (2)

  • Proposition 3.1
  • proof