Cross-Gate MLP with Protein Complex Invariant Embedding is A One-Shot Antibody Designer

Cheng Tan, Zhangyang Gao, Lirong Wu, Jun Xia, Jiangbin Zheng, Xihong Yang, Yue Liu, Bozhen Hu, Stan Z. Li

TL;DR

This work tackles antibody CDR design by addressing two weaknesses of prior methods: insufficient geometric modeling and inefficient iterative inference. It introduces a two-stage framework that first builds a protein complex invariant embedding (PIE) capturing intra- and inter-component geometry over all backbone atoms, and then applies a cross-gate MLP (CGMLP) for end-to-end sequence-structure co-learning in a single shot. The model outputs the complete sequence and structure of the antibody-antigen complex without iterative decoding. It is reported to outperform prior methods on sequence and structure metrics, CDR-H3 design benchmarks, and binding-affinity optimization, with improved training and inference efficiency. The approach holds promise for rapid, geometry-aware, in silico antibody design, though it has yet to be validated experimentally in wet-lab settings.

Abstract

Antibodies are crucial proteins produced by the immune system in response to foreign substances or antigens. The specificity of an antibody is determined by its complementarity-determining regions (CDRs), which are located in the variable domains of the antibody chains and form the antigen-binding site. Previous studies have utilized complex techniques to generate CDRs, but they suffer from inadequate geometric modeling. Moreover, the common iterative refinement strategies lead to inefficient inference. In this paper, we propose a simple yet effective model that can co-design 1D sequences and 3D structures of CDRs in a one-shot manner. To achieve this, we decouple the antibody CDR design problem into two stages: (i) geometric modeling of protein complex structures and (ii) sequence-structure co-learning. We develop a novel macromolecular structure invariant embedding, typically for protein complexes, that captures both intra- and inter-component interactions among the backbone atoms, including C$\alpha$, N, C, and O atoms, to achieve comprehensive geometric modeling. Then, we introduce a simple cross-gate MLP for sequence-structure co-learning, allowing sequence and structure representations to implicitly refine each other. This enables our model to design desired sequences and structures in a one-shot manner. Extensive experiments evaluate our results at both the sequence and structure levels and demonstrate that our model achieves superior performance compared to state-of-the-art antibody CDR design methods.
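To make the idea of sequence and structure representations "implicitly refining each other" concrete, the following is a minimal sketch of one cross-gating step for a single residue. The abstract does not give the exact CGMLP formulation, so everything here is an assumption for illustration: the sigmoid gate, the function names (`cross_gate`, `init_linear`), the feature dimension, and the random weights standing in for learned parameters.

```python
import math
import random

def linear(x, w, b):
    """Apply y = W x + b to a single feature vector stored as a list."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def sigmoid(v):
    """Element-wise logistic function, mapping each value into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def init_linear(dim_in, dim_out, rng):
    """Small random weights and zero bias; a stand-in for learned parameters."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)] for _ in range(dim_out)]
    b = [0.0] * dim_out
    return w, b

def cross_gate(seq_feat, struct_feat, params):
    """One hypothetical cross-gating step (assumed form, not the paper's).

    Each stream is projected, then scaled element-wise by a sigmoid gate
    computed from the *other* stream, so information flows in both directions.
    """
    seq_proj = linear(seq_feat, *params["seq_proj"])
    struct_proj = linear(struct_feat, *params["struct_proj"])
    gate_from_struct = sigmoid(linear(struct_feat, *params["struct_gate"]))
    gate_from_seq = sigmoid(linear(seq_feat, *params["seq_gate"]))
    new_seq = [g * h for g, h in zip(gate_from_struct, seq_proj)]
    new_struct = [g * h for g, h in zip(gate_from_seq, struct_proj)]
    return new_seq, new_struct

rng = random.Random(0)
dim = 4  # illustrative feature width
params = {name: init_linear(dim, dim, rng)
          for name in ("seq_proj", "struct_proj", "seq_gate", "struct_gate")}
seq_feat = [rng.uniform(-1, 1) for _ in range(dim)]
struct_feat = [rng.uniform(-1, 1) for _ in range(dim)]
new_seq, new_struct = cross_gate(seq_feat, struct_feat, params)
```

In this sketch the updated sequence features depend on the structure features only through the gate, and vice versa, which is one simple way the two representations could influence each other without any iterative refinement loop.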

Paper Structure (21 sections, 14 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: A backbone composed solely of C$\alpha$ atoms carries less information than a backbone consisting of all backbone atoms.
  • Figure 2: The schematic diagram of an antibody-antigen complex structure. Note that the antibody is a symmetric Y shape, each half of which contains a heavy chain and a light chain. Here we focus on designing the CDR-H1, CDR-H2, and CDR-H3 loops in the heavy chain.
  • Figure 3: The overall framework of our model. The input is both the sequence and structure of the antibody-antigen complex. The CDRs are masked by light yellow blocks to indicate that they are generated by the model. The output consists of the complete sequence and structure of the antibody-antigen complex, including the generated CDRs.
  • Figure 4: The schematic diagram of intra- and inter-component edges in the antibody-antigen complex.
  • Figure 5: The schematic diagram of the sequence learning with Cross-Gate MLP and structure learning with MLP.
  • ...and 2 more figures
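
The captions of Figures 1 and 4 can be illustrated with a small sketch: edges between residues are labeled intra-component or inter-component, and each edge carries distances among all four backbone atoms rather than C$\alpha$ alone. The paper's actual edge construction and feature set are not described in this section, so the distance cutoff, the toy coordinates, and the names `build_edges`, `make_residue`, and `move` are assumptions. The sketch also checks that such distance features are unchanged by a rigid rotation and translation, which is what "invariant" refers to.

```python
import math

BACKBONE_ATOMS = ("N", "CA", "C", "O")

def translate(p, t):
    """Shift a 3D point by vector t."""
    return tuple(a + b for a, b in zip(p, t))

def rotate_z(p, angle):
    """Rotate a 3D point about the z-axis by the given angle in radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def make_residue(component, center):
    """Toy residue: hypothetical atom offsets (angstroms) around a center."""
    offsets = {"N": (-1.2, 0.3, 0.0), "CA": (0.0, 0.0, 0.0),
               "C": (1.2, 0.5, 0.0), "O": (1.6, 1.6, 0.2)}
    return {"component": component,
            "atoms": {name: translate(off, center) for name, off in offsets.items()}}

def backbone_distances(res_i, res_j):
    """All 16 pairwise distances between the backbone atoms of two residues."""
    return [math.dist(res_i["atoms"][a], res_j["atoms"][b])
            for a in BACKBONE_ATOMS for b in BACKBONE_ATOMS]

def build_edges(residues, cutoff=8.0):
    """Connect residue pairs whose CA atoms lie within an assumed cutoff."""
    edges = []
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            ri, rj = residues[i], residues[j]
            if math.dist(ri["atoms"]["CA"], rj["atoms"]["CA"]) < cutoff:
                kind = "intra" if ri["component"] == rj["component"] else "inter"
                edges.append((i, j, kind, backbone_distances(ri, rj)))
    return edges

def move(residue, angle, shift):
    """Apply the same rigid motion (rotation, then translation) to every atom."""
    return {"component": residue["component"],
            "atoms": {name: translate(rotate_z(p, angle), shift)
                      for name, p in residue["atoms"].items()}}

residues = [make_residue("antibody", (0.0, 0.0, 0.0)),
            make_residue("antibody", (3.8, 0.0, 0.0)),
            make_residue("antigen", (0.0, 6.0, 1.0))]
edges = build_edges(residues)
moved_edges = build_edges([move(r, 0.7, (5.0, -2.0, 3.0)) for r in residues])
```

Comparing `edges` with `moved_edges` shows the same edge labels and, up to floating-point error, the same distance features, even though every coordinate has changed. A C$\alpha$-only model would keep just one of the 16 distances per edge.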