De novo Protein Design Using Geometric Vector Field Networks

*Weian Mao1,2, *Muzhi Zhu1, *Zheng Sun3, Shuaike Shen1, Lin Yuanbo Wu3, Hao Chen1, Chunhua Shen1,4
ICLR'2024 Spotlight
1Zhejiang University, 2The University of Adelaide, 3Swansea University, 4Ant Group
Corresponding Author
*Denotes equal contribution

Abstract

Advances such as protein diffusion have marked revolutionary progress in de novo protein design, a central topic in life science. These methods typically depend on protein structure encoders to model residue backbone frames, in which no atoms exist. Most prior encoders rely on atom-wise features, such as angles and distances between atoms, which are unavailable in this setting. Only a few basic encoders, such as IPA, have been proposed for this scenario, which makes frame modeling a bottleneck. In this work, we introduce the Vector Field Network (VFN), which enables network layers to perform learnable vector computations between the coordinates of frame-anchored virtual atoms, thus achieving a higher capability for modeling frames. The vector computation operates in a manner similar to a linear layer, with each input channel receiving 3D virtual atom coordinates instead of scalar values. The multiple feature vectors output by the vector computation are then used to update the residue representations and virtual atom coordinates via attention aggregation. Notably, VFN can also model frames and atoms jointly, since real atoms can be treated as virtual atoms, positioning VFN as a potential universal encoder. In protein diffusion (frame modeling), VFN exhibits a clear performance advantage over IPA in both designability (67.04% vs. 53.58%) and diversity (66.54% vs. 51.98%). In inverse folding (frame and atom modeling), VFN outperforms the previous SoTA model, PiFold, on sequence recovery rate (54.7% vs. 51.66%); we also propose a method of equipping VFN with the ESM model, which substantially surpasses the previous ESM-based SoTA, LM-Design (62.67% vs. 55.65%).

VFN-Diff

FrameDiff

Pipeline

Vector Field Operator

Figure 2: Pipeline for the Vector Field Operator. A) Transforming the virtual atom coordinates Q_j from frame T_j to frame T_i to obtain K_j. B) An example of vector computation involving vectors Q_i and K_j using learnable weights w_a and w_b, as defined in Equation 2. When w_a and w_b take specific values (as shown in the figure), the vector field can yield the Euclidean vectors, h_1 and h_2, between two particular atoms.
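The two steps in the caption can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes each frame is a rotation matrix plus a translation, that virtual atoms are stored in their residue's local coordinates, and that the vector computation is a weighted sum over the channels of Q_i and K_j. The function names `to_frame_i` and `vector_field` are hypothetical, and the exact form of Equation 2 in the paper may differ.

```python
import numpy as np

def to_frame_i(Q_j, R_j, t_j, R_i, t_i):
    """Express residue j's virtual atoms (local coords, shape (C, 3)) in frame i.

    Assumed convention: a frame (R, t) maps local x to global R @ x + t.
    """
    global_xyz = Q_j @ R_j.T + t_j      # local frame j -> global
    return (global_xyz - t_i) @ R_i     # global -> local frame i (applies R_i^T)

def vector_field(Q_i, K_j, w_a, w_b):
    """Assumed form: h[k] = sum_c w_a[k, c] * Q_i[c] + sum_c w_b[k, c] * K_j[c].

    Acts like a linear layer whose input channels are 3D points, not scalars.
    """
    return w_a @ Q_i + w_b @ K_j        # shape (H, 3)

# Toy example: frame i is the identity; frame j is rotated 90 degrees about z
# and shifted by one unit along x.
R_i, t_i = np.eye(3), np.zeros(3)
R_j = np.array([[0.0, -1.0, 0.0],
                [1.0,  0.0, 0.0],
                [0.0,  0.0, 1.0]])
t_j = np.array([1.0, 0.0, 0.0])

Q_i = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # two virtual atoms of residue i
Q_j = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # two virtual atoms of residue j

K_j = to_frame_i(Q_j, R_j, t_j, R_i, t_i)

# Weights of -1 and +1 on the first channel select the Euclidean vector
# from atom 0 of residue i to atom 0 of residue j.
w_a = np.array([[-1.0, 0.0]])
w_b = np.array([[1.0, 0.0]])
h = vector_field(Q_i, K_j, w_a, w_b)
print(h)  # [[1. 1. 0.]]
```

Because K_j is expressed in frame i, the output does not change when the same rigid motion is applied to both frames, which is the property a frame-based encoder needs.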


Evaluation

We use ProteinMPNN to predict sequences for the designed structures with multiple motifs. We then fold these sequences into structures with ESMFold and align them with the designed structures to calculate scTM, a metric for evaluating designability.
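As a rough illustration of how per-design scTM values might be turned into a single designability figure, the sketch below takes the best scTM among the sequences predicted for each design and reports the fraction of designs above a cutoff. The helper name `designability`, the best-of-sequences rule, and the 0.5 cutoff are assumptions for illustration, not details stated on this page; the scTM values are made up.

```python
def designability(sctm_per_design, threshold=0.5):
    """Fraction of designs whose best scTM exceeds the threshold.

    sctm_per_design: one list of scTM scores per designed structure, with one
    score per predicted sequence (hypothetical input layout).
    threshold: assumed cutoff; adjust to match the evaluation protocol in use.
    """
    best = [max(scores) for scores in sctm_per_design]
    return sum(score > threshold for score in best) / len(best)

# Made-up scores for four designs.
example = [[0.3, 0.7], [0.2, 0.4], [0.9], [0.5, 0.1]]
print(designability(example))  # 0.5
```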

BibTeX

@article{mao2023novo,
  title={De novo protein design using geometric vector field networks},
  author={Mao, Weian and Zhu, Muzhi and Sun, Zheng and Shen, Shuaike and Wu, Lin Yuanbo and Chen, Hao and Shen, Chunhua},
  journal={arXiv preprint arXiv:2310.11802},
  year={2023}
}