PremPS predicts the effects of single-point mutations on protein stability by calculating the change in unfolding Gibbs free energy (ΔΔG). Predictions are based on the three-dimensional structure of the protein.
The PremPS model uses a random forest (RF) regression scoring function trained on experimentally measured unfolding Gibbs free energy changes (ΔΔG) for 5296 mutations from 131 proteins. To obtain a more balanced dataset and improve predictive performance for both destabilizing and stabilizing mutations, the corresponding reverse mutations were also added to the training set. For the forward mutations (ΔΔG(wt→mut)), 3D structures of the wild-type proteins were obtained from the Protein Data Bank (PDB) (1). For the reverse mutations (ΔΔG(mut→wt)), 3D structures of the mutants were built with the BuildModel module of FoldX (2), using the wild-type structures as templates.
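The reverse-mutation augmentation described above can be sketched as follows. This is a minimal illustration, not the PremPS pipeline: the `Mutation` record and the functions `add_reverse_mutations` and `sign_counts` are hypothetical names, and the sketch assumes the thermodynamic relation ΔΔG(mut→wt) = −ΔΔG(wt→mut). No FoldX modelling is performed here; reverse records are only tagged as needing a modelled mutant structure.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Mutation:
    """A single-point mutation record (hypothetical schema, for illustration)."""
    pdb_id: str     # structure identifier
    chain: str      # chain containing the mutated residue
    position: int   # residue number
    from_aa: str    # residue present in the structure used for features
    to_aa: str      # substituted residue
    ddg: float      # experimental unfolding free-energy change (kcal/mol)
    template: str   # "wild_type" (experimental PDB) or "mutant_model"


def add_reverse_mutations(forward: List[Mutation]) -> List[Mutation]:
    """Return the forward records followed by one reverse record per mutation.

    Assumption: ddG(mut->wt) = -ddG(wt->mut), because going back from the
    mutant to the wild type reverses the free-energy change. Reverse records
    are tagged "mutant_model" (PremPS builds these structures with FoldX
    BuildModel); building the model itself is outside this sketch.
    """
    reverse = [
        replace(
            m,
            from_aa=m.to_aa,   # the mutant residue becomes the starting residue
            to_aa=m.from_aa,   # and the wild-type residue becomes the target
            ddg=-m.ddg,        # opposite sign for the reverse direction
            template="mutant_model",
        )
        for m in forward
    ]
    return forward + reverse


def sign_counts(records: List[Mutation]) -> Tuple[int, int]:
    """Count records with positive and negative ddG (zeros are ignored)."""
    positive = sum(1 for r in records if r.ddg > 0)
    negative = sum(1 for r in records if r.ddg < 0)
    return positive, negative


if __name__ == "__main__":
    # Toy input: values are invented, not taken from the PremPS training set.
    toy = [
        Mutation("toy1", "A", 42, "L", "A", 1.8, "wild_type"),
        Mutation("toy1", "A", 57, "G", "P", 0.9, "wild_type"),
        Mutation("toy2", "B", 10, "S", "T", -0.4, "wild_type"),
    ]
    augmented = add_reverse_mutations(toy)
    print(len(augmented))          # 6
    print(sign_counts(toy))        # (2, 1)
    print(sign_counts(augmented))  # (3, 3)
```

Because every forward record gains a mirror-image record with the opposite sign, the augmented set contains equal numbers of positive and negative ΔΔG values (ignoring ΔΔG = 0), which is the balancing effect the paragraph describes.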
The PremPS energy function includes ten evolutionary and structure-based features grouped into six categories; the contribution of each category is shown in the table and described below:
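The text does not state how each category's contribution was measured. One common way to summarize a random forest's feature importances by category is to sum the per-feature scores within each category and normalize the totals to percentages. The sketch below assumes that approach; the function `category_contributions`, the feature names, and the category labels are hypothetical, and the per-feature scores could come from, for example, scikit-learn's `feature_importances_` attribute or permutation importance.

```python
from collections import defaultdict
from typing import Dict


def category_contributions(
    importances: Dict[str, float], categories: Dict[str, str]
) -> Dict[str, float]:
    """Aggregate per-feature importances into per-category percentages.

    `importances` maps feature name -> non-negative importance score;
    `categories` maps feature name -> category label. Each category's
    summed score is returned as a percentage of the overall total.
    """
    totals: Dict[str, float] = defaultdict(float)
    for feature, score in importances.items():
        totals[categories[feature]] += score  # KeyError if a feature is unlabelled
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("importances sum to zero")
    return {cat: 100.0 * value / grand_total for cat, value in totals.items()}


if __name__ == "__main__":
    # Toy importances for four placeholder features in two placeholder
    # categories; these numbers are invented, not PremPS results.
    importances = {"feat_1": 0.5, "feat_2": 0.25, "feat_3": 0.125, "feat_4": 0.125}
    categories = {"feat_1": "cat_A", "feat_2": "cat_A", "feat_3": "cat_B", "feat_4": "cat_B"}
    print(category_contributions(importances, categories))
    # {'cat_A': 75.0, 'cat_B': 25.0}
```

Summing within categories is only one choice; an alternative is to retrain without each category and measure the drop in performance, which accounts for correlated features but costs one retraining per category.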
1. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Research, 28, 235-242.
2. Guerois, R., Nielsen, J.E. and Serrano, L. (2002) Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. Journal of Molecular Biology, 320, 369-387.
3. Bhagwat, M. and Aravind, L. (2007) PSI-BLAST tutorial. Methods in Molecular Biology, 395, 177-186.
4. Choi, Y., Sims, G.E., Murphy, S., Miller, J.R. and Chan, A.P. (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS ONE, 7, e46688.
5. Sweet, R.M. and Eisenberg, D. (1983) Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. Journal of Molecular Biology, 171, 479-488.
6. Joosten, R.P., te Beek, T.A., Krieger, E., Hekkelman, M.L., Hooft, R.W., Schneider, R., Sander, C. and Vriend, G. (2011) A series of PDB related databases for everyday needs. Nucleic Acids Research, 39, D411-D419.
7. Rose, G.D., Geselowitz, A.R., Lesser, G.J., Lee, R.H. and Zehfus, M.H. (1985) Hydrophobicity of amino acid residues in globular proteins. Science, 229, 834-838.
8. Hou, Q., Kwasigroch, J.M., Rooman, M. and Pucci, F. (2019) SOLart: a structure-based method to predict protein solubility and aggregation. bioRxiv, 600734.
9. Yang, Y., Urolagin, S., Niroula, A., Ding, X., Shen, B. and Vihinen, M. (2018) PON-tstab: protein variant stability predictor. Importance of training data quality. International Journal of Molecular Sciences, 19, 1009.
More details can be found in our paper.