Artificial Intelligence (AI) plays an increasingly pivotal role in drug discovery. In particular, artificial neural networks such as deep neural networks drive this area of research. The research presented in this thesis synergistically combines physicochemical models of protein-ligand interactions, such as molecular dynamics simulations, with novel machine learning concepts and big data to solve fundamental problems in Structure-Based Drug Design (SBDD). SBDD uses three-dimensional (3D) structural data of biomolecules to assist lead discovery and optimization in a time- and cost-efficient manner.
The main focus of the thesis research is the development of models, algorithms and methods to facilitate binding-mode elucidation, affinity prediction for congeneric series of molecules and flexible docking.
For pose prediction, we developed a Convolutional Neural Network model incorporating hydration information, named DeepWatsite, which predicts binding modes accurately and can highlight the different roles of water molecules in protein-ligand binding. To train the neural network model, we created a comprehensive database of hydration information for thousands of protein systems. This was made possible through the development of an efficient GPU-accelerated version of Watsite, a program for generating hydration profiles of protein systems through molecular dynamics simulations. \newline
\indent For accurate affinity prediction for congeneric series of compounds, we developed a new methodological platform for mixed-solvent simulation based on the lambda-dynamics concept. Additionally, we developed a deep-learning model that combines molecular dynamics simulations and a distance-aware graph attention algorithm. Validation studies revealed that the accuracy of this model is competitive with resource-intensive free energy perturbation (FEP) calculations. To train the model, we generated a synthetic database of congeneric series of compounds extracted from the highest-quality medicinal chemistry articles. Molecular dynamics simulations of all generated systems were used as a method of data augmentation. \newline
\indent For flexible docking, we developed a machine-learning-assisted docking strategy that relies on protein-ligand distance matrix predictions. This technique builds on the Weisfeiler-Lehman neural network concept with an attention mechanism. Comprehensive validation on docking and cross-docking datasets demonstrated the potential of this method to become a docking concept with higher accuracy and efficiency than existing state-of-the-art flexible docking techniques.
In summary, the thesis demonstrates the general applicability of deep learning to various tasks in SBDD. Furthermore, it shows that treating biomolecules as dynamic entities can improve the quality of computational methods in structure-based drug design.