Providing accurate details about drug targets on the SARS-CoV-2 virus is a crucial step towards finding effective treatments for the disease. Professor Jean-Philip Piquemal of Sorbonne University has been using newly developed codes to explore the conformational spaces of two of the main drug targets on the virus, and in doing so has carried out some of the longest simulations of their kind to date.
Jean-Philip Piquemal, a professor of theoretical chemistry at Sorbonne University in Paris, is the creator of a highly efficient code for molecular simulations called Tinker-HP.
Designed to scale on supercomputers, it uses state-of-the-art simulation methods to model biological systems. The code uses what are known as polarisable force fields, providing a better approximation of many-body physics than its predecessors. This accuracy does, however, come at a cost: the code needs around ten times as much computational power as other comparable codes. For this reason, Piquemal and his group sought out the HPC resources being offered by PRACE’s fast-track call earlier this year.
With the onset of the COVID-19 pandemic, Piquemal and his team decided to apply their unique methods to investigate the conformational spaces of common drug targets on the SARS-CoV-2 virus. “When you are developing a drug, there are always two things to consider: the drug itself, and the protein it is targeting,” he explains.
“Proteins generally have many different shapes that they can take, and if you don’t have this information then drug discovery becomes difficult. However, calculating every possible one of these conformations requires huge amounts of computing time.”
Back in March, as lockdowns were becoming a reality for large swathes of Europe, Piquemal and his team set about developing a new algorithm that could be used to tackle this problem. The algorithm uses adaptive sampling techniques, similar to those used in the famous Folding@home distributed computing project that harnesses home PCs, games consoles and other sources of processing power to carry out protein dynamics simulations.
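To give a feel for what adaptive sampling means in practice, here is a minimal Python sketch of the general idea. It is not the Tinker-HP algorithm: the one-dimensional double-well "protein", the bin width, the count-based reseeding rule and every function name are assumptions made purely for illustration. Many short trajectories are run, the regions they visit are counted, and the next round restarts from the regions seen least often, so that computing time is steered towards poorly explored conformations.

```python
import math
import random
from collections import Counter


def force(x, barrier=5.0):
    """Force from a toy double-well potential U(x) = barrier * (x^2 - 1)^2."""
    return -4.0 * barrier * x * (x * x - 1.0)


def short_trajectory(x0, n_steps, rng, dt=1e-3, kT=1.0):
    """Overdamped Langevin dynamics (Euler-Maruyama) starting from x0."""
    x = x0
    path = []
    sigma = math.sqrt(2.0 * kT * dt)
    for _ in range(n_steps):
        x += force(x) * dt + sigma * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def bin_of(x, width=0.1):
    """Map a coordinate to a discrete state (a stand-in for clustering)."""
    return math.floor(x / width)


def adaptive_sampling(n_rounds=20, n_walkers=8, n_steps=200, seed=0):
    """Run rounds of short trajectories, reseeding from the rarest states."""
    rng = random.Random(seed)
    counts = Counter()        # how often each state has been visited
    representative = {}       # one stored coordinate per visited state
    seeds = [-1.0] * n_walkers  # every walker starts in the left-hand well
    for _ in range(n_rounds):
        for x0 in seeds:
            for x in short_trajectory(x0, n_steps, rng):
                b = bin_of(x)
                counts[b] += 1
                representative[b] = x
        # Pick the least-visited states (ties broken by bin index).
        rare = sorted(counts, key=lambda b: (counts[b], b))[:n_walkers]
        seeds = [representative[rare[i % len(rare)]] for i in range(n_walkers)]
    return counts


if __name__ == "__main__":
    visits = adaptive_sampling()
    print(f"{len(visits)} distinct states visited")
```

In a real study the "states" would come from clustering high-dimensional protein conformations rather than from binning a single coordinate, and each short trajectory would be a full molecular dynamics run on a GPU.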
“Using the hundreds of GPUs provided to us by our allocation from PRACE, we have been lucky enough to have access to a huge amount of computing power to throw at the problems we are trying to solve,” says Piquemal. “Our first task was investigating an important protein on the virus called the main protease. Comparing our results with those gathered by groups working in Japan and the USA with much bigger computing resources, we demonstrated that our code was able to give even more accurate results with the GPUs.”
Piquemal’s simulations provide extremely accurate details about their targets, for instance allowing the researchers to map cryptic pockets – sites on proteins that, although normally invisible, represent good targets for binding drugs to. “These simulations require a lot of computational power but have the potential to be really important due to their ability to make accurate predictions,” he explains.
“Comparing our work with experimental work, we can see it mirrors everything that has been shown experimentally, but also goes one step further and makes additional predictions. These are now in turn being validated by our experimental collaborators.”
The main protease consists of 100 000 atoms – not a trivial size for a simulation by any means. Piquemal’s next target, though, dwarfs it.
The spike protein, at 1.5 million atoms, is the huge protein that enables SARS-CoV-2 to enter our cells. Its importance to the viral lifecycle makes it an obvious target for drugs, and Piquemal has had to combine the resources received from PRACE with other French computing resources in order to fully model it.
Representation of the pockets’ locations on the 6LU7 SARS-CoV-2 protease structure, as obtained from the Tinker-HP simulation using the polarizable AMOEBA force field.
“The simulation we have done is, for this class of force field, the longest simulation ever done!” says Piquemal. “Standard simulations show just a few nanoseconds’ worth of dynamics, but we have been able to push this to the next level and simulate on the order of microseconds. We are very happy with this achievement, and believe it provides a glimpse of the kind of computations we can expect to be happening routinely a few years from now.”
With an issue such as COVID-19 that affects the whole world, it is good to hear that Piquemal very much works around a philosophy of open science. The data has been published on the BioExcel community platform for COVID-19, the papers are all being published open access, and the code itself is freely available on GitHub. “Our hope is that our work will help people to develop drugs to aid the fight against the pandemic,” says Piquemal.
“Beyond that, though, the code can be used for many other applications,” he continues. “A lot of people use it for investigating ionic liquids and various types of nanoparticles. I think that the push in technology that has come with the pandemic is probably going to benefit scientists in lots of ways that we can’t imagine right now, so we are happy that our code is freely available for academics to use for whatever purposes they can think of.”
This article was also published in PRACE Digest 2020.
20 million core hours on Joliot-Curie Rome hosted by GENCI at CEA, France
T.J. Inizan, F. Célerse, O. Adjoua, D. El Ahdab, L-H. Jolly, C. Liu, P. Ren, M. Montes, N. Lagarde, L. Lagardère, P. Monmarché and J-P. Piquemal. High-Resolution Mining of SARS-CoV-2 Main Protease Conformational Space: Supercomputer-Driven Unsupervised Adaptive Sampling. ChemRxiv. Preprint. (2020)