The non-coding regions of SARS-CoV-2 RNA play a decisive role in viral replication. Kresten Lindorff-Larsen and Sandro Bottaro of the University of Copenhagen have been using molecular dynamics simulations to predict the structure and dynamics of these regions in the hope that this will enable the development of drugs that target them.
Kresten Lindorff-Larsen and his colleague Sandro Bottaro are experts in biomolecular modelling, with the former specialising in proteins and the latter in nucleic acids. Both researchers have spent many years working at the interface between computational modelling and experimental biophysics. After making contact with a newly established consortium of experimentalists known as “COVID-19 NMR”, which was setting up a large-scale effort to characterise the SARS-CoV-2 proteins and RNA genome, the two researchers saw an opportunity to offer their expertise within a framework that could contribute to the development of drugs for COVID-19.
“We were initially hesitant about applying to do a project based on COVID-19 as we knew a lot of people who were better placed to do work that had a high chance of having a real impact on people’s lives,” says Lindorff-Larsen. “Instead, we spent some time thinking about what work we could do that would have the potential to help the situation surrounding the pandemic but also, if nothing were to come out of it in that respect, would still yield some interesting scientific findings.”
Kresten Lindorff-Larsen and Sandro Bottaro
In the COVID-19 NMR group, the two researchers saw an opportunity to use their computational modelling expertise and their experience of working with NMR data to shed some light on the atomic structure of RNA in the SARS-CoV-2 virus. As part of PRACE’s fast-track call to support projects that mitigate the impact of the pandemic, they were awarded 20 000 000 core hours on Joliot-Curie Rome hosted by GENCI at CEA, France, and 352 000 core hours on Marconi100 hosted by CINECA, Italy.
Very little work has been done worldwide to study the structure of RNA within the SARS-CoV-2 virus, with most efforts focusing on its proteins. RNA molecules are known to be very flexible, which makes it difficult for traditional experimental techniques to pin down their structure accurately. Computational modelling therefore needs to be combined with experimental work to determine the detailed structure of RNA molecules.
Standard illustrations of the RNA genome within the SARS-CoV-2 virus show it as a neatly ordered coil, but in reality little is known about its three-dimensional structure. “What we do know is that the RNA interacts with other proteins and that it is flexible,” says Bottaro. “People have also examined the sequence of the RNA in detail. It is 30 000 bases long, most of which codes for proteins. However, two special regions at the very beginning and end, known as the 5’ UTR and 3’ UTR [UTR standing for untranslated regions] do not encode for proteins but are crucial for viral replication, transcription and packaging. Experimentally, it has been shown that mutations that disrupt the stability of these regions affect viral replication, so they are of great interest to COVID-19 research.”
Secondary structure prediction of the 5’ end untranslated region of the SARS-CoV-2 RNA genome.
Despite the functional relevance of these untranslated regions, detailed information on them is scarce. They are known to contain conserved structural motifs, and so Bottaro and Lindorff-Larsen have been carrying out molecular dynamics simulations using the GROMACS and PLUMED software packages to predict the structure and dynamics of selected structural elements in these non-coding regions.
“The nice thing about these simulations is that you obtain an atomic detail prediction not just of a single structure but of many structures,” says Bottaro. “We can then look at the atomic details of what happens with different structures, so we are really using these simulations as a kind of microscope to examine the RNA in more detail. Then, our colleagues carrying out the biophysical experiments can examine our hypotheses about what is happening, with the long-term goal of providing a structural basis for understanding how the viral replication works and for rational drug design that targets specific elements of these structures.”
On the technical side, these systems are small in terms of the number of particles being simulated. Small systems are traditionally difficult to scale efficiently across many cores, so the researchers have had to use a number of tricks and techniques to make good use of the large number of cores available. “We have done multiple replica simulations where we have multiple copies of the same molecule that we can simulate in parallel, and we can then have them speak to one another in ways that allow us to use the parallel infrastructure effectively,” says Lindorff-Larsen. “Sandro has had to develop a new protocol to efficiently find some starting points for our simulations, and this will be useful going forwards outside the boundaries of this project.”
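The article does not specify how the replicas “speak to one another”. One common scheme is temperature replica exchange, in which copies of the same molecule are simulated in parallel at different temperatures and neighbouring copies periodically attempt to swap configurations. The Python sketch below illustrates only the Metropolis swap criterion under that assumption; the function names are hypothetical, and in practice GROMACS and PLUMED carry out exchanges internally. It is not the researchers' own protocol.

```python
import math
import random

# Assumption: temperature replica exchange, chosen only for illustration.
K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance probability for swapping the configurations
    held by replica i (at temp_i) and replica j (at temp_j)."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Check the sign first to avoid overflowing math.exp for large deltas.
    if delta >= 0.0:
        return 1.0
    return math.exp(delta)


def exchange_step(energies, temps, rng, offset=0):
    """One round of swap attempts between neighbouring replicas (k, k+1).

    energies[k] is the potential energy of the configuration currently held
    at temperature temps[k]. The returned list `order` means that slot k now
    holds the configuration that was previously at slot order[k].
    """
    order = list(range(len(temps)))
    # Pairs starting at `offset` in steps of 2 are disjoint, so each
    # replica takes part in at most one swap attempt per round.
    for k in range(offset, len(temps) - 1, 2):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = order[k + 1], order[k]
    return order
```

Alternating `offset` between 0 and 1 on successive rounds lets every neighbouring pair attempt exchanges, so configurations can move across the temperature ladder while each replica runs independently on its own cores between attempts.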
The predicted three-dimensional structure of stem-loop 5a (SL5a) in the 5’ end untranslated region of the SARS-CoV-2 RNA genome.
There are two main steps to the protocol being used in the project. The first step, which is relatively cheap computationally, provides a rough approximation of the structure of the molecule and is carried out many times. The resulting structures are then filtered using generic criteria and prior understanding of the molecule, after which a more detailed, quantitative exploration of the remaining structures is carried out using force field energy functions. “Our allocation for PRACE was very important for the first part as it allowed us to carry out a huge number of pilot runs very quickly to tune our parameters,” says Bottaro. “On our in-house cluster, this would have taken a year or more, but we have done it in a matter of months.”
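The funnel described above (cheap sampling many times, filtering, then a more expensive exploration of the survivors) can be sketched in a few lines of Python. Everything here is a toy stand-in and an assumption: structures are plain lists of numbers, `cheap_score` and `detailed_energy` are hypothetical functions rather than a real coarse model or force field, and the second stage is a simple greedy Monte Carlo minimisation rather than the molecular dynamics used in the project.

```python
import random


def cheap_score(x):
    """Hypothetical low-cost score for stage 1 (stand-in for a coarse model)."""
    return sum(xi * xi for xi in x)


def detailed_energy(x):
    """Hypothetical 'expensive' energy for stage 2 (stand-in for a force field)."""
    return sum(xi * xi + 0.1 * xi ** 4 for xi in x)


def stage1_generate(n, dim, rng):
    """Stage 1: generate many rough candidate structures cheaply."""
    return [[rng.uniform(-3.0, 3.0) for _ in range(dim)] for _ in range(n)]


def stage1_filter(candidates, keep):
    """Keep only the best candidates according to the cheap criterion."""
    return sorted(candidates, key=cheap_score)[:keep]


def stage2_refine(x, rng, steps=200, step_size=0.1):
    """Stage 2: greedy Monte Carlo descent on the detailed energy.
    Only downhill moves are accepted, so the energy never increases."""
    best = list(x)
    best_e = detailed_energy(best)
    for _ in range(steps):
        trial = [xi + rng.gauss(0.0, step_size) for xi in best]
        e = detailed_energy(trial)
        if e < best_e:
            best, best_e = trial, e
    return best, best_e


def two_stage_protocol(n_rough=1000, keep=10, dim=3, seed=0):
    """Run the toy funnel: many cheap candidates, a short list, costly refinement."""
    rng = random.Random(seed)
    rough = stage1_generate(n_rough, dim, rng)
    shortlist = stage1_filter(rough, keep)
    refined = [stage2_refine(x, rng) for x in shortlist]
    # Return (structure, energy) pairs ordered from lowest to highest energy.
    return sorted(refined, key=lambda pair: pair[1])
```

The design choice the sketch tries to capture is where the compute goes: the first stage is spread thinly over many candidates, which is what makes a large number of quick pilot runs useful, while the expensive stage is spent only on the shortlist.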
Although the COVID-RNA project aims to exploit the synergy between computational work and experimental findings, the researchers have planned out their work so that the two sides can be done independently without having to wait for results from the other side. This allows the computational modellers to plough ahead with their simulations very quickly and then merge their results with those of the experimentalists after the allocation is finished. “When you are doing work on a subject such as COVID-19 where time is of the essence, it is really important that you are able to parallelise the science in this way and then make ends meet afterwards,” says Lindorff-Larsen.
With the PRACE project now around halfway through, much of the analysis of the simulations still needs to be carried out, so it is difficult for the researchers to make any concrete statements about their findings. “It has been an unusual situation for us to have such a large amount of resources on a project that we have only just started, and we’ve made a lot more progress than we would normally because we have been able to explore ideas very quickly,” says Bottaro. “We believe we’ve done some interesting science that stands by itself independently of the pandemic, but at the same time we are very pleased to have been working on the structure of molecules that may enable better drug design for COVID-19.”