Researchers At Stanford Propose Deep Learning Models For Predicting RNA Degradation Via Dual Crowdsourcing

The rapid rollout of mRNA-based vaccines against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) demonstrates the enormous promise of mRNA as a modular therapeutic platform that allows virtually any protein to be delivered and translated. However, RNA's inherent chemical instability, and RNA hydrolysis in particular, limits stability in lipid nanoparticle (LNP)-based formulations. Hydrolysis during transport and storage reduces the amount of intact mRNA in LNP formulations, and hydrolysis in vivo after injection reduces the amount of protein that can be produced.

Synonymous sequence design is a largely unexplored route to longer-lasting mRNA therapeutics. Past computations put the number of mRNA sequences that encode the SARS-CoV-2 spike protein antigen at roughly 10^633. Given this enormous space of candidate sequences for a given therapeutic target, some sequences may have structural properties that make them more resistant to hydrolysis than conventional mRNA vaccine sequences. The usefulness of any mRNA design technique, however, is limited by how accurately its underlying model predicts RNA degradation.
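The size of this sequence space follows from codon degeneracy: most amino acids can be encoded by several codons, so the number of synonymous mRNA sequences for a protein is the product of the per-residue codon counts. Below is a minimal sketch of that arithmetic, assuming the standard genetic code and ignoring the stop codon and untranslated regions. The function names and the toy peptide are illustrative and do not come from the paper.

```python
import math

# Number of codons that encode each amino acid in the standard genetic code
# (the 61 sense codons; stop codons are ignored in this sketch).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_count(protein: str) -> int:
    """Exact number of coding sequences that translate to `protein`."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein)


def log10_synonymous_count(protein: str) -> float:
    """log10 of the count, convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[aa]) for aa in protein)


if __name__ == "__main__":
    toy_peptide = "MKLSR"  # hypothetical five-residue example
    print(synonymous_count(toy_peptide))
    print(round(log10_synonymous_count(toy_peptide), 2))
```

Because the count is a product over residues, it grows exponentially with protein length, which is why a protein of more than a thousand residues reaches hundreds of orders of magnitude.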

Researchers from Stanford University, NVIDIA, Kaggle, the Eterna Massive Open Laboratory, and other institutions set out to determine how much predictive power for RNA degradation could be achieved through model building in a very short period. To do so, they combined the RNA design platform Eterna with the machine learning competition platform Kaggle.

RNA design is the process of creating RNA sequences with desired characteristics, such as a particular overall structure, a desired function (such as sensor activity), or, in this case, high chemical stability. Degradation data were collected from short RNA fragments designed on the Eterna platform, which covered a wide diversity of sequences and structures. The researchers hypothesized that crowdsourcing the search for a machine learning architecture would yield a model able to capture the resulting complexity of sequence- and structure-dependent degradation patterns. They also expected that rigorous, independent testing of the models would minimize assumptions shared between the people designing the constructs to test (Eterna participants) and the people building the models (Kaggle participants), thereby improving generalization to independent datasets.

The generated models were then subjected to two blind prediction challenges. The first took place within the Kaggle competition itself: the RNA structure probing and degradation data that participants were asked to predict was unavailable until after the competition was announced. However, because this experimental method is limited to probing short RNA fragments, it cannot be scaled to measure single-nucleotide degradation rates across full-length mRNAs that encode proteins of interest. Stabilized RNA-based therapeutics are instead designed to minimize the overall degradation rate per mRNA molecule, and other experimental approaches, such as PERSIST-seq, have been developed to measure exactly that.

Empirical evaluation of the proposed models showed significant agreement between the summed per-nucleotide degradation rates and the abundance of the overall construct after sequencing.
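One way to see how per-nucleotide predictions relate to the fate of a whole construct is sketched below. It assumes, as a simplification not stated in the article, that each nucleotide's degradation rate acts as an independent first-order rate constant, so a molecule stays intact only if no position has been cleaved. All function names and the example rates are hypothetical.

```python
import math
from typing import Sequence


def total_degradation_rate(per_nucleotide_rates: Sequence[float]) -> float:
    """Sum per-nucleotide rates into one construct-level rate (same units)."""
    return float(sum(per_nucleotide_rates))


def fraction_intact(per_nucleotide_rates: Sequence[float], hours: float) -> float:
    """Expected intact fraction after `hours`, assuming first-order decay."""
    return math.exp(-total_degradation_rate(per_nucleotide_rates) * hours)


def half_life(per_nucleotide_rates: Sequence[float]) -> float:
    """Time at which half of the molecules remain intact (inf if rate is 0)."""
    k = total_degradation_rate(per_nucleotide_rates)
    return math.inf if k <= 0 else math.log(2) / k


if __name__ == "__main__":
    rates = [0.01, 0.02, 0.03]  # hypothetical per-hour rates for a 3-nt toy RNA
    print(total_degradation_rate(rates))
    print(fraction_intact(rates, hours=5.0))
    print(half_life(rates))
```

Under this assumption, lowering the summed rate is equivalent to lengthening the half-life of the intact molecule, which is the quantity a stabilized design aims to improve.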

Following this approach, the generated models were tested in a second blind challenge. This time, the aim was to predict the global degradation of full-length mRNAs encoding a range of model proteins, as measured experimentally using PERSIST-seq. The models showed superior predictive power over baseline approaches in estimating these global degradation rates. For this reason, they can be applied immediately to guide the design of low-degradation mRNA molecules.

Performance analysis of the models reveals that the primary constraints on predicting RNA degradation patterns are the availability of data and the accuracy of the structure prediction techniques used to construct input features. The team believes that RNA degradation prediction and therapeutic design can improve much further when more experimental data and better secondary structure prediction are combined with network architectures like those established in this work.
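To illustrate what structure-derived input features can look like, here is a minimal sketch that one-hot encodes each position of an RNA from its nucleotide and its predicted secondary structure in dot-bracket notation. The article does not describe the exact features the competing models used, so this encoding, the function names, and the example inputs are assumptions for illustration only.

```python
from typing import List

NUCLEOTIDES = "ACGU"       # one channel per base
STRUCTURE_SYMBOLS = "(.)"  # dot-bracket: paired-open, unpaired, paired-close


def encode_position(nucleotide: str, structure_symbol: str) -> List[int]:
    """One-hot vector of length 7: four base channels, three structure channels."""
    vector = [0] * (len(NUCLEOTIDES) + len(STRUCTURE_SYMBOLS))
    vector[NUCLEOTIDES.index(nucleotide)] = 1
    vector[len(NUCLEOTIDES) + STRUCTURE_SYMBOLS.index(structure_symbol)] = 1
    return vector


def encode_rna(sequence: str, structure: str) -> List[List[int]]:
    """Encode a sequence and its dot-bracket structure position by position."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [encode_position(nt, ss) for nt, ss in zip(sequence, structure)]


if __name__ == "__main__":
    # Hypothetical toy hairpin: G-C pair closing a three-nucleotide loop.
    for row in encode_rna("GAAAC", "(...)"):
        print(row)
```

Because the structure channels come from a predicted structure, any error in that prediction propagates directly into the features, which is consistent with structure prediction accuracy being one of the limiting factors named above.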



Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence in various fields. She is passionate about exploring new advancements in technology and their real-life applications.

