Every year, millions of deaths worldwide are linked to gene-related disorders such as cancer, cardiovascular disease, diabetes, and many more. Every year, millions of patients are diagnosed with these disorders and suffer from them. And every year, millions of dollars are spent trying to find cures, to no avail.
All that was supposed to change after the development of gene-editing technologies. They were supposed to cure disorders like cancer. They were supposed to save millions of lives around the world. And yet, nothing happened. All the problems continue to haunt us today.
Now, I don’t mean to undermine the progress that has been made. We have come far in our understanding of genetics, and how it determines our fate and the development of disorders. We have come far in our understanding of which genes are responsible for what traits. We have come far in our understanding of technologies like CRISPR and how they can be used. Except, we have not come far enough.
One major reason CRISPR still cannot be used to cure or treat all these problems is how difficult it is to design an accurate CRISPR experiment, including the guide RNA (gRNA) sequence.
If you don’t know much about CRISPR, the gRNA sequence specifies what part of the DNA needs to be edited, and guides the CRISPR system to that site.
But with most gRNA sequences, the system tolerates a little variability when matching the DNA to the gRNA. This means it might make edits at sites it’s not meant to target, leading to off-target effects.
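To make that tolerance concrete, here is a minimal sketch that scans a DNA string for sites matching a guide with at most a few mismatches. The function names and the `max_mismatches` threshold are my own illustrative choices, and the sketch leaves out everything else that affects real CRISPR targeting, so treat it as a toy picture of the idea rather than a real off-target finder.

```python
def count_mismatches(guide: str, site: str) -> int:
    """Count positions where the guide and a same-length DNA site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(g != s for g, s in zip(guide.upper(), site.upper()))


def find_tolerated_sites(guide: str, dna: str, max_mismatches: int = 3):
    """Return (start, site, mismatches) for every window of `dna` that
    differs from `guide` in at most `max_mismatches` positions.

    `max_mismatches` is a hypothetical threshold chosen for illustration.
    """
    hits = []
    n = len(guide)
    for start in range(len(dna) - n + 1):
        window = dna[start:start + n]
        mismatches = count_mismatches(guide, window)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


if __name__ == "__main__":
    guide = "ACGTACGT"
    dna = "TTACGTACGTTTACGAACGTTT"
    for start, site, mismatches in find_tolerated_sites(guide, dna, max_mismatches=1):
        print(start, site, mismatches)
```

Any hit with zero mismatches is the intended target. Every other hit is a place where an off-target edit could happen under this simplified model.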
As you might guess, this can be super harmful as it might make unwanted changes that can harm the organism. To avoid this, scientists use statistical methods to predict the off-target effects for different gRNA sequences and try to develop the best sequence with the least effects.
The problems? First, it takes a lot of time. Researchers have to check many different gRNA sequences until they find one that can be used. Second, the models are not very accurate. In fact, different statistical models can give entirely different predictions for the exact same sequence!
So is there a solution? Possibly.
Artificial intelligence has recently been taking the world by storm. You can now find it almost anywhere, with thousands of companies, including notable ones like Google and Facebook, both using and developing it. It has repeatedly matched or outperformed humans on many tasks, especially when it comes to data analysis.
So you know what really pisses me off? The fact that it’s still barely used today in genomics and gene editing. Researchers have only recently begun applying the concepts of machine learning and deep learning in these fields, and already they have seen huge development.
In fact, just yesterday I attended a lecture by Dr. Lila Kari from the University of Waterloo, and she talked about how machine learning helped her solve a species separation problem that simple mathematical analysis just couldn’t handle. And when I say “helped”, I mean it took a problem she could not solve at all and reached 96% accuracy during testing!
And yet, the application of machine learning in the field remains basically unheard of. We are wasting such valuable resources here, by failing to realize and utilize the true power of machine learning algorithms. If we truly hope to accelerate our development and understanding of CRISPR, it is time we begin using machine learning.
GuideGen — A Step Forward
Which is why I came up with GuideGen, a machine learning and deep learning oriented method for gRNA sequence design. With this project, I hope to put to use the valuable resources we have in an effective manner, to move our research forward at an unprecedented rate.
So, how does it work?
It builds on previous research into using deep learning for off-target prediction. One example of such a study is “Off-target predictions in CRISPR-Cas9 gene editing using deep learning”, which develops a convolutional neural network (CNN) and a feedforward neural network (FNN) as two possible models to predict off-target effects for specific gRNA sequences. The models were found to perform significantly better than previous attempts at predicting off-target effects with machine learning and statistical methods.
GuideGen will use a similar CNN model designed to take in a gRNA sequence and the organism’s entire genome, and output the probability of an off-target effect.
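A CNN cannot read letters directly, so the sequences have to be turned into numbers first. Here is a minimal sketch of one common way to do that, one-hot encoding, applied to a guide and a candidate DNA site. This encoding is my assumption for illustration, not necessarily what the study above or GuideGen uses, and the names `one_hot` and `encode_pair` are hypothetical.

```python
BASES = "ACGT"  # column order of the one-hot encoding


def one_hot(seq: str):
    """Encode a DNA/RNA-like string as a list of 4-element 0/1 rows."""
    rows = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def encode_pair(guide: str, site: str):
    """Stack the guide and a candidate site side by side.

    Each position becomes 8 numbers: 4 for the guide base, 4 for the
    site base. A matrix like this is the kind of input a small CNN
    could score (assumed layout, for illustration only).
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return [g + s for g, s in zip(one_hot(guide), one_hot(site))]


if __name__ == "__main__":
    for row in encode_pair("ACGT", "ACGA"):
        print(row)
```

Positions where the two halves of a row differ are mismatches between the guide and the site, which is exactly the kind of pattern a convolutional filter can pick up.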
But that’s not all. Unlike current methods, where researchers design the sequence and check for off-target effects using algorithms or statistics, GuideGen removes the need to repeatedly design gRNA sequences manually. Instead, the program also includes an optimization technique known as a genetic algorithm, which performs this task.
The algorithm randomly generates a population of gRNA sequences from the target gene and scores them for off-target effects using the deep learning model. Then, in a mechanism based on natural selection, the algorithm takes the sequences with the lowest scores (it’s like survival of the fittest) and forms another generation of sequences from the survivors, with just a few changes. It scores these new sequences and repeats the cycle, improving the gRNA sequence with each generation until it settles on the best one it can find. This method not only cuts down on time but also optimizes the gRNA sequence automatically.
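The loop described above can be sketched in a few lines. This is only a rough illustration under my own assumptions, not GuideGen’s actual code: candidates are windows of the target gene, the “few changes” are modelled as small shifts of the window, and `toy_score` is an arbitrary placeholder with no biological meaning that stands in for the deep learning model. All names and parameter values are hypothetical.

```python
import random


def toy_score(guide: str) -> int:
    """Placeholder for the off-target model (lower is better).
    It just counts G and C bases and has no biological meaning."""
    return guide.count("G") + guide.count("C")


def evolve_guides(target_gene, score, guide_len=20, pop_size=20,
                  survivors=5, generations=10, max_shift=3, seed=0):
    """Search for a low-scoring guide inside `target_gene`."""
    rng = random.Random(seed)
    last_start = len(target_gene) - guide_len
    if last_start < 0:
        raise ValueError("target gene is shorter than the guide length")

    def guide_at(start):
        return target_gene[start:start + guide_len]

    # Generation 0: random start positions within the target gene.
    population = [rng.randint(0, last_start) for _ in range(pop_size)]

    for _ in range(generations):
        # Selection: keep the distinct candidates with the lowest scores.
        ranked = sorted(set(population), key=lambda s: score(guide_at(s)))
        parents = ranked[:survivors]

        # Variation: each child is a survivor shifted by a few bases,
        # clamped so the window stays inside the gene.
        children = []
        while len(parents) + len(children) < pop_size:
            parent = rng.choice(parents)
            shift = rng.randint(-max_shift, max_shift)
            children.append(min(max(parent + shift, 0), last_start))

        # Survivors carry over unchanged, so the best score never gets worse.
        population = parents + children

    best = min(population, key=lambda s: score(guide_at(s)))
    return guide_at(best), score(guide_at(best))


if __name__ == "__main__":
    gene_rng = random.Random(1)
    gene = "".join(gene_rng.choice("ACGT") for _ in range(200))
    print(evolve_guides(gene, toy_score))
```

Swapping `toy_score` for a trained off-target model is the only change the loop itself would need. Keeping the survivors in each new generation is a design choice that guarantees the best candidate is never lost.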
So what makes GuideGen so special? Well, it’s not the first method of designing gRNA sequences. But as I mentioned earlier, most methods rely on statistical models, which are often inaccurate and inconsistent with each other. GuideGen aims to solve this problem. Machine learning models can often reach much higher accuracy because they pick up on patterns and connections in the data that humans would struggle to detect. That lets them find ways of carrying out a task that perform better than statistical models designed around limited human knowledge.
Right now, the program is still being planned out, and I am regularly getting feedback from professors and geneticists working on similar projects. If you want to keep up with the progress and find out a little more, check out our website!
That’s it for me. This is Akshaj, signing off for now. See you next time.