Predictor of 2'-O-Methylation sites among RNAs
2'-O-methylation (2OM) is a common post-transcriptional modification that plays an important role in a variety of biological functions. Understanding its distribution in RNA helps in studying its mechanism of action. Here, we propose a machine learning model to rapidly identify potential 2OM sites in RNA sequences. We first conducted motif analysis on RNA sequences and found that different types of 2OM modifications share some motif regions and also have type-specific ones.
The RNA sequence samples were encoded using short-range correlation features and the physicochemical properties of nucleotides. Two feature selection methods, analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR), were used to reduce the feature dimension and obtain the optimal feature subset. Finally, four prediction models were constructed: Am-SVM, Um-SVM, Gm-XGBoost and Cm-XGBoost.
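To illustrate how nucleotide properties and ANOVA-based ranking can fit together, here is a minimal Python sketch. It is not the i2OM implementation: the three-value chemical-property code (ring structure, functional group, hydrogen-bond strength) is a commonly used encoding that we assume for illustration, and the function names `encode`, `anova_f` and `rank_features` are hypothetical. The short-range correlation features and the mRMR step are not shown.

```python
from statistics import mean

# Assumed encoding: (ring structure, functional group, hydrogen-bond strength).
# The exact feature set used by i2OM is not specified in this text.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}

def encode(seq):
    """Flatten a sequence into a numeric vector, three values per base."""
    vec = []
    for base in seq.upper().replace("T", "U"):
        vec.extend(NCP.get(base, (0, 0, 0)))  # unknown bases map to zeros
    return vec

def anova_f(pos, neg):
    """One-way ANOVA F statistic of a single feature for two classes."""
    groups = [pos, neg]
    n = len(pos) + len(neg)
    grand = mean(pos + neg)
    # Between-group mean square (degrees of freedom = 2 - 1 = 1).
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group mean square (degrees of freedom = n - 2).
    within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n - 2)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within

def rank_features(pos_seqs, neg_seqs):
    """Return feature indices sorted by decreasing F score."""
    pos_rows = [encode(s) for s in pos_seqs]
    neg_rows = [encode(s) for s in neg_seqs]
    scores = [
        anova_f([r[j] for r in pos_rows], [r[j] for r in neg_rows])
        for j in range(len(pos_rows[0]))
    ]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In a full pipeline, the top-ranked indices would define the reduced feature subset passed to a classifier; how many to keep is usually chosen by cross-validation.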
i2OM requires only sequences in FASTA format. You can choose a model according to the type of potential 2OM site you are interested in, or use the combined prediction model to predict all types of potential 2OM sites.
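As a rough sketch of how FASTA input might be prepared before prediction, the Python snippet below parses FASTA text and extracts a fixed-length window around every occurrence of a chosen base (for example, `A` for Am sites). The function names `read_fasta` and `candidate_windows`, the default flank of 20 nucleotides, and the `N` padding character are all assumptions made for illustration; the window length actually used by i2OM is not stated here.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Normalise to upper-case RNA letters.
            records[name].append(line.upper().replace("T", "U"))
    return {k: "".join(v) for k, v in records.items()}

def candidate_windows(seq, base, flank=20, pad="N"):
    """Yield (1-based position, window) for every occurrence of `base`.

    `flank` and `pad` are hypothetical defaults; windows near the sequence
    ends are padded so that every window has length 2 * flank + 1.
    """
    padded = pad * flank + seq + pad * flank
    for i, nt in enumerate(seq):
        if nt == base:
            yield i + 1, padded[i : i + 2 * flank + 1]
```

For example, `list(candidate_windows(read_fasta(text)["r1"], "A"))` would list every adenosine in record `r1` as a candidate Am site, with the candidate base at the centre of each window.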