
Citation: Pingping Sun, Yongbing Chen, Bo Liu, Yanxin Gao, Ye Han, Fei He, Jinchao Ji. DeepMRMP: A new predictor for multiple types of RNA modification sites using deep learning[J]. Mathematical Biosciences and Engineering, 2019, 16(6): 6231-6241. doi: 10.3934/mbe.2019310
Post-transcriptional modification of RNA plays a crucial role in a variety of cellular processes, such as RNA splicing, RNA degradation, protein translation, stability and immune tolerance [1]. Among the different types of RNA modifications, m1A modification is related to gene mutation and helps to maintain the stability of mitochondrial tRNA. Pseudouridine modification is critical to the stabilization of tRNA structure and to the spliceosomal RNA responsible for gene regulation [2,3,4,5]. M5C affects RNA structural stability and translation efficiency [6,7]. N6-methyladenosine (m6A) modification is involved in a variety of important biological processes, such as RNA localization and degradation, RNA structural dynamics, and cell differentiation and reprogramming [8,9,10,11]. The chemical structures of the m1A, pseudouridine and m5C modifications are shown in Figure 1 [3,12,13]. Because RNA modifications occur on specific nucleotides with functional-group changes, detecting RNA modification sites via biological experiments requires considerable time, money and effort. Among all RNA modification data, m6A and these three types are the most abundant.
As an alternative, computational tools for RNA modification site prediction have been published since 2016. They all combined handcrafted features from RNA sequence analysis with traditional machine learning methods. Chen et al. developed an m1A prediction tool called RAMPred based on RNA chemical properties (CPs) and support vector machines (SVM) [14]. Combining CPs, nucleotide chemical properties (NC) and SVM, Wei Chen et al. released a pseudouridine prediction tool called iRNA-PseU [15]. He et al. utilized dinucleotide composition (DC), NC, position-specific dinucleotide preferences (PSDP), position-specific nucleotide preferences (PSNP), pseudouridine synthase (PUS) information and SVM to build a pseudouridine prediction tool called PseUI in 2018 [16]. Qiu et al. presented an m5C prediction tool called iRNAm5C-PseDNC by integrating PseDNC, DC and random forest (RF) in 2017 [17]. Li et al. encoded RNA sequences as one-hot vectors and used RF for m5C prediction [18]. These predictors each focused on a single type of RNA modification site. Meanwhile, some researchers attempted to develop prediction tools for multiple types of modification sites. Feng et al. published the iRNA-PseColl tool to predict m6A, m1A and m5C methylation [19]. Chen et al. built a predictor for m6A, m1A and adenosine-to-inosine modifications using CP, NC and SVM in 2018 [20]. Such multi-type RNA modification site predictors can provide more comprehensive knowledge than single-type ones. In addition, the performance of traditional machine learning methods highly depends on the effectiveness of feature engineering. However, it is difficult to tell the most relevant feature combination for a specific RNA modification. Deep learning can skip handcrafted features and conduct end-to-end prediction.
Through multiple neuron layers and activation functions, a deep network acquires the capacity to map raw input to a latent representation, trained on labeled data. Under such a data-driven model, deep features of RNA sequences related to the semantic information of RNA modification sites emerge. Huang et al. proved that deep learning performs better in predicting m6A RNA modification sites [21]. As a preliminary attempt at deep learning for multiple types of RNA modification site prediction, we developed a model for predicting m1A, pseudouridine and m5C RNA modification sites. To the best of our knowledge, this is the first deep learning-based tool for multiple types of RNA modification site prediction. Thanks to the larger-scale data for the m6A type, some researchers have presented deep learning-based predictors to identify this type of RNA modification site and achieved excellent improvements. Yet the available data for the three types of RNA modification are still too small to build a deep network independently. In this work, we took advantage of large-scale m6A data to pretrain a deep learning model, and then employed a transfer learning strategy to fine-tune its network parameters for our targeted types of RNA modification. Finally, we built DeepMRMP, a multi-type RNA modification predictor for three species: H. sapiens, M. musculus and S. cerevisiae.
In this study, all positive samples were extracted from the RMBase v2.0 database [22], which contains ~1,373,000 N6-methyladenosine, ~5400 N1-methyladenosine, ~9600 pseudouridine and ~1000 5-methylcytosine modifications, plus other types, across 13 species [23]. We randomly retrieved m1A, pseudouridine and m5C data from three species, H. sapiens, M. musculus and S. cerevisiae, as positive samples, and 10 times that number of other RNA gene fragments as negative samples. The details of our experimental data are shown in Table 1.
Modification type | H. sapiens | M. musculus | S. cerevisiae | Total |
m1A | 2574 | 1052 | 1220 | 4819 |
pseudouridine | 4128 | 3320 | 2122 | 9570 |
m5C | 680 | 97 | 211 | 988 |
For fair comparison, we cut each RNA sequence into fragments of 41 nucleotides, the most widely adopted length in existing tools. Taking m1A as an example, each RNA fragment in these datasets can be represented as follows:
R = N1 N2 N3 ⋯ N20 X N22 ⋯ N39 N40 N41 | (1) |
in which the center X is the targeted site, i.e., A (adenine) for m1A, U (uracil) for pseudouridine, and C (cytosine) for m5C, respectively. N1 to N20 represent the upstream flanking nucleotides of the target site, while N22 to N41 denote its downstream flanking nucleotides.
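The windowing in Eq. (1) can be sketched as a short Python routine (a minimal illustration; the function name `extract_windows` and the toy sequence are ours, not part of DeepMRMP):

```python
def extract_windows(seq, target_base="A", flank=20):
    """Extract fixed-length fragments centered on every candidate site.

    Returns (position, fragment) pairs for each occurrence of target_base
    that has at least `flank` nucleotides on both sides, giving fragments
    of length 2 * flank + 1 (41 nt for flank = 20, as in Eq. (1)).
    """
    windows = []
    for i, base in enumerate(seq):
        if base == target_base and flank <= i <= len(seq) - flank - 1:
            windows.append((i, seq[i - flank:i + flank + 1]))
    return windows

# Toy RNA sequence with a single adenine (candidate m1A site) at index 30
seq = "GCU" * 10 + "A" + "UCG" * 10
sites = extract_windows(seq, target_base="A", flank=20)
# One 41-nt fragment, with the target A at its center (index 20)
```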
In order to validate the generalization of our model, we divided our dataset into 10 folds by random selection. Each fold included training, validation and testing sets at a ratio of 3:1:1. Furthermore, to avoid over-estimation, each fold was processed by the CD-HIT-EST-2D tool to remove sequences with high similarity [24,25]. Here we adopted the most stringent threshold (0.8) supported by CD-HIT-EST-2D.
One-hot encoding is one of the most common and effective encoding schemes in sequence analysis [26,27,28]; it projects each sequence to a single vector in Euclidean space. In our work, each RNA sequence was encoded into a one-hot vector for subsequent GRU network modeling. In our encoding, each nucleotide in an RNA fragment is mapped to a four-dimensional vector: A = [1,0,0,0], C = [0,1,0,0], G = [0,0,1,0], U = [0,0,0,1].
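This encoding can be sketched in a few lines (our own illustration of the mapping above, using the A/C/G/U order stated in the text):

```python
import numpy as np

# One-hot vectors for the four nucleotides, in the order given in the text
NUC2VEC = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}

def one_hot_encode(fragment):
    """Encode an RNA fragment as a (length, 4) one-hot matrix."""
    return np.array([NUC2VEC[n] for n in fragment], dtype=np.float32)

x = one_hot_encode("GAUC")
# x has shape (4, 4); each row is the one-hot vector of one nucleotide
```

A 41-nt fragment thus becomes a 41 x 4 matrix, the input shape consumed by the recurrent layers described next.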
The recurrent neural network (RNN) is a deep architecture capable of memorizing contextual information, which makes it well suited to biological sequence analysis [29,30]. The gated recurrent unit (GRU), a lightweight variant of the RNN, has shown its effectiveness in predicting m6A modification sites [21]. The bidirectional version of the GRU extracts embedding representations from sequences to capture potential motifs around the modification sites [31]. In our study, we stacked two bidirectional GRU (BGRU) layers with a unit size of 64. Following the BGRU layers, we added a dense layer with 64 units to fully connect all latent representations. The activation function of all hidden layers is ReLU, which generates sparse outputs and accelerates convergence [31]. The Adam optimizer with a learning rate of 5e-4 was employed during training [32]. Training stopped once the model remained stable for 20 consecutive epochs. The details of our deep network can be found in Table 2.
Layer | Hyper-parameters | ||
Activation function | units | Dropout | |
GRU | ReLU | 64 | 0.2 |
GRU | ReLU | 64 | 0.2 |
Dense | ReLU | 64 | 0.2 |
Dense | Softmax | 2 | 0 |
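The gating mechanism inside each GRU layer, and the bidirectional pass over a 41 x 4 one-hot input, can be sketched with a plain numpy forward step (an illustrative single-cell implementation with random weights, not the trained 64-unit network itself):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x, h, params):
    """One GRU time step: update gate z, reset gate r, candidate state."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(x @ Wz + h @ Uz + bz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)   # candidate state
    return (1.0 - z) * h + z * h_tilde              # interpolated new state

def bgru(sequence, params_fwd, params_bwd, units=64):
    """Bidirectional pass: run the cell left-to-right and right-to-left,
    then concatenate the two final hidden states."""
    h_f = np.zeros(units)
    for x in sequence:
        h_f = gru_cell(x, h_f, params_fwd)
    h_b = np.zeros(units)
    for x in sequence[::-1]:
        h_b = gru_cell(x, h_b, params_bwd)
    return np.concatenate([h_f, h_b])

def init_params(in_dim, units, rng):
    # Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh
    shapes = [(in_dim, units), (units, units), (units,)] * 3
    return [rng.standard_normal(s) * 0.1 for s in shapes]

rng = np.random.default_rng(0)
seq = rng.standard_normal((41, 4))   # stand-in for a 41 x 4 one-hot fragment
out = bgru(seq, init_params(4, 64, rng), init_params(4, 64, rng))
# out has shape (128,): 64 forward units concatenated with 64 backward units
```

In the actual model, two such bidirectional layers are stacked (the first returning the full output sequence) before the dense and softmax layers of Table 2.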
Large-scale data are required to learn the latent patterns when modeling a deep network [33]. In situations with relatively small data, transfer learning is a promising strategy to span the data gap. It delivers knowledge from a source domain to a target domain by relaxing the assumption that the training data and the test data must be independent and identically distributed [34,35]. We hypothesized that some potential motifs are shared across different types of RNA modification sites, so we chose the m6A data for pre-training to detect such general sequence motif patterns, and then fine-tuned the deep model on the m1A, pseudouridine and m5C methylation data for the corresponding predictors. When fine-tuning the pretrained model, we set the learning rate to 5e-5 but increased the patience parameter in the early-stopping operation. In doing so, we could make full use of the relatively small-scale m1A, pseudouridine and m5C data to generate their specific deep features on the basis of the general pretrained model.
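The pretrain-then-fine-tune recipe can be illustrated on a toy logistic-regression model (our own schematic, not the DeepMRMP network; the 10x learning-rate reduction mirrors the 5e-4 to 5e-5 setting described above):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train(X, y, w, lr, epochs):
    """Plain gradient descent on logistic loss, starting from weights w."""
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0, 0.5, 1.5])   # hidden "shared motif" pattern

# Large source dataset (analogous to m6A): pre-train from scratch
Xs = rng.standard_normal((2000, 4))
ys = (Xs @ w_true > 0).astype(float)
w_pre = train(Xs, ys, np.zeros(4), lr=5e-1, epochs=200)

# Small target dataset (analogous to m1A/pseudouridine/m5C):
# fine-tune from the pretrained weights with a 10x smaller learning rate
Xt = rng.standard_normal((100, 4))
yt = (Xt @ w_true > 0).astype(float)
w_fine = train(Xt, yt, w_pre, lr=5e-2, epochs=50)

acc = np.mean((sigmoid(Xt @ w_fine) > 0.5) == yt)
```

The key point carried over from the text: the target-domain training starts from the source-domain weights rather than from scratch, and takes smaller steps so the general pretrained features are adjusted rather than overwritten.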
In recent studies, four evaluation parameters, accuracy (Acc), sensitivity (Sn), specificity (Sp) and the Matthews correlation coefficient (MCC), have frequently been used to measure a predictor's quality. In this study we also used the ROC (receiver operating characteristic) curve, the PR (precision-recall) curve and the F1 score, which are less affected by unbalanced datasets, to evaluate the performance of predictors. The ROC curve reflects the overall relationship between sensitivity and specificity as the decision threshold varies. The PR curve and the F1 score reflect the overall relationship between precision and recall.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sn (Recall) = TP / (TP + FN)
Sp = TN / (FP + TN)
MCC = (TP × TN - FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
Precision = TP / (TP + FP)
F1 score = 2 × Precision × Recall / (Precision + Recall) | (2) |
where TP, TN, FP and FN represent the number of true positive, true negative, false positive and false negative samples, respectively. The larger the area under the ROC and PR curves, the better the prediction performance.
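These measures can be computed directly from the confusion-matrix counts; a small sketch (the function name `metrics` and the example counts are ours):

```python
import math

def metrics(tp, tn, fp, fn):
    """Compute the evaluation measures of Eq. (2) from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)                      # sensitivity / recall
    sp = tn / (tn + fp)                      # specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"Acc": acc, "Sn": sn, "Sp": sp,
            "Precision": precision, "F1": f1, "MCC": mcc}

m = metrics(tp=80, tn=70, fp=30, fn=20)
# e.g. m["Acc"] == 0.75, m["Sn"] == 0.8, m["Sp"] == 0.7
```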
Moreover, we used independent datasets to measure the predictive performance of the predictor. The procedure is briefly described as follows. First, we trained our model on a previously partitioned pair of training and validation sets. This process was repeated 10 times, with each of the 10 subsets used exactly once as the validation data. Finally, the 10 results were averaged to obtain the final prediction estimate.
To measure the effectiveness of the underlying transfer learning, we compared the performance with and without it. For a fair comparison, all classifiers were run under equal conditions, using the same dataset and feature extraction method. The results are presented in Figure 2. As shown in Figure 2, performance improved significantly when transfer learning was used; therefore, transfer learning was adopted in our predictive model.
WebLogo is a commonly used sequence feature analysis tool [36,37]. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of a stack indicates the sequence conservation at that position, while the height of the symbols within the stack indicates the relative frequency of each amino acid or nucleotide at that position.
The modification sites of m6A, m1A, pseudouridine and m5C are at A, A, U and C, respectively. When drawing the logos, the fixed central position was removed to magnify the surrounding features. After truncation, positions 1–10 on the x-axis correspond to the sequence before the modification site, and positions 11–20 to the sequence after it. As seen in Figure 3, the characteristics of the m6A data used in pre-training are obvious at the ninth to twelfth positions. The features of the m1A and pseudouridine datasets are also concentrated at the ninth to twelfth positions, which is consistent with the assumption behind the transfer learning algorithm. For the case of m5C, whose WebLogo map is distributed uniformly, we offer two possible explanations: (1) the motifs of m5C are relatively dispersed; (2) 988 m5C samples are insufficient to extract the features of m5C. After experimentation, we found that the AUC of the m5C model increased by 0.16 after using transfer learning. This result suggests that the features of m5C are similar to those of the other three modifications, and that they are simply difficult to identify from 988 m5C samples using WebLogo.
To measure the effectiveness of the underlying GRU network, we compared its performance with two other commonly used deep learning architectures: a CNN network and a hybrid network combining CNN and GRU. For a fair comparison, all classifiers were run under equal conditions, using the same dataset and feature extraction method. The performances of the three algorithms are presented in Figure 4.
In order to compare the ROC and PR curves of the three RNA modification prediction models more clearly, we enlarged the plot for the m1A data. As shown in Figure 4, all three models perform well on the m1A dataset (AUC = 0.99 and AUPRC = 0.99). The GRU model achieved better ROC and PR curves than the CNN model and the CNN-GRU hybrid model. On the m5C dataset, the CNN model outperformed the GRU and CNN-GRU hybrid models: the AUC and AUPRC of the CNN, GRU and CNN-GRU networks were 0.79, 0.73, 0.71 and 0.75, 0.72, 0.71, respectively. CNN works better with fewer samples, but when samples are sufficient the GRU performs better. According to our analysis, CNN networks perform better on small samples because of their simpler structure, whereas GRU networks, with their many memory units, need more samples to reach full performance; the CNN-GRU hybrid model requires the most samples of all. With the accumulation of m5C samples, our model will become more and more reliable.
In order to further prove its superiority, the predictive results of the proposed method were also compared with those of the classifiers released in 2018, i.e., iRNA-3typeA, PseUI and RNAm5Cfinder. Table 3 shows the performance of our tool and of the three tools above on the same independent test set.
Type | Tool | Acc | Precision | Recall | Sp | F1 score | MCC |
m1A | iRNA-3typeA [20] | 0.5119 | 0.5060 | 0.9979 | 0.0258 | 0.6715 | 0.1012 |
 | DeepMRMP | 0.9927 | 0.9887 | 0.9969 | 0.9886 | 0.9928 | 0.9856 |
pseudouridine | PseUI [16] | 0.6018 | 0.5989 | 0.6165 | 0.5872 | 0.6076 | 0.2038 |
 | DeepMRMP | 0.6264 | 0.6675 | 0.5036 | 0.7492 | 0.5741 | 0.2608 |
m5C | RNAm5Cfinder [18] | 0.6326 | 0.7954 | 0.3571 | 0.9081 | 0.4929 | 0.3179 |
 | DeepMRMP | 0.6632 | 0.7580 | 0.4795 | 0.8469 | 0.5874 | 0.3510 |
As seen in Table 3, of the two m1A prediction tools, DeepMRMP outperforms the m1A predictor in iRNA-3typeA. Specifically, the Acc, precision, recall, Sp, F1 score and MCC of DeepMRMP are 0.9927, 0.9887, 0.9969, 0.9886, 0.9928 and 0.9856, respectively, all higher than those of the m1A predictor in iRNA-3typeA. Compared with PseUI, our model improved Acc by 0.0246, precision by 0.0686, Sp by 0.1620 and MCC by 0.057 on the independent test sets. Our model also surpasses RNAm5Cfinder on four metrics (Acc, recall, F1 score and MCC). Higher F1 score and MCC indicate a better model overall, and higher recall means our predictions capture more of the positive samples. Our model thus allows researchers to pre-screen the most likely candidates before biological experiments, saving manpower and material resources.
In this study, we proposed a model, DeepMRMP, for accurately and efficiently identifying m1A, pseudouridine and m5C sites in RNA sequences. We compared DeepMRMP with the latest m1A, pseudouridine and m5C site prediction models using independent tests. The results showed that our predictor has stronger robustness and generalization than the other predictors. Further comparative experiments indicated that this outperformance likely benefits from our deep network and transfer learning strategy. We believe DeepMRMP has great potential, and as more data become available, its performance could be further improved. The source code of DeepMRMP is available at https://github.com/Chenyb939/DeepMRMP.
This work was partially supported by the National Natural Science Foundation of China (Grant No. 61802057), the "13th Five-Year" science and technology research project of the Education Department of Jilin Province (Grant No. JJKH20190290KJ), the scientific research foundation of Jilin Agricultural University, and the China Scholarship Council (to Fei He).
The authors declare no conflict of interest.
[1] | S. Dunin-Horkawicz, A. Czerwoniec, M. J. Gajda, et al., MODOMICS: A database of RNA modification pathways, Nucleic Acids Res., 34(2006), D145–D149. |
[2] | J. H. Ge and Y. T. Yu, RNA pseudouridylation: New insights into an old modification, Trends Biochem. Sci., 38(2013), 210–218. |
[3] | M. Charette and M. W. Gray, Pseudouridine in RNA: what, where, how, and why, IUBMB Life, 49(2010), 341–351. |
[4] | D. R. Davis, C. A. Veltri, L. Nielsen, et al., An RNA model system for investigation of pseudouridine stabilization of the codon-anticodon interaction in tRNALys, tRNAHis and tRNATyr, J. Biomol. Struct. Dyn., 15(1998), 1121–1132. |
[5] | A. Basak and C. Query, A pseudouridine residue in the spliceosome core is part of the filamentous growth program in yeast, Cell Reports, 8(2014), 966–973. |
[6] | X. Yang, Y. Yang, B. F. Sun, et al., 5-methylcytosine promotes mRNA export-NSUN2 as the methyltransferase and ALYREF as an m5C reader, Cell Res., 27(2017), 606–625. |
[7] | M. Frye and F. M. Watt, The RNA methyltransferase Misu (NSun2) mediates Myc-induced proliferation and is upregulated in tumors, Curr. Biol., 16(2006), 971–981. |
[8] | X. Wang, Z. Lu, A. Gomez, et al., N6-methyladenosine-dependent regulation of messenger RNA stability, Nature, 505(2014), 117–120. |
[9] | C. Roost, S. R. Lynch, P. J. Batista, et al., Structure and thermodynamics of N6-methyladenosine in RNA: A spring-loaded base modification, J. Am. Chem. Soc., 137(2015), 2107–2115. |
[10] | T. Chen, Y. J. Hao, Y. Zhang, et al., m6A RNA methylation is regulated by micrornas and promotes reprogramming to pluripotency, Cell Stem Cell, 16(2015), 289–301. |
[11] | S. Geula, S. Moshitch-Moshkovitz, D. Dominissini, et al., m6A mRNA methylation facilitates resolution of naive pluripotency toward differentiation, Science, 347(2015), 1002–1006. |
[12] | X. Li, X. Xiong, K. Wang, et al., Transcriptome-wide mapping reveals reversible and dynamic N1-methyladenosine methylome, Nat. Chem. Biol., 12(2016), 311. |
[13] | S. Nachtergaele and C. He, The emerging biology of RNA post-transcriptional modifications, RNA Biol., 14(2016), 156–163. |
[14] | W. Chen, P. M. Feng, H. Tang, et al., RAMPred: Identifying the N-1-methyladenosine sites in eukaryotic transcriptomes, Sci. Rep., 6(2016), 31080. |
[15] | W. Chen, H. Tang, J. Ye, et al., iRNA-PseU: Identifying RNA pseudouridine sites, Mol. Ther.-Nucl. Acids, 5(2016). |
[16] | J. J. He, T. Fang, Z. Z. Zhang, et al., PseUI: Pseudouridine sites identification based on RNA sequence information, BMC Bioinform., 19(2018), 306. |
[17] | W. R. Qiu, S. Y. Jiang, Z. C. Xu, et al., iRNAm5C-PseDNC: Identifying RNA 5-methylcytosine sites by incorporating physical-chemical properties into pseudo dinucleotide composition, Oncotarget, 8(2017), 41178–41188. |
[18] | J. W. Li, Y. Huang, X. Y. Yang, et al., RNAm5Cfinder: A web-server for predicting RNA 5-methylcytosine (m5C) sites based on random forest, Sci. Rep., 8(2018). |
[19] | P. M. Feng, H. Ding, H. Yang, et al., iRNA-PseColl: Identifying the occurrence sites of different RNA modifications by incorporating collective effects of nucleotides into PseKNC, Mol. Ther.-Nucl. Acids, 7(2017), 155–163. |
[20] | W. Chen, P. M. Feng, H. Yang, et al., iRNA-3typeA: Identifying three types of modification at RNA's adenosine sites, Mol. Ther.-Nucl. Acids, 11(2018), 468–474. |
[21] | Y. Huang, N. N. He, Y. Chen, et al., BERMP: A cross-species classifier for predicting m6A sites by integrating a deep learning algorithm and a random forest approach, Int. J. Biol. Sci., 14(2018), 1669–1677. |
[22] | J. J. Xuan, W. J. Sun, P. H. Lin, et al., RMBase v2.0: Deciphering the map of RNA modifications from epitranscriptome sequencing data, Nucleic Acids Res., 46(2018), D327–D334. |
[23] | D. Dominissini, S. Moshitch-Moshkovitz, S. Schwartz, et al., Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq, Nature, 485(2012), U201–U284. |
[24] | L. Fu, B. Niu, Z. Zhu, et al., CD-HIT: Accelerated for clustering the next-generation sequencing data, Bioinformatics, 28(2012), 3150–3152. |
[25] | W. Z. Li and A. Godzik, Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, 22(2006), 1658–1659. |
[26] | L. Zhu, H. B. Zhang and D. S. Huang, Direct AUC optimization of regulatory motifs, Bioinformatics, 33(2017), i243. |
[27] | H. Zhang, L. Zhu and D. S. Huang, WSMD: Weakly-supervised motif discovery in transcription factor ChIP-seq data, Sci. Rep., 7(2017). |
[28] | G. H. Chuai, H. H. Ma, J. F. Yan, et al., DeepCRISPR: Optimized CRISPR guide RNA design by deep learning, Genome Biol., 19(2018). |
[29] | Q. Zhang, L. Zhu and D. S. Huang, High-order convolutional neural network architecture for predicting DNA-protein binding sites, IEEE/ACM Transact. Comput. Biol. Bioinform., (2018), 1. |
[30] | Q. Zhang, L. Zhu, W. Bao, et al., Weakly-supervised convolutional neural network architecture for predicting protein-DNA binding, IEEE/ACM Transact. Comput. Biol. Bioinform., (2018), 1. |
[31] | A. Krizhevsky, I. Sutskever and G. E. Hinton, ImageNet classification with deep convolutional neural networks, NIPS. Curran Assoc. Inc., (2012). |
[32] | D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980, (2014). |
[33] | C. Tan, F. Sun, K. Tao, et al., A survey on deep transfer learning, (2018). |
[34] | G. Litjens, T. Kooi, B. E. Bejnordi, et al., A survey on deep learning in medical image analysis, Med. Image Anal., 42(2017), 60–88. |
[35] | S. Liang, R. G. Zhang, D. Y. Liang, et al., Multimodal 3D denseNet for IDH genotype prediction in gliomas, Genes, 9(2018). |
[36] | L. Zhu, W. L. Guo, C. Lu, et al., Collaborative completion of transcription factor binding profiles via local sensitive unified embedding, IEEE Transact. NanoBiosci., (2016), 1. |
[37] | J. X. Wang, L. Chen, Y. Wang, et al., A computational systems biology study for understanding salt tolerance mechanism in rice, Plos One, 8(2013), 177–194. |