Research article Special Issues

DeepMRMP: A new predictor for multiple types of RNA modification sites using deep learning

  • Received: 01 March 2019 Accepted: 20 June 2019 Published: 04 July 2019
  • RNA modification plays an indispensable role in the regulation of organisms, and predicting RNA modification sites offers insight into diverse cellular processes. For different types of RNA modification sites, it is difficult to identify the most relevant feature combinations from the wide variety of RNA properties. Consequently, the performance of traditional machine-learning-based predictors has relied on the skill of feature engineering. As a data-driven approach, deep learning can detect optimal feature patterns to represent the input data. In this study, we developed a method for predicting multiple types of RNA modification sites, called DeepMRMP (Multiple Types RNA Modification Sites Predictor), which is based on the bidirectional Gated Recurrent Unit (BGRU) and transfer learning. DeepMRMP makes full use of data on multiple RNA modification sites and the correlations among them to build predictors for different types of RNA modification sites. In 10-fold cross-validation on RNA sequences of H. sapiens, M. musculus and S. cerevisiae, DeepMRMP served as a reliable computational tool for identifying N1-methyladenosine (m1A), pseudouridine (Ψ) and 5-methylcytosine (m5C) modification sites.

    Citation: Pingping Sun, Yongbing Chen, Bo Liu, Yanxin Gao, Ye Han, Fei He, Jinchao Ji. DeepMRMP: A new predictor for multiple types of RNA modification sites using deep learning[J]. Mathematical Biosciences and Engineering, 2019, 16(6): 6231-6241. doi: 10.3934/mbe.2019310
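
The abstract reports results from 10-fold cross-validation. As a rough illustration of that evaluation protocol only, the sketch below splits sample indices into ten folds so that each sample is held out exactly once. The function name `k_fold_indices`, the shuffling seed and the toy sample count are assumptions made for this example; the abstract does not describe how the authors partitioned their data.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the DeepMRMP paper.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training set is the union of all folds except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Toy usage: 25 hypothetical sequence windows split into 10 folds.
for train_idx, test_idx in k_fold_indices(25, k=10):
    pass  # fit a model on train_idx, evaluate it on test_idx
```

In practice one would average a metric over the ten held-out folds; a stratified split that preserves the ratio of modified to unmodified sites may be preferable when the classes are imbalanced.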



    Researchers in the computational intelligence community have made consistent progress in making machines more intelligent in various respects, including representations, learning models, and optimization methods. The development of these techniques provides useful tools for big data and information analytics. This special issue aims to present recent advances in combining computational intelligence methods with big data. We accepted 7 papers after a strict review process, in which each paper was reviewed by at least two reviewers. We hope the papers accepted for this special issue will provide a useful reference for researchers interested in computational intelligence and big data, and will inspire novel methods and applications.

    The accepted papers can be roughly divided into three categories, according to the aspects they involve.

    On the aspect of representation, the article "Multiple-instance learning for text categorization based on semantic representation" by Zhang et al. employs a multi-instance representation for text data. A text document is usually represented as a single feature vector, which may be insufficient to expose its rich content for learning. Building on the popular word2vec technique, this paper represents a document by multiple instances. In this way, the semantic meaning of a document is better exposed, and the experiments show improved performance over the single-instance representation.

    Another article, "A comparative study of robustness measures for cancer signaling networks" by Zhou et al., studies cancer signaling data represented as a network. The information exchange pathways in a cancer signaling network are essential to curing cancer, so it is meaningful to find a sensitive network measure that is highly correlated with patient survivability. This work investigates the robustness of 14 typical cancer signaling networks. The experiments find that natural connectivity is a promising measure, which could be expected to help cancer treatment.
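
For readers unfamiliar with the measure, natural connectivity is commonly defined as ln((1/N) Σ exp(λ_i)), where λ_i are the eigenvalues of the N × N adjacency matrix; this equals ln(trace(exp(A)) / N). The sketch below computes it for a small undirected graph with a truncated Taylor series of the matrix exponential, using only the standard library. The function name, the number of series terms and the toy graphs are assumptions for this example, and it is not claimed to be the procedure used by Zhou et al.

```python
import math


def natural_connectivity(adj, terms=60):
    """Return ln(trace(exp(A)) / N) for an adjacency matrix given as nested lists.

    Illustrative sketch: the series is truncated after `terms` powers, which is
    adequate for small graphs but not for large or dense ones.
    """
    n = len(adj)
    # `power` holds A^k / k!, starting from the identity matrix (k = 0).
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    trace_exp = float(n)
    for k in range(1, terms):
        power = [
            [sum(power[i][m] * adj[m][j] for m in range(n)) / k for j in range(n)]
            for i in range(n)
        ]
        trace_exp += sum(power[i][i] for i in range(n))
    return math.log(trace_exp / n)


# Toy graphs: a triangle has more redundant closed walks than a 3-node path.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(natural_connectivity(triangle) > natural_connectivity(path))
```

The measure grows with the number of closed walks in the graph, which is why the triangle scores higher than the path; for realistic network sizes an eigenvalue routine from a numerical library would be the usual choice.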

    On the aspect of learning models, the extreme learning machine is a recently emerged, simple neural network model with randomly determined connection weights. The article "Two-hidden-layer extreme learning machine based wrist vein recognition system" by Yue et al. employs such a neural network with two hidden layers to achieve good performance on the wrist vein recognition task with a satisfactory training time.

    The ability of learning models to learn incrementally is often appealing. The article "Selective further learning of hybrid ensemble for class imbalanced Increment learning" by Lin and Tang addresses the class imbalance issue, which naturally arises in incremental learning, and proposes an ensemble-based method, Selective Further Learning, in which different component learners handle different issues of the learning task. Experiments show that the proposed method outperforms some recent state-of-the-art approaches.

    On the aspect of optimization methods, the article "A clustering based mate selection for evolutionary optimization" by Zhang et al. introduces a mate selection mechanism into evolutionary algorithms. With the help of clustering, the mate of an individual is restricted to the same cluster. With this new mechanism, the evolutionary algorithm achieves better results on a set of benchmark functions.
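
The idea of restricting mates to the same cluster can be sketched as follows. This is a minimal illustration that assumes cluster labels have already been computed by some clustering step; the function name `select_mate`, the fallback for singleton clusters and the toy labels are hypothetical and are not taken from Zhang et al.

```python
import random


def select_mate(index, labels, rng):
    """Pick a mate for individual `index` from its own cluster.

    `labels[i]` is the cluster label of individual i (assumed to be given).
    The population must contain at least two individuals.
    """
    candidates = [
        j for j, lab in enumerate(labels) if lab == labels[index] and j != index
    ]
    if not candidates:
        # Singleton cluster: fall back to any other individual (an assumption).
        candidates = [j for j in range(len(labels)) if j != index]
    return rng.choice(candidates)


# Toy usage: seven individuals in three clusters; individual 5 is alone in its cluster.
labels = [0, 0, 1, 1, 1, 2, 0]
rng = random.Random(0)
mates = [select_mate(i, labels, rng) for i in range(len(labels))]
```

Mating within a cluster pairs individuals that are close in the search space, which is the restriction the paragraph above describes; how the clusters are formed and updated is left open here.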

    Optimization is also related to representation. The article "A moving block sequence-based evolutionary algorithm for resource investment project scheduling problems" by Yuan et al. proposes the moving block sequence representation for the resource investment project scheduling problem. The new representation guarantees some good properties of the resulting solution, and consequently the proposed approach shows superior performance on 450 benchmark instances.

    Better optimization can lead to better learning. In the article "An evolutionary multiobjective method for low-rank and sparse matrix decomposition" by Wu et al., a multiobjective evolutionary approach is employed to solve the low-rank matrix decomposition problem. The multiobjective approach trades off well between the low-rank and sparse objectives, leading to satisfactory results on natural image analysis.

    We thank all the authors for their contributions to this special issue, and the reviewers for their careful and insightful reviews. We also thank Prof. Jianhong Wu and Prof. Zongben Xu, the Editors-in-Chief of the Big Data and Information Analytics journal, and Prof. Zhi-Hua Zhou from the Editorial Board of the journal for their full support of this special issue, and the Aimsciences staff for managing this special issue.




  • © 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)