Long non-coding RNAs (lncRNAs) play regulatory roles in many types of cells, and identifying lncRNA-protein interactions helps reveal the functional mechanisms of lncRNAs. However, identifying these interactions with biological (experimental) techniques is costly and time-consuming. Here, an ensemble learning framework, RLF-LPI, is proposed to predict lncRNA-protein interactions. RLF-LPI encodes sequences and structures with the k-mer method and uses a residual LSTM autoencoder module with a fused attention mechanism to extract latent feature representations and capture the dependencies between sequences and structures. Finally, the relationship between lncRNAs and proteins is learned through fuzzy decision. Experimental results show that RLF-LPI achieves an accuracy (ACC) of 0.912 on the ATH948 dataset and 0.921 on the ZEA22133 dataset, outperforming the other methods compared in predicting lncRNA-protein interactions.
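The abstract mentions k-mer encoding of sequences and structures without giving details. As a rough illustration only, the sketch below computes a normalized k-mer frequency vector with a stride-1 sliding window; the function names, the choice of normalization, and the rule of skipping windows that contain symbols outside the alphabet are assumptions, not RLF-LPI's documented preprocessing.

```python
from collections import Counter
from itertools import product


def kmer_vocab(alphabet: str, k: int) -> list[str]:
    """Hypothetical helper: every possible k-mer over `alphabet`, in alphabet order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_frequency(seq: str, alphabet: str, k: int) -> list[float]:
    """Hypothetical helper: normalized k-mer frequencies of `seq` (stride-1 window)."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip windows containing symbols outside the alphabet (e.g. 'N').
    counts = Counter(w for w in windows if all(c in alphabet for c in w))
    total = sum(counts.values())
    vocab = kmer_vocab(alphabet, k)
    if total == 0:
        # Sequence shorter than k, or no valid window: return an all-zero vector.
        return [0.0] * len(vocab)
    return [counts[m] / total for m in vocab]


# An RNA sequence with k = 2 gives a 4**2 = 16-dimensional vector.
features = kmer_frequency("ACGUACGU", "ACGU", 2)
```

The same function accepts any alphabet, so a structure string (for instance a dot-bracket string over `"(.)"`) could be encoded the same way and concatenated with the sequence vector; whether RLF-LPI combines the two in this way is an assumption.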
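The final fuzzy decision step is only named in the abstract. The sketch below shows one simple, confidence-weighted way to fuse two base classifiers' interaction probabilities. The rule itself, the names `fuzzy_decision` and `confidence`, and the fixed 0.5 threshold are all hypothetical illustrations; the actual base models, membership functions, and rules used by RLF-LPI are not stated here.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: int    # 1 = predicted interacting pair, 0 = non-interacting
    score: float  # fused degree of membership in the "interacting" class


def confidence(p: float) -> float:
    """Map a probability to [0, 1]: 0 at p = 0.5, 1 at p = 0 or p = 1."""
    return abs(p - 0.5) * 2.0


def fuzzy_decision(p_a: float, p_b: float) -> Decision:
    """Fuse two classifiers' positive-class probabilities (hypothetical rule).

    If both models fall on the same side of 0.5, their probabilities are
    averaged. If they disagree, each is weighted by its confidence, so the
    more certain model dominates.
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if (p_a >= 0.5) == (p_b >= 0.5):
        score = (p_a + p_b) / 2.0
    else:
        # On disagreement the model below 0.5 has nonzero confidence,
        # so the weight sum is always positive here.
        w_a, w_b = confidence(p_a), confidence(p_b)
        score = (w_a * p_a + w_b * p_b) / (w_a + w_b)
    return Decision(label=int(score >= 0.5), score=score)


# Two hypothetical base-model outputs that disagree: the confident one wins.
result = fuzzy_decision(0.9, 0.4)
```

Weighting by distance from 0.5 is just one way to let a confident model override an uncertain one. A real fuzzy system would typically define explicit membership functions and a rule base instead.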
Citation: Jinmiao Song, Shengwei Tian, Long Yu, Qimeng Yang, Qiguo Dai, Yuanxu Wang, Weidong Wu, Xiaodong Duan. RLF-LPI: An ensemble learning framework using sequence information for predicting lncRNA-protein interaction based on AE-ResLSTM and fuzzy decision[J]. Mathematical Biosciences and Engineering, 2022, 19(5): 4749-4764. doi: 10.3934/mbe.2022222