SNFM: A semi-supervised NMF algorithm for detecting biological functional modules

Yutong Man; Guangming Liu; Kuo Yang; Xuezhong Zhou

doi:10.3934/mbe.2019094

Mathematical Biosciences and Engineering

2019, Volume 16, Issue 4: 1933-1948. doi: 10.3934/mbe.2019094

Research article

1. Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
2. School of Computer Science & Engineering, Xi’an University of Technology, Xi’an 710048, China
3. Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, China

Received: 17 December 2018 Accepted: 15 February 2019 Published: 07 March 2019

Unraveling protein functional modules from protein-protein interaction (PPI) networks is a crucial step toward understanding cellular mechanisms. Over the past decades, numerous algorithms have been proposed to identify potential protein functional modules or complexes from PPI networks. Unfortunately, the number of known PPIs is rather limited, and some reported interactions are false positives. Algorithms that rely on PPI networks alone may therefore fail to recover the expected functional modules. In this study, we propose SNFM, a novel semi-supervised functional module detection method based on non-negative matrix factorization (NMF), which incorporates high-quality supervised PPI links from complexes as prior information. On the DIP data set, with PCDq as the gold standard, our method outperforms all competing methods, improving Precision by around 15.4%, Recall by around 28.9%, and F-score by around 27.1%.
Keywords: PPI; NMF; semi-supervised; functional modules; DIP
Citation: Yutong Man, Guangming Liu, Kuo Yang, Xuezhong Zhou. SNFM: A semi-supervised NMF algorithm for detecting biological functional modules[J]. Mathematical Biosciences and Engineering, 2019, 16(4): 1933-1948. doi: 10.3934/mbe.2019094
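
The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea rather than the SNFM algorithm itself. It assumes that prior links from known complexes are added to the PPI adjacency matrix with a weight `alpha`, and that the result is factorized with symmetric NMF using a damped multiplicative update. The function names, the `alpha` parameter, and the toy network are all hypothetical and do not come from the paper.

```python
import numpy as np


def edges_to_adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj


def semi_supervised_symnmf(adj, must_link, k, alpha=1.0, n_iter=300, seed=0, eps=1e-9):
    """Approximate (adj + alpha * prior) by H @ H.T with H >= 0.

    Assumption: supervision enters as extra weight on protein pairs known to
    share a complex. This is one simple way to inject prior links, not
    necessarily the objective used by SNFM.
    """
    adj = np.asarray(adj, dtype=float)
    prior = np.zeros_like(adj)
    for i, j in must_link:
        prior[i, j] = prior[j, i] = 1.0
    target = adj + alpha * prior

    rng = np.random.default_rng(seed)
    H = rng.random((adj.shape[0], k)) + 0.1  # strictly positive start

    for _ in range(n_iter):
        numer = target @ H
        denom = H @ (H.T @ H) + eps
        # Damped multiplicative update; the factor is >= 0.5, so H stays positive.
        H *= 0.5 + 0.5 * numer / denom
    return H


# Toy network: two groups of three proteins joined by one bridging edge (2, 3).
# The edge (0, 1) is missing from the network and supplied as a prior link.
edges = [(0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = edges_to_adjacency(6, edges)
H = semi_supervised_symnmf(adj, must_link=[(0, 1)], k=2)

# Hard assignment: each protein goes to the module with its largest loading.
modules = H.argmax(axis=1)
print(modules)
```

Each row of `H` holds a protein's non-negative loadings on the `k` modules. Taking the largest loading gives a hard partition, while thresholding the rows instead would allow a protein to belong to several modules.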

© 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)