Research article Special Issues

GSEnet: feature extraction of gene expression data and its application to Leukemia classification


  • Received: 21 December 2021 Revised: 03 February 2022 Accepted: 16 February 2022 Published: 14 March 2022
  • Gene expression data are high-dimensional, yet disease-related genes account for only a tiny fraction of the measured genes. A deep learning model, GSEnet, is therefore proposed to extract informative features from gene expression data. The model consists of three modules: the pre-conv module, the SE-Resnet module, and the SE-conv module. Its effect on the performance of nine representative classifiers is evaluated on the GSE99095 dataset using seven evaluation metrics. The robustness of the model and its advantages over representative feature selection methods are also discussed. Results show that the proposed model improves classification precision and accuracy.

    Citation: Kun Yu, Mingxu Huang, Shuaizheng Chen, Chaolu Feng, Wei Li. GSEnet: feature extraction of gene expression data and its application to Leukemia classification[J]. Mathematical Biosciences and Engineering, 2022, 19(5): 4881-4891. doi: 10.3934/mbe.2022228
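
    The SE-Resnet and SE-conv module names in the abstract point to the squeeze-and-excitation (SE) mechanism, which reweights feature channels by learned gates. The sketch below illustrates that general mechanism on a small (channels x length) feature map in plain Python. It is not the authors' implementation: the abstract gives no layer sizes, so the channel count, sequence length, reduction ratio, random weights, and all function names here are hypothetical.

```python
import math
import random


def squeeze(feature_maps):
    """Global average pooling: reduce each channel to one scalar."""
    return [sum(channel) / len(channel) for channel in feature_maps]


def dense(vec, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, biases)]


def relu(vec):
    return [max(0.0, v) for v in vec]


def sigmoid(vec):
    return [1.0 / (1.0 + math.exp(-v)) for v in vec]


def se_block(feature_maps, w1, b1, w2, b2):
    """Squeeze, excite through a two-layer bottleneck, then rescale channels."""
    z = squeeze(feature_maps)                      # one descriptor per channel
    gates = sigmoid(dense(relu(dense(z, w1, b1)), w2, b2))
    scaled = [[g * x for x in channel]
              for g, channel in zip(gates, feature_maps)]
    return scaled, gates


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes chosen only for illustration.
rng = random.Random(0)
channels, length, reduction = 4, 6, 2
hidden = channels // reduction

x = random_matrix(channels, length, rng)           # stand-in feature map
w1, b1 = random_matrix(hidden, channels, rng), [0.0] * hidden
w2, b2 = random_matrix(channels, hidden, rng), [0.0] * channels

y, gates = se_block(x, w1, b1, w2, b2)
print("channel gates:", [round(g, 3) for g in gates])
```

    Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another. In a trained network the weights would be learned, so that channels carrying informative signal tend to receive larger gates.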



  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)