1.
Introduction
Heart failure (HF) is a complicated syndrome associated with cardiac tissue injury, which is usually accompanied by systemic inflammation, arrhythmia and cardiac conduction defects [1]. As a global public health problem with increasing and rapid development, heart failure is the main cause of incidence rate, mortality rate and hospitalization in the world, and is also a major burden of clinical, social and economic burden [2,3,4].
The heart is also immunoactive and contains all major leukocyte populations, either located in the heart or waiting to penetrate into the heart [5,6]. When the heart is ischemic, the death of damaged and necrotic cardiomyocytes will lead to the activation of immune cells in the heart tissue. Macrophages resident in myocardial tissue undergo a series of obvious phenotypic and functional changes to remove dead cells and matrix fragments, lead to the release of cytokines and growth factors, and stimulate the formation of connective tissue and new blood vessels [7].
Recently, microarray technology and integrated bioinformatics analysis have been applied to identify potential key genes associated with different illnesses, which can be further used as biological markers for diagnosis and prognosis [8,9,10]. Machine learning method is widely used to find biomarkers. For example, the team used regularized logistic regression to predict biomarkers of spontaneous preterm birth from gene expression data [11]. CIBERPORT is a common computational method for immune cell identification. CIBERSORT contributes to evaluate the relative subpopulations for RNA transcripts and also to quantify cell components [12]. Further, previous articles suggest that CIBERSORT can precisely estimate immune cell landscape in several forms of malignant tumors, such as hepatocellular carcinoma [13] and bladder cancer [14].
In addition to cancer, more and more studies began to focus on the role of immune cell infiltration in heart failure. In this study, two machine-learning methods were employed to explore and identify the key genes of patients associated with heart failure and to analyze the immune cell infiltration. Then further evaluate the correlation between immune cell infiltration and dominant gene in heart failure tissues, so as to provide new research ideas for the treatment and early detection of heart failure.
2.
Materials and methods
2.1. Data download and dentification of differentially expressed genes (DEGs)
The series of matrix files of the GSE3585 and GSE120895 datasets were obtained from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). The GSE3585 was based on the GPL96 platform of Affymetrix Human Genome U133A Array. The GSE120895 was based on the GPL570 platform of Affymetrix Human Genome U133 Plus 2.0 Array. The GSE3585 dataset included 7 HFs and 5 controls collected from left ventricular tissue samples, whereas the GSE120895 dataset included 47 HFs and 8 controls collected from the heart tissues. GSE9800 dataset was used for verification, and also from GEO database. 12 HFs and 11 controls were obtained from GSE9800. The GSE9800 was based on the GPL887 platform of Agilent-012097 Human 1A Microarray (V2) G4110B.
According to the corresponding probe annotation files, the probe IDs of every dataset were converted to gene symbols. We evaluated probe average to be the ultimate gene level as to multiple probes in line with the same gene symbols. In addition to this, subsequent integration analyses were conducted with the two datasets, which were integrated as one metadata cohort. Moreover, this work adopted R software "SVA" package for removing batch effect [15]. To establish DEGs between the HF and the control group, the "limma" package was selected for R software [16]. Samples with an adjusted false discovery rate P < 0.05 and |log fold change (FC)| ≥ 1 were considered as the threshold points for differentially expressed genes. Volcano map and their corresponding heatmaps were plotted with the ggplot2 R package (https://ggplot2.tidyverse.org) in combination with heatmap R package (https://CRAN.R-project.org/package=pheatmap).
2.2. Functional enrichment of DEGs
R software and the "clusterProfiler" package [17] were used to conduct Gene Ontology (GO) analysis. First symbol codes were converted to Entrez ID using Human genome annotation package "org.Hs.eg.db". Disease ontology (DO) enrichment analyses were performed on DEGs using the "clusterProfiler" and DOSE packages in R [17,18]. We used the "ggplot2" [19] of R software to visualize the plots. Adjusted P-value < 0.05 and Q value < 0.05 were considered significant for GO analysis and DO enrichment analyses.
2.3. Candidate diagnostic biomarker screening and diagnostic value of feature biomarkers
The Least Absolute Shrinkage and Selector Operation (LASSO) applies regularization in improving prediction accuracy. Besides, the "glmnet" package was adopted for analyzing LASSO regression [20]. Support vector machine (SVM) represents the supervised machine-learning technology commonly used in regression and classification. For avoiding overfitting, we selected RFE algorithm for choosing those best genes in meta-data cohort. Thus, Support Vector Machine-Recursive Feature Elimination (SVM-RFE) was conducted for analyzing proper characteristics aiming to recognize the gene set having the maximum discriminative power [21]. The "e1071" package was applied in performing the SVM algorithm [22]. We defined those overlapped genes of these two algorithms as hub genes.
This study produced the receiver operating characteristic (ROC) curve with hub gene expression data for investigating whether those discovered biomarkers were of prediction value. We used the area under curve (AUC) for determining the efficiency in differentiating between HFs and control samples and further validated in the GSE9800 dataset.
2.4. Discovery of immune cell subtypes
We applied CIBERSORT (https://cibersortx.stanford.edu/) in immune cell infiltration calculations. We forecasted the possible levels of immune cells with the reference set containing 1000 permutations and 22 immune cell subtypes. We created violin plots with R software "vioplot" package for visualizing different infiltration degrees of immune cells. Meanwhile, CorrPlot package was used to produce a correlation heat map. Adjusted-P value < 0.05 suggested the threshold value for statistical significance of the relevant cell type.
2.5. Correlation analysis of immune cells with diagnostic biomarker
The association of immune cells with featured genes of R software was analyzed by Spearman's rank correlation analysis and "ggplot2" package was used for data visualization. P-value < 0.05 suggested the threshold value for being statistically significant.
2.6. Construction of heart failure model and quantitative real-time polymerase chain reaction (qRT-PCR)
The animal experiment had passed the animal ethics Association of Ningbo University, and the ethics number is 11118. C57BL/6 mice were induced to heart failure through ligation of left anterior descending coronary artery. The control group was the sham operation group, that is, treated with the same surgery without tying the left anterior descending coronary artery. The pale color of the affected myocardium during the procedure confirmed the success of left anterior descending coronary artery ligation. The concrete method was performed as previously reported [23,24]. After 4 weeks, the mice were sacrificed and RNA were extracted from heart tissues by using Trizol reagent (America Invitrogen).
RNA was reverse transcribed into cDNA using primescript one step RT-PCR Kit (Japan Takara). Then cDNA was generated and quantitative PCR was performed by using the SYBR Green (America sigma). All primers used for quantitative PCR were showed as Table 1. The expression levels of heart failure markers ANP, BNP and β-MHC were measured by qRT-PCR. When these markers increase, it proves that our heart failure model is successful [25]. GAPDH served as internal control. Expression levels of all genes were analyzed by using 2-ΔΔCt method.
3.
Results
3.1. Identification of DEGs in HF
Retrospective analyses were performed on data from 54 HFs and 13 control samples in GSE3585 and GSE120895 datasets. After removing the batch effects, 24 DEGs were collected (Figure 1A), among which, 20 showed up-regulation and 4 showed down-regulation (Figure 1B).
3.2. Enrichment analysis of DEGs
GO analysis and DO enrichment analyses were conducted to annotate and investigate the biological functions of the DEGs.
The GO project divided into three categories: cellular component, biological process and molecular function. We observed that DEGs were mainly enriched in biological process, including regulation of systemic arterial blood pressure, vascular process in circulatory system, positive regulation of heart contraction and positive regulation of blood circulation. In molecular function, DEGs were mainly enriched in G-protein alpha-subunit binding, hormone receptor binding and metalloexopeptidase activity. However, in terms of cellular components, there were no enriched categories (Figure 2A).
DO pathway enrichment analyses indicated that diseases enriched by DEGs were mainly associated with intrinsic cardiomyopathy, cardiomyopathy, hypertrophic cardiomyopathy, atrial heart septal defect, dilated cardiomyopathy, heart septal defect and congestive heart failure (Figure 2B).
The enrichment results of DEG further proved the important role of these DEG in the process of heart failure.
3.3. Diagnostic biomarker discovery
Then, we further look for diagnostic biomarkers from the 24 DEGs obtained by the following two methods. Using the LASSO regression algorithm and SVM-RFE algorithm, 10 variables were found as diagnostic biomarkers of HF by using the LASSO regression algorithm (Figure 3A), and 6 features of the DEGs were recognized (Figure 3B) by SVM-RFE algorithm.
Finally, four overlapping features (NSG1, NPPB, PHLDA1, and SERPINE2) of the two algorithms were selected (Figure 3C).
Additionally, ROC analysis was adopted for further evaluation of the predictive value for four characteristic genes. The AUC values were 0.91 for NPPB (Figure 4A), 0.927 for NSG1 (Figure 4B), 0.925 for PHLDA1 (Figure 4C) and 0.899 for SERPINE2 (Figure 4D). It showed that the characteristic biomarkers have a high diagnostic ability.
3.4. Immune infiltration analysis
Differences between controls and HF tissues concerning immune cell infiltration were analyzed with 22 subpopulations of immune cells based on the CIBERSORT algorithm. As shown in Figure 5A, the percentage for 22 immune cells has been displayed. The proportion of macrophages M1 and activated mast cells in heart failure tissues decreased, as shown in the vioplot (Figure 5B).
Then we analyzed the correlation of 22 immune cells. The correlation heatmap of immune cells also revealed the relationship between different immune cells (Figure 6). The Figure 6 showed M0 Macrophages were observed to be in a positive correlation with activated memory CD4T cells (R = 0.96) and M1 macrophages (R = 0.58), while naive B cells were negatively correlated with memory B cells (R = -0.82).
3.5. Association between diagnostic biomarker and immune cells
According to Figure 7A, NPPB showed positive relation to T follicular helper cells (R = 0.28, p = 0.022) but a negative correlation was found with resting dendritic cells (R = -0.33, p = 0.006). PHLDA1 showed positive relation to T follicular helper cells (R = 0.26, p = 0.031) and M0 Macrophages (R = 0.26, p = 0.032), but a negative correlation with the remaining Dendritic cells (R = -0.27, p = 0.028) (Figure 7B) was observed. SERPINE2 and Plasma cells were positively associated with each other (Figure 7C). No correlation was seen between NSG1 and immune cell infiltration (R = 0.34, p = 0.0055) (Figure 7D).
3.6. Validation of diagnostic biomarkers
Finally, in order to produce more accurate and reliable results, GSE9800 dataset is used to verify the expression of these four characteristic genes. The expression levels of NSG1 in heart failure tissues were significantly lower than those in the control group (Figures 8A, B; P < 0.05), and the expression levels of NPPB and PHLDA1 in heart failure tissues were significantly higher than those in the control group (Figure 8C; P < 0.05). SERPINE2 was up-regulated in the heart failure group, but there was no significant difference in SERPINE2 (Figure 8D; P > 0.05).
The diagnostic ability of other three biomarkers in discriminating HFs from the control samples in GSE9800 dataset demonstrated a favorable diagnostic value. The AUC of NSG1 was 0.848 (95% CI 0.667-0.970) (Figure 8E), NPPB was 0.784 (95% CI 0.572-0.981) (Figure 8F), PHLDA1 was 0.833 (95% CI 0.621-0.985) (Figure 8G), and the AUC of SERPINE2 was 0.720 (95% CI 0.492-0.909) (Figure 8H).
Then we constructed the heart failure model in mice, and obtained the heart tissue of heart failure mice. The heart failure markers ANP, BNP and β-MHC increased significantly in the heart failure model group(Figures 9A-C; P < 0.05), indicating the success of our heart failure model. Compared with the control group, we observed a significant increase in the expression levels of NSG1, NPPB, PHLDA1 and SERPINE2 in heart failure tissues (Figures 10A-D, P < 0.05).
4.
Discussion
Gene expression differences exert significant effects both on heart failure diagnosis and therapy. During the previous decades, microarray technology and bioinformatics analysis have been extensively used to screen gene changes at the genomic level, to propose novel personalized therapies, and to explore the functions of associated biomolecules.
Support vector machine is one of the important tools of machine-learning [22]. It has excellent performance in classification and prediction, and is widely used in disease diagnosis or medical assistance. However, at first, the application of SVM in analyzing biomedical data was limited. The proposal of SVM-RFE enables us to accurately select variables and explain the direction and intensity of the correlation between predictors and results when analyzing biomedical data [26]. LASSO is a regression-based method, which allows the use of a large number of covariates in the model and has the unique characteristics of punishing the absolute value of regression coefficient [27]. The traditional regression technology is widely limited in analyzing and synthesizing a large number of covariates. Therefore, LASSO has been more and more widely used in biomedicine [28].
Here, a combinatorial assay of the lasso regression model and SVM-RFE algorithm was adopted to select candidate biomarkers with the highest value for diagnosis: NSG1, NPPB, PHLDA1, and SERPINE2.
In addition to NSG1, almost all genes had been well verified. There are two probable reasons for the increase of NSG1 in the heart failure mice model. One is that the species are different. In the GEO database, we choose human samples, but we use mice to simulated heart failure model. Second, we use the heart failure model caused by myocardial infarction in mice, but there are many reasons for heart failure. For example, studies have used single-cell RNA-sequencing to reveal the cell types involved in heart failure, and researchers have found the potential difference between ischemic cardiomyopathy and dilated cardiomyopathy in heart failure. For example, ischemic cardiomyopathy has more specific immune cell types than dilated cardiomyopathy, such as macrophages and dendritic cells [29]. But in general, NPPB, PHLDA1 and SERPINE2 have been repeatedly certified in the experimental results, which further verifies the reliability of our model.
SERPINE2 is a serine protease inhibitor, a 44 kDa to 50 kDa glycoprotein, and a key regulator of biological processes inside and outside cells [30]. According to the current research, SERPINE2 is expressed in many organs, including the heart [31]. A recent observation shows that SERPINE2 is closely related to cardiac tissue fibrosis. The researchers observed the increased expression of SERPINE2 in pathological cardiac fibrosis and found that SERPINE2 caused cardiac fibrosis mainly by increasing collagen deposition, and ERK1/2 signaling pathway played an important role in this process [32]. Another study also found that aspirin can reduce cardiac interstitial fibrosis by inhibiting ERK1/2-SERPIN signaling pathway [33]. PHLDA1, known as Pleckstrin homology-like domain, family A, member 1, was first identified in T cell receptor activation induced apoptosis [34]. Up to now, many studies have found that phlda1 is closely related to a variety of heart diseases, such as; Ischemic cardiomyopathy [35]ㄛdilated cardiomyopathy [36]ㄛmyocardial damage [37]. Similarly, through bioinformatics studies, we observed that PHLDA1 was highly expressed in heart failure tissues and was significantly corelated with T follicular helper cells. PHLDA1 is expected to become a new marker of heart failure.
In order to further explore the infiltration of immune cells in heart failure tissues and the role of these four genes in the pathogenesis of heart failure. Using CIBERSORT, we observed that M1 macrophages and activated mast cells infiltrated differently in heart failure tissues and normal tissues. In addition, NPPB and PHLDA1 showed positive relation to T follicular helper cells. SERPINE2 was positively associated with plasma cells.
As far as we know, mast cells are one of the resident cell groups in heart tissue, which may play an important role in the progression of myocardial infarction. Stimulated by a series of inflammatory factors, mast cells release TNF and histamine, triggering a series of inflammatory reactions after myocardial infarction [38].
Macrophage population is also one of the resident cell groups in myocardial tissue [39]. Heart failure is generally accompanied by cardiac inflammation and subsequent tissue damage. Macrophages are activated and infiltrated into the myocardium, coordinate with other immune cells, and become a key regulator of tissue repair or regeneration [40]. However, when infiltrating macrophages are over stimulated by inflammatory cytokines, the macrophages will have functional disorder, which can then lead to abnormal repair and sustained injury of cardiomyocytes, and finally progress to heart failure [40].
CIBERSORT results showed that when heart failure occurs, various types of immune cells are infiltrating into myocardial tissue, and the immune cells infiltrating into cardiac tissue play a complex mechanism. However, this study had some limitations and requires further experimental data, such as confirmation of immunophenotype by immunohistochemistry.
5.
Conclusions
In short, our study identified four key genes through two machine-learning methods. In the future, it is possible to distinguish patients with heart failure from patients without heart failure by detecting the expression of these four genes. At the same time, combined with CIBERPORT, the identified immune cells may be developed as the target of immunotherapy for patients with heart failure, providing new insights for the diagnosis and treatment of patients with heart failure.
Acknowledgments
This study was supported by the Natural Science Foundation of Zhejiang Province (LY20H020001). Thanks for the support of "Yinzhou district science and technology project in the field of agricultural and social development" with NO.20201YZQ010102, NO.20211YZQ010088 and 20221YZQ070021.
Conflict of interest
All authors declare no conflicts of interest in this paper.