1.
Introduction
Hepatocellular carcinoma (HCC) is amongst the most prevalent and deadly malignancies globally and has been ranked 6th in occurrence and 3rd in fatality rate [1]. Even though there has been some advancement in prevention, surveillance, early-HCC diagnosis and therapy over the past few decades [2,3,4,5], HCC is still a malignant tumor with a dismal prognosis as evidenced by a limited five-year survival probability of as low as 18% [6]. Most HCC patients receive their diagnosis at intermediate or advanced stages and lose the opportunity to benefit from surgery intervention [7]. Moreover, within the first five years following surgery, around 70 percent of patients experience recurrence or distant metastasis [8]. The traditional TNM stage acts as a common prognostic marker and determines treatment choices for HCC patients. However, HCC is a highly heterogeneous disease both at the molecular and clinical levels [9]. And genetic heterogeneity is closely related to prognosis and therapeutic reactivity [10]. In this sense, the traditional TNM staging system remains to be further improved. Therefore, it is important to construct an effective prognostic prediction model to identify patients with poor prognosis to optimize individualized and rational adjuvant treatments.
Over the past few years, the advent of immune checkpoint blocking treatment has heralded in a new epoch of cancer treatment. So far, three immune checkpoint targeted agents have been approved by the Food and Drug Administration for treating HCC including nivolumab (anti-PD-1), pembrolizumab (anti-PD-1) and ipilimumab (anti-CTLA-4) [11]. Recently, the IMbrave150 trial, evaluating the efficacy of atezolizumab in combination with bevacizumab versus sorafenib in patients suffering from the advanced hepatocellular carcinoma, has demonstrated significant benefits of this combination on overall and progression-free survival [12]. Some studies showed that PD-L1 expression in HCC could regulate several pathways in hepatocellular carcinogenesis, such as inducing inactivation of CD8+ T cells adhered to liver sinusoid cells to promote immune tolerance, inducing the exhaustion of follicular helper T cells, impairing cytokine expression and B cell function [13,14,15]. Itoh et al. [16] reported that HCC patients having PD-L1 positive expression had a worse prognosis as opposed to the ones having negative PD-L1 expression. Still other studies have pointed out that immune checkpoint-associated receptors and ligands, such as TIM-3 [17], TIGIT [15], LAG-3 [18] and GAL-9 [19] are involved in HCC progress and may serve as prognostic indicators. However, there are few research have systematically studied the expression profile of genes associated with immune checkpoints and the prognostic significance of these genes in HCC patients. Further, it is noteworthy that the predictive value of a single immune checkpoint gene in prognosis or treatment response seems to be limited. In a meta-analysis conducted by Shen et al. [20] to assess the effectiveness of PD-1 or PD-L1 inhibitors in patients with PD-L1 positive and PD-L1 negative cancers, it was discovered that the expression of PD-L1 was not a sufficient predictor of responsiveness to immunotherapy.
In the present research, taking the advantage of genomic databases, we focused on identifying prognosis-related immune checkpoint genes and constructing an ICG-based gene signature, followed by the validation of its robustness through an external patient cohort. Moreover, immune status and drug sensitivity were compared between the high- and low-risk cohorts. We expected this novel gene signature might help to refine risk assessment and promote the development of effective individualized therapy in clinical settings.
2.
Materials and methods
2.1. Data sources and processing
The original RNA-sequence data along with matching clinical data were downloaded from the Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/) database, including 374 HCC tissues and 50 adjacent non-cancerous tissues (There were 370 HCC patients who had comprehensive follow-up information), which were utilized to create a prognostic model. Meanwhile, level 3 mRNA expression data together with clinical features of additional 231 HCC samples were downloaded from the International Cancer Genome Consortium (ICGC; https://icgc.org/icgc) database and were employed for validation. All of the data that was utilized in this investigation was accessible to the public. The gene symbol was first transformed from the probe IDs by referring to the annotation files. From the related literature, an aggregate of 79 ICGs was identified, with the majority of the ICGs being receptors, ligands, or other key components in immune checkpoint pathways [21,22,23,24] (Table S1). A subsequent analysis was conducted using the expression matrix of the ICGs that had been identified.
2.2. DE-ICG identification and enrichment analysis
The "limma" R package was employed to search for differential expression of immune checkpoint-related genes (DE-ICGs) across tumor and surrounding tissues in TCGA database. The |log2 fold change| > 1 and P-value < 0.05 served as the screening threshold. The "ggplot2" and "pheatmap" packages were employed to create heatmap and volcano plots. Then, the DE-ICGs were subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses by means of the R package "clusterProfiler". The GO terms included 3 biological categories (MF: molecular functions, CC: cell components and BP: biological process). Utilizing the R package "ggplot2", we built a representation of the 10 leading highly enriched GO terms and KEGG pathways.
2.3. Creation and verification of a prognostic gene signature associated with ICG
Firstly, univariate Cox analysis was undertaken to filter out ICGs substantially linked to overall survival (OS). To further narrow the candidate genes range, the LASSO-Cox regression analysis was applied to determine the independent OS predictors using the "glmnet" R package. The genes with independent prognostic values were incorporated into the gene signature. These hub genes were also validated in the Kaplan-Meier (K-M) Plotter database (www.kmplot.com). Next, these genes were employed to construct a prognostic gene signature. Each patient's risk score was computed utilizing the equation below: Risk score = ∑ni=1Coefi∗Xi (where Coefi was coefficient calculated by multivariate Cox proportional hazard regression, while Xi represented the expression level for each ICG). Following that, subjects were categorized into low- and high-risk cohorts depending on their median risk rating. Using the "survival" R package, a survival analysis was carried out to examine the variation in OS between the two cohorts. After generating the ROC curves, we computed the area under the curve (AUC) to measure the prediction ability. For external validation, the same methods were adopted to estimate the gene's signature precision in the ICGC cohort. In addition, considering that disease-specific survival (DSS) and disease-free survival (DFS) could more specifically indicate the clinical benefit to patients, we additionally explored the prognostic significance of the established gene signature using DFS and DSS as clinical outcomes in the TCGA-LIHC cohort.
2.4. Correlation analysis between gene signature, ICGs and clinical traits
To thoroughly investigate the gene signature's prognostic significance, we conducted subgroup analyses to clarify the clinicopathological features, gene expression patterns and OS of patients in distinct risk cohorts. In addition, survival analyses were also conducted according to patients' clinical information like age, gender, tumor grade and clinical stage.
2.5. Establishment and assessment of a predictive nomogram
The gene signature, in conjunction with the other clinical parameters listed above, was subjected to univariate and multivariate Cox regression analysis in TCGA cohort in an attempt to discover independent prognostic markers. The "rms" R platform was utilized to create the nomogram from all independent predictors with a P-value < 0.05. In order to visualize the precision of the prediction nomogram, we generated calibration curves plus time-dependent ROC curves. Finally, when it comes to clinical applications, decision curve analysis (DCA) was utilized to indicate the nomogram's potential value.
2.6. Immune status and therapeutic response analysis
To additionally examine the link between risk scores and immune status of patients, a single-sample genomic enrichment analysis (ssGSEA) was carried out to contrast the infiltration scores of 16 distinct immune cell subpopulations and 13 immune functional pathways among both risk cohorts utilizing "GSVA" package in the R software. To forecast the possible responsiveness to an immune checkpoint inhibitor, we applied an additional tool called Immunophenoscore (IPS). IPS is an extensively utilized machine learning-based algorithm, reflecting the tumor immunogenicity of patients [25]. It was computed on a scale ranging from 0 to 10, with higher scores being related to greater immunogenicity and improved immunotherapeutic responsiveness. The Cancer Immunome Atlas (TCIA, https://tcia.at/home) database was utilized to retrieve the IPS of patients, which were subsequently subjected to a comparison between the two risk cohorts. In addition, we also compared the sensitivity of five common chemotherapy drugs (sorafenib, doxorubicin, cisplatin, vinblastine and mitomycin. C) between the two groups, which were recommended by AJCC guidelines for the adjuvant therapy of liver cancer. The half-maximal inhibitory concentration (IC50) values were computed for drug resistance measurements. Results of high- and low-risk cohorts were visualized utilizing box plots created by using the R package "pRRophetic" and "ggplot2".
3.
Results
3.1. Patient demographics and clinical characteristics
A total of 374 HCC patients from the TCGA-LIHC cohort and 231 HCC patients from the ICGC (LIRI-JP) cohort were eventually included in the present research. Among the patients in the TCGA cohort, four patients were excluded from survival analysis due to incomplete follow-up information. Table 1 summarizes the comprehensive clinical parameters of the patients in more depth. The study's workflow is outlined in Figure 1.
3.2. Differentially expressed genes and functional enrichment analysis
Among the 79 ICGs, 68 genes were identified from the TCGA dataset. From these genes, we extracted the expression profiles and collated them into the expression matrix. Based on the screening criteria, a total of 21 genes showed significant alteration in the levels of expression in HCC as opposed to the adjoining non-cancerous tissues, including 18 up-regulated and 3 down-regulated. Heatmap (Figure 2A) and volcano plot (Figure 2B) showed the distribution of DE-ICGs. Furthermore, in order to better understand the biological role of DE-ICGs, a functional enrichment analysis was performed. As shown in Figure 2C, the bar plot showed the top 10 enriched GO terms, including: "MHC protein complex binding", "external side of plasma membrane" and "T-cell activation". Premised on the findings of the KEGG pathway analysis, the identified DE-ICGs were related to "Cytokine-cytokine receptor interaction" and "Antigen processing and presentation" (Figure 2D, E).
3.3. Construction of an ICG-based prognostic gene signature
We firstly conducted a univariate Cox regression analysis and discovered eight genes of significant correlation with OS, among which six genes were protective factors and two genes were risk factors (Figure 3A). Of note, not all OS-related ICGs were DE-ICGs. Figure 3B depicts the relationship between these genes and their expression. Considering these genes may interact with each other, we employed Lasso Cox regression analysis to lower the dimensionality of data in order to find the real OS-affecting factors (Figure 3C, D). Six genes were selected as hub genes including LGALS9, CD209, CD40LG, SIRPA, BTNL9 and TNFSF4. The coefficients of these hub genes were depicted in Figure 3E. Detailed information was summarized in Table 2. Besides, the prognostic values of these six hub genes were also confirmed by the Kaplan-Meier plotter database (Figure S1). Subsequently, the following equation was established for this six-gene signature: Risk score = 0.002 * ExpLGALS9 + 0.089 * ExpCD209 - 0.381 * ExpCD40LG + 0.019 * ExpSIRPA - 0.07 * ExpBTNL9 - 0.046 * ExpTNFSF4. Finally, we were able to determine each patient's risk score utilizing the equation provided above.
3.4. The performance of gene signature
In the TCGA cohort, the median risk value of all HCC patients was utilized to separate them into high- and low-risk cohorts, with a cutoff value of 0.1 to distinguish between the two. First, we charted the survival curves to probe into whether there was a substantial difference in survival between the two cohorts. As illustrated in Figure 4A, patients at high risk demonstrated a considerably reduced OS as opposed to the ones at low risk. Figure 4B illustrated the risk score distribution as well as the survival status of patients. Next, in order to assess the gene signature's predictive ability even further, ROC curves for survival over one, two and three years were created and the AUC was determined for each of these outcomes. The AUC values as depicted in Figure 4C were 0.679, 0.649 and 0.662, correspondingly. Moreover, an external dataset retrieved from the ICGC database was employed to validate this gene signature. In the validation cohort, high-risk patients exhibited a substantially unfavorable outcome as opposed to the ones at low risk (Figure 4D, E). Similar findings were achieved in the ICGC cohort, whereby, high predictive values of AUC were observed in the time-dependent ROC curves (Figure 4F).
Compared with OS, DFS and DSS are more precise indicators reflecting clinical benefits of patients. This study employed DSS and DFS as clinical endpoints in order to determine if the prognostic model was also associated with these 2 prognostic markers. As illustrated in Figure S2, this prognostic model still could be used to distinguish patients with different survival statuses and held good stability. The results described above suggested that the novel gene signature was a robust index, which might be applied to anticipate clinical outcomes of HCC patients.
3.5. Clinical relevance investigation and subgroup analysis
To study the link between risk cohorts and clinical parameters such as grade, TNM stage, gender and age, we plotted a heat map to display the correlation among these indexes. Figure 5A depicts the distribution of expression patterns of six genes, clinical parameters and survival status. On the basis of these data, it is evident that higher risk scores had a high likelihood of being detected in those patients with advanced TNM stages or higher pathological grade (Figure 5B, C), hinting at the risk signature's precision. Furthermore, survival analysis among distinct clinical subgroups was carried out according to the above different parameters. The results showed that this gene signature could effectively distinguish samples into low- or high-risk cohorts between age, gender, stage as well as grade (Figure 5D–K).
3.6. Development of a nomogram for clinical application
The findings from a univariate Cox regression analysis indicated that risk score (P < 0.001) and stage (P < 0.001) were substantially linked to unfavorable clinical outcomes (Figure 6A). Subsequently, we examined whether the risk score independently served as a reliable prognostic indicator. As shown in Figure 6B, risk score (P < 0.001) and stage (P < 0.001) independently acted as prognostic markers revealed by the multivariate analysis. We included these variables and constructed a nomogram (Figure 6C), aiming at providing individualized survival estimation for HCC patients. The calibration curves (Figure 6D–F) illustrated that the established model has an extremely high consistency with the ideal model. Besides, compared with the pathologic stage, higher prediction accuracy was found in the integration nomogram, exhibiting the largest AUC value (Figure 6G–I). More importantly, the DCA curves demonstrated that the nomogram exhibited larger net benefits, suggesting that it had more therapeutic value for clinical application (Figure 6J–L).
3.7. Assessment of tumor immune microenvironment
To examine the indicative roles of the gene signature on the tumor immune microenvironment, we applied the ssGSEA algorithm to measure the enrichment scores of a variety of immune cell subpopulations, associated pathways, or functions. As anticipated, highly infiltrated immune cells, including NK cells, CD8+ T cells and B cells, were found in the low-risk cohort (Figure 7A). Meanwhile, immune signaling pathways such as cytolytic activity, IFN response had an elevated score in the low-risk cohort (Figure 7B). Lollipop plot (Figure 7C) further showed that risk scores were negatively related to immune effectors cells including NK cells, CD8+ T cells, B cells and tumor-infiltrating lymphocytes. This result was in accordance with the conclusion that patients at high risk might exhibit unfavorable responsiveness to immunotherapy. Additionally, we examined the prognostic significance of these immune effector cells, among which higher infiltration levels of CD8+ T cells, B cells, NK cells and TIL were correlated with the better OS (Figure 7D–G).
3.8. Potential of the risk score as an indicator reflecting therapeutic response
In previous studies, IPS has been proved to be reliable for predicting immunotherapy response [26,27]. We found that the scores of IPS, IPS-CTLA4, IPS-PD1/PD-L1/PD-L2, IPS-PD1/PDL1/PD-L2 + CTLA4 were elevated among low-risk patients (Figure 8A–D), suggesting these patients had a high likelihood of responding to inhibitors of immune checkpoint. In addition to the immunotherapy, we next sought to explore the link between risk scores and the effectiveness of chemotherapy. The IC50 of five common chemotherapeutic drugs was computed via the pRRophetic algorithm. Figure 8E–I illustrated that higher risk scores were linked to a greater IC50 for doxorubicin (P = 0.015) and mitomycin c (P < 0.001).
4.
Discussion
HCC is troublesome for having a high risk of relapse and a dismal prognosis. Thus, biomarkers correlated with HCC incidence, progression and prognosis must be unraveled to improve the clinical outcomes of patients. Although a variety of tumor markers have been widely used in clinical practice, only one biomarker might not be reliable enough to anticipate long-term outcomes or treatment responses. In contrast, the gene signatures could achieve better predictive performance, facilitating clinicians to judge the prognosis of patients accurately and distinguish subgroups to select appropriate treatments.
In recent years, immunotherapies that target programmed cell-death-1 receptor (PD-1) or its ligand (PD-L1) have gained substantial interest in the HCC treatment. The CheckMate-040 [28] and Keynote-224 [29] studies showed that OS of advanced HCC patients receiving nivolumab and pembrolizumab as second-line treatment was 15.6 months and 12.9 months, respectively. Compared with the regorafenib treatment (OS, 10.6 months), there was a significant increase. Growing evidence suggested that immune checkpoint blockade is one of the most promising approaches in HCC treatment. However, it was also worth noting that the overall response rate of ICIs was only 10%–20% [28]. Hence, there is an unmet need to identify biomarkers guiding personalized treatment to effectively utilize medical resources and prevent adverse events caused by overtreatment. Several other immune checkpoint molecules in addition to PD-L1, which encompass TIGIT, TIM-3, LAG-3 and CTLA-4, have been proposed to participate in the immunomodulation of tumor microenvironment [30]. However, their exact prognostic value is controversial and remains to be further explored. Meanwhile, ICG-based prognostic models are still lacking in clinical practice.
So, in the current study, we conducted a literature review from which we collected 79 immune checkpoint genes and extracted the gene expression profiles in the TCGA database. OS-related ICGs were firstly identified in the univariate Cox regression analysis. Subsequently, we utilized Lasso Cox regression to minimize overfitting and constructed an optimal six-gene prognostic model. The prognostic model enabled us to classify all patients into two prognostic cohorts based on their risk ratings. The high-risk cohort exhibited a significant worse prognosis. Furthermore, the model's predictive capacity was demonstrated by ROC curve analysis, which was satisfactory. Meanwhile, the above results were also externally validated in the ICGC cohort. Interestingly, we found that Zhao et al. [31]. developed a similar predictive model based on PD-L1 and CTLA-4 signaling pathways genes. We plotted a time-dependent ROC curve to contrast the precision of the two models and the findings illustrated that our model exhibited a higher AUC index, reflecting its better prediction efficiency (Figure S3). Besides, our model is suitable for different clinical endpoints, like DSS and DFS, facilitating clinicians to make more comprehensive evaluations of patients' prognosis. Eventually, we created a nomogram incorporating risk signature and TNM stage to enhance clinical application values.
Our risk signature is composed of six genes, including LGALS9, CD209, CD40LG, SIRPA, BTNL9 and TNFSF4. The results showed that CD40LG and BTNL9 were protective factors, while the others were risk factors for HCC. Galectin 9, encoded by LGALS9 gene, is a ligand for TIM-3 and participates in immune tolerance [32]. The galectin-9 expression level was correlated with cancer progression and aggressiveness in diverse types of cancers [33,34], such as HCC [35]. Similar to our findings, Sideras et al. [19] found that the galectin-9 was associated with poor HCC-specific survival. CD40LG affiliated to the tumor necrosis factor gene superfamily, performs a pivotal function in T cell-dependent humoral responses through an interface with its classical receptor CD40 [36]. It has been suggested that CD40LG-expressing dendritic cells could suppress HCC progression by activating immune system in vivo [37]. SIRPA was related to the phenotype switch of tumor-polarized macrophages and served important functions in HCC progression [38]. Regrettably, the remaining three genes were identified and reported to be associated with the progression as well as the prognosis of the other tumors [39,40,41], but have never been reported in HCC. This remains to be explored in further studies.
Previous studies have confirmed that immune infiltration is a key factor affecting the clinical benefits of immunotherapy [42]. Our findings suggested that the immune cells infiltration levels were strongly associated with the gene signature as well as the prognosis of patients. Highly infiltrated CD8+ T cells, NK cells and B cells were found in the low-risk cohort and linked to improved prognosis. Kesh et al. [43] pointed out that elevated infiltration of CD8+ T cells may sensitive tumors to respond to immunotherapy. It was also reported that NK cells could cooperate with CD8+ T cells to improve therapeutic response in patients receiving CTLA-4 blockade therapy [44]. Furthermore, based on infiltrated immune subsets and immunomodulatory molecules, Charoentong et al. [25] applied the machine learning algorithm to develop the Immunophenoscore. It could be used to predict the therapeutic response to anti-CTLA-4 and anti-PD-1 therapy. In our study, low risk patients had a higher IPS, suggesting that they were more likely to have higher immunogenicity. Thus, the current findings may have important implications for the identification of patients with a high likelihood of benefiting from immunotherapy.
Nonetheless, our study had some limitations. Firstly, some important clinical parameters, determining the prognosis, are not available in the TCGA database, such as the history of hepatitis or liver cirrhosis. Secondly, although we performed an external validation in the ICGC cohort, the conclusion would be more convincing with validation in a prospective, large-sample cohort. Furthermore, the biological activities of these hub genes in the prognostic signature remain to be further explored in experiments.
5.
Conclusions
In conclusion, we established and verified a new ICG-related gene signature for HCC based on six hub genes, which could possibly function as a prognostic marker for clinical outcomes and therapeutic response.
Acknowledgements
This study was funded by National Key Research and Development Program of China (2020YFA0710803) and National Natural Science Foundation of China (No. 81770569, 81870421 and 81773072).
Conflict of interest
The authors declare that they have no competing interest.
Data availability statement
The data that support the findings of this study could be downloaded from TCGA (https://portal.gdc.cancer.gov/) databases and ICGC (https://dcc.icgc.org/repositories) database.