
Citation: David F. Anderson, Tung D. Nguyen. Results on stochastic reaction networks with non-mass action kinetics[J]. Mathematical Biosciences and Engineering, 2019, 16(4): 2118-2140. doi: 10.3934/mbe.2019103
Low-grade glioma (LGG) is a uniformly fatal tumor, and the survival from this tumor is approximately 7 years [1]. Because of the heterogeneity in LGG patients, different LGG subtypes increase the difficulty of optimizing management of adult low-grade gliomas [2,3]. Magnetic Resonance Imaging (MRI) is an imaging technique that can capture tumors of the brain clearly [4]. Clinicians often use MRI images to diagnose the aggressiveness of the tumor. Therefore, the analysis of MRI data and feature extraction are becoming more challenging. To address these issues, many studies have used MRI data to extract prognostic factors for LGG patients. In a study by Pignatti et al. [5], the authors established a score system that can be used to determine the prognostic score. In adult patients with LGG, the age of the patients, the astrocytoma histology, the largest diameter of the tumor, the tumor crossing the midline and the presence of a neurologic deficit before surgery are all important prognostic factors for survival. These factors can be used to identify low-risk and high-risk patients. In a study by Chen et al. [6], the authors developed a computer-assisted algorithm for tumor segmentation and characterization using both kinetic information and morphological features of 3-D DCE-MRI. They differentiated benign and malignant lesions by analyzing 3-D morphological features including shape features and texture features of the segmented tumor. In a study by Agravat et al. [7], the authors implemented the DeepMedic CNN architecture for tumor segmentation, and the extracted features were fed to a random forest classifier to obtain 59% overall survival accuracy. In another study by Shboul et al. [8], 40 features were extracted from the predicted brain tumor mask and fed to a random forest regression to predict the overall survival of a glioma patient, with an accuracy of 67% on the training dataset and 57.9% on the testing dataset.
In an attempt at prediction of survival [9], the authors extracted 26 image-derived geometrical features and used SVM to predict the risk of death and classify glioma patients into three groups, with an accuracy of 56.8%. In another attempt [10], hundreds of intensity and texture features were extracted from MR images of glioblastoma multiforme, and principal component analysis (PCA) was used to reduce dimensionality. Then, these features were fed to an artificial neural network (ANN). A result with accuracy of 65.1% was obtained based on two classes: short-overall survivor and long-overall survivor. In another study [11], Chato et al. attempted the use of support vector machines (SVMs), k-nearest neighbors (KNNs), linear discriminants, tree, ensembles and logistic regression to classify survivors into two or three classes. The features from segmentations are used to train the linear discriminant for prediction of survival. The texture features resulted in the accuracy of 46%, and histogram features achieved an accuracy of 68.5% for the test dataset.
The above methods predicted survival by using only image information or clinical information. However, the tumor heterogeneity possibly comes from strong phenotypic differences, and it is difficult to predict prognosis accurately by using only medical imaging analysis (see Figure 1), thus motivating the need for integrating another kind of data. Along with the rapid development of deep-sequencing technology, the output of sequencing has made huge progress not only in quality but also in speed [12]. If radiomic data and genomic data can be integrated, this integration will build a bridge between micro and macro and increase the accuracy of the precision diagnosis and treatment of the brain tumor [13]. Grossmann et al. [14] found that prognostic biomarkers performed better in lung cancer when radiomic, genetic, and clinical information was combined. The C-index was 0.73, while the result was only 0.66 when genetic information was lacking. Xia et al. [15] created a radiogenomic strategy that can obtain significant associations between imaging features and gene expression patterns in hepatocellular carcinoma. However, similar work is lacking in LGG. Therefore, in this study we integrated two different types of data, i.e., radiomic features of MRI and gene signatures, to develop a new integrated survival prediction measure for LGG.
The framework of this study is shown in Figure 2. First, we used gene expression data to construct a gene regulatory network and identify network modules and then used imaging data to extract significant radiomic biomarkers that are associated with the survival of the patient (Parts (a) and (b)), respectively. Then, we calculated the correlation between gene modules and image features to obtain a small number of gene signatures that are connected with these image features (Part (c)). Furthermore, we established a Lasso (least absolute shrinkage and selection operator) model to predict the image features with only gene expression values (Parts (d) and (e)). Based on gene expression data, we used support vector machines (SVMs) to identify the gene signatures (Parts (f) and (g)). We combined the predicted image features and the gene signatures to establish an integrated measure that can predict survival of the LGG patient (Parts (h) and (i)). The results show that the integrated measure performed better on survival prediction than any other single index.
Computer-aided and manually corrected segmentation labels for the preoperative multi-institutional scans of 65 LGG patients and 724 radiomic features along with the corresponding skull-stripped and coregistered multimodal (i.e., T1, T1-Gd, T2, T2-FLAIR) MRI data were collected from the Cancer Imaging Archive (TCIA) [16,17,18]. The corresponding RNA-seq data and Disease Free Survival (DSS) data for these 65 patients were also obtained from The Cancer Genome Atlas (TCGA) database. These data were used in this study as the training dataset.
The gene expression data and the corresponding DSS data of 455 LGG patients were downloaded from TCGA and used in this study as the validation dataset.
A gene coexpression network was constructed using gene expression data in the training dataset. We deleted genes that express in less than 20% of the patients or have no expression values. Then, we retained genes that have the highest 25% variance. A pairwise correlation matrix was calculated, and then we adjusted the matrix by raising it to the power of five using the R package WGCNA [19,20]. The minimum module size was set to 50, and the minimum height for merging modules was set to 0.25.
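The soft-thresholding step described above can be illustrated with a minimal sketch. It is not the WGCNA implementation: it assumes an unsigned network, so the adjacency is the absolute Pearson correlation raised to the power of five. The function names and the toy expression values are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def soft_threshold_adjacency(expr, power=5):
    """expr holds one list of expression values per gene (genes x patients).
    Returns |cor|^power for every gene pair (unsigned network: an assumption)."""
    g = len(expr)
    return [[abs(pearson(expr[i], expr[j])) ** power for j in range(g)]
            for i in range(g)]

# Toy data: genes 0 and 1 are strongly co-expressed, gene 2 is unrelated.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.1, 2.9, 4.2, 5.0],
    [3.0, 1.0, 4.0, 1.0, 3.0],
]
adj = soft_threshold_adjacency(expr)
```

Raising the correlation to a power keeps strong associations close to 1 and pushes weak ones toward 0, which is what lets the later clustering step separate modules.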
We identified significant image features that are associated with patient DSS by training a multivariate Cox regression model [21] on the training dataset. Image features were filtered with the standard that p value must be less than 0.01. Then, these image features were treated as image biomarkers and survival prediction indexes. For each image feature, we divided patients on the validation dataset into two groups—high-risk group and low-risk group—by taking the median value of the feature as the threshold and plotted the Kaplan-Meier curves. The concordance index (C-index) [22] and the log-rank test were also used to assess the prognostic prediction performance.
The basic formula of the multivariate Cox regression model is described as follows:
$$h(t,X)=h_0(t)\cdot\exp(\beta_1X_1+\beta_2X_2+\cdots+\beta_mX_m) \tag{1}$$
h(t,X) represents the hazard function and h0(t) is the baseline hazard function. The factors X1, X2, ..., Xm correspond to the image features here, and β1, β2, ..., βm are the corresponding regression coefficients.
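As an illustration of how a survival prediction index is evaluated, the following sketch computes Harrell's C-index and performs the median split used for the Kaplan-Meier groups. It assumes that a higher score means a higher predicted risk (for example, the linear predictor β1X1 + ... + βmXm from Eq (1)), and it ignores tied event times. The function names and toy values are hypothetical.

```python
from statistics import median

def concordance_index(times, events, scores):
    """Harrell's C-index. events[i] is 1 if the event was observed, 0 if censored.
    Assumes a higher score means a higher predicted risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable

def median_split(scores):
    """Label each patient 'high' or 'low' risk using the median score as threshold."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

# Toy cohort: the risk score is perfectly ordered against survival time.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.7, 0.4, 0.1]
c_good = concordance_index(times, events, scores)
c_bad = concordance_index(times, events, scores[::-1])
groups = median_split(scores)
```

A C-index of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of patients by risk.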
We calculated Pearson correlation coefficients and their statistical significance to obtain the correlations between gene modules and selected image features. Because there are many genes in each module, the principal component analysis (PCA) was used to reduce the dimension of gene expression data of 65 patients in the training dataset. Then, image features were filtered. Features that showed significant correlation (p value less than 0.05) with at least one gene module were retained, and others were removed. Then, gene modules associated with the same image feature were integrated. The enrichment analysis was performed to identify the significantly enriched molecular pathways on these modules.
We established a radiogenomic map by identifying gene signatures associated with the prognostic imaging features. Lasso (least absolute shrinkage and selection operator) is a regression analysis method that performs both variable selection and regularization [23,24]. This method can enhance the prediction accuracy and interpretability of the statistical model it produces.
$$Q(\beta)=\|y-X\beta\|^2+\lambda\|\beta\|_1 \tag{2}$$
In the above formula, X is the matrix of variables and y is the label. β is the coefficient vector that we want to optimize. Q(β) is the objective function that we want to minimize. Compared with the method of least squares, the objective function in the Lasso model has a regularization term λ‖β‖1. With this L1 norm regularization term, Lasso can control the number of variables used and improve the generalization ability of the model. For each image feature remaining in the gene module analysis, Lasso was trained to select gene signatures from related gene modules and make a prediction on image features with MRI data and gene expression data in the training dataset. We determined the regularization coefficient λ by minimizing the MSE (mean squared error) of the model.
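To make the selection effect of the L1 term concrete, here is a minimal coordinate-descent sketch that minimizes the objective in Eq (2). It is not the solver used in the study; it assumes no intercept and no all-zero columns, and the function names and toy data are hypothetical.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t; values inside [-t, t] become exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for ||y - X beta||^2 + lam * ||beta||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from every coefficient except the j-th one.
                pred_wo_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
                z += X[i][j] ** 2
            # Setting the subgradient of the objective to zero gives this update.
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta

# Toy data: y = 2*x0 + 0.1*x1, with orthogonal columns.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
beta = lasso_cd(X, y, lam=1.0)   # weak coefficient is removed
ols = lasso_cd(X, y, lam=0.0)    # no penalty: ordinary least squares
```

With λ = 1 the weakly predictive variable is dropped entirely while the strong one is only shrunk, which is the behaviour that yields a short list of gene signatures per image feature.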
In this step, we obtained a survival prediction index using only gene signatures, without the information of image features. SVMs (support vector machines) are supervised learning models that can be used for classification and regression problems [25,26,27]. For a classification problem, the optimal hyperplane is searched to separate data into two classes with the max margin. For new data, the trained hyperplane is used to predict the label or the probability of each class. Sometimes, data may not be separated completely, and a soft margin [25] can be used by adding a penalty parameter C and slack variables ξi to obtain the minimum error. The SVM optimization problem is
$$\min_{\omega,\,\xi}\ \frac{1}{2}\|\omega\|^2+C\sum_{i=1}^{N}\xi_i \tag{3}$$
subject to
$$y_i f(x_i)\ge 1-\xi_i,\quad \xi_i\ge 0 \tag{4}$$
The vector ω is the vector orthogonal to the hyperplane. xi, yi are an observation pair of data points, and f(xi) is the label of xi predicted by the SVM. SVM-RFE (support vector machine-recursive feature elimination) [28] is a powerful feature selection algorithm based on SVM that can avoid overfitting when the number of features is high. In each iteration, features are scored and sorted through model training and the least important feature is removed. Remaining features are used for a next training, and the above step is repeated. The score for sorting of the ith feature is defined as
$$c_i=\omega_i^2 \tag{5}$$
ωi is the ith dimension of the hyperplane orthogonal vector ω in SVM. Finally, the optimal number of features that have the minimum error is determined.
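The recursive elimination loop can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the linear SVM is trained with plain subgradient descent on the soft-margin objective of Eq (3), one feature is removed per round using the score in Eq (5), and the cross-validation step that chooses the final number of features is omitted. The function names, learning rate, epoch count and toy data are hypothetical; C = 2 follows the value reported later in the text.

```python
def train_linear_svm(X, y, C=2.0, lr=0.01, epochs=500):
    """Subgradient descent on 0.5*||w||^2 + C * sum(hinge loss). Labels are +1/-1."""
    p = len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0          # gradient of the 0.5*||w||^2 term
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:             # hinge loss is active for this sample
                for k in range(p):
                    gw[k] -= C * yi * xi[k]
                gb -= C * yi
        w = [wk - lr * gk for wk, gk in zip(w, gw)]
        b -= lr * gb
    return w, b

def svm_rfe(X, y, C=2.0):
    """Return feature indices ranked from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        Xs = [[row[k] for k in remaining] for row in X]
        w, _ = train_linear_svm(Xs, y, C)
        scores = [wk ** 2 for wk in w]              # Eq (5)
        worst = scores.index(min(scores))
        eliminated.append(remaining.pop(worst))     # drop the least important
    return remaining + eliminated[::-1]

# Toy data: feature 0 separates the classes strongly, feature 1 weakly,
# and feature 2 carries no label information.
X = [[2.0, 0.5, 0.3], [2.0, 0.5, -0.3], [-2.0, -0.5, 0.3], [-2.0, -0.5, -0.3]]
y = [1, 1, -1, -1]
ranking = svm_rfe(X, y)
```

In practice the model at each round would be scored by cross-validation, and the feature subset with the lowest error would be kept.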
We use SVM-RFE to select gene signatures and train a classification SVM model with expression data of these selected gene signatures and DSS data in the training dataset. The patient labels are set to 0 or 1 based on their prognostic outcome (survival or death). Then, the predicted probability is treated as a survival prediction index. The survival curve and C-index are used to assess the prediction performance.
Further, we consider a combination of the selected image biomarkers and the index calculated by SVM with gene signatures. To ensure improvement of the new aggregated index, we transform the calculation of the optimal combination coefficients of all features into an optimization problem. Specifically, suppose that N image features are considered to be independently associated with DSS, recorded as f1, f2, ..., fN, and the gene index value from SVM is recorded as g. The integrated measure we want to determine is recorded as f. The optimization problem to be solved can be described as follows.
$$\max_{f}\ C_f \tag{6}$$
subject to
$$f=\sum_{i=1}^{N}\alpha_i\cdot f_i+\beta\cdot g \tag{7}$$
$$\sum_{i=1}^{N}\alpha_i+\beta=1 \tag{8}$$
where Cf is the C-index of integrated measure f on the training dataset. Our goal is to search optimal parameters α1, α2, ..., αN and β in Eq (7) to maximize the Cf in (6).
The Particle Swarm Optimization (PSO) algorithm [29] is used to solve the optimization problem (6) in this study. PSO is an evolutionary computation algorithm inspired by the flocking behavior of birds that can be applied to a wide range of optimization problems. An initial population of random particles is created first. For each particle, the position represents a candidate solution, and the corresponding fitness is the value of the objective function at that position. The goal of PSO is to find the particle with the best fitness (here, the highest C-index) by updating the velocity and position of each particle according to the following formulas:
$$v_i=\omega\cdot v_i+c_1\cdot r_1\cdot(pbest_i-x_i)+c_2\cdot r_2\cdot(gbest-x_i) \tag{9}$$
$$x_i=x_i+v_i \tag{10}$$
xi and vi are the position and velocity of the ith particle. pbesti is the best position of the ith particle in its history, and gbest is the best position found by all particles so far. r1 and r2 are random numbers between 0 and 1. ω is the inertia weight, and c1 and c2 are the acceleration constants.
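The update rules in Eqs (9) and (10) can be sketched as below. This is a minimal illustration rather than the study's code. The defaults ω = 0.8, c1 = c2 = 0.5, 20 particles and 30 iterations follow the settings reported later in the text. Two points are assumptions: the weights are clipped to be non-negative before being rescaled to sum to 1 (constraint (8) itself only requires the sum), and a toy fitness function stands in for the C-index.

```python
import random

def normalize(x):
    """Clip to non-negative values and rescale so the weights sum to 1."""
    clipped = [max(v, 0.0) for v in x]
    s = sum(clipped)
    if s == 0:
        return [1.0 / len(x)] * len(x)
    return [v / s for v in clipped]

def pso_maximize(fitness, dim, n_particles=20, n_iter=30,
                 w=0.8, c1=0.5, c2=0.5, seed=0):
    rng = random.Random(seed)
    xs = [normalize([rng.random() for _ in range(dim)]) for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(x) for x in xs]
    pbest_fit = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[g]), pbest_fit[g]
    history = [gbest_fit]
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Eq (9): inertia + pull toward personal best + pull toward global best.
            vs[i] = [w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
                     for v, x, pb, gb in zip(vs[i], xs[i], pbest[i], gbest)]
            # Eq (10), followed by projection back onto the constraint.
            xs[i] = normalize([x + v for x, v in zip(xs[i], vs[i])])
            fit = fitness(xs[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(xs[i]), fit
                if fit > gbest_fit:
                    gbest, gbest_fit = list(xs[i]), fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history

# Toy fitness (stand-in for the C-index): closeness to an arbitrary target.
TARGET = [0.5, 0.3, 0.2]

def toy_fitness(x):
    return -sum((a - t) ** 2 for a, t in zip(x, TARGET))

best, best_fit, history = pso_maximize(toy_fitness, dim=3)
```

To reproduce the setting of problem (6), `toy_fitness` would be replaced by a function that forms f from the candidate weights and returns its C-index on the training data.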
We take a log-rank test on 724 image features using DSS data of 65 patients in TCIA and filter these features with a standard that the p value is less than 0.01. Then, 21 features remain. Features with high similarity to each other are removed: we calculate the Pearson correlation coefficient between features and remove the one that has the bigger log-rank p value if the Pearson correlation coefficient between two image features is greater than 0.8. After this step, 6 features are removed, and 15 features remain. Based on the above univariable analysis, we first implement the proportional hazard test [21]. Each image feature meets the proportional hazard assumption (detailed information is shown in Additional file 1: Table S1). Then, we train a multivariate Cox regression model on these remaining image features with gene expression data and DSS data in the training dataset. The result is shown in Table 1, and eight features marked with ∗ are considered to be independently correlated with DSS (p<0.05).
Image features | exp(coef) | exp(coef) lower 95% | exp(coef) upper 95% | Wald test | p value |
TEXTURE_GLSZM_ET_T1Gd_SZLGE* | 0 | 0 | 0 | −3.12 | 0.00178 |
HISTO_ED_T2_Bin8* | 0.7 | 0.55 | 0.88 | −3.02 | 0.00254 |
TEXTURE_GLOBAL_ET_T1Gd_Skewness* | 3.03E+05 | 47.29 | 1.94E+09 | 2.82 | 0.00477 |
TEXTURE_GLRLM_NET_FLAIR_LRHGE* | 1 | 0.99 | 1 | −2.66 | 0.0078 |
HISTO_NET_T1_Bin4* | 0.89 | 0.8 | 0.98 | −2.42 | 0.01559 |
HISTO_ET_T1Gd_Bin10* | 1.19 | 1.03 | 1.38 | 2.35 | 0.01877 |
TEXTURE_GLSZM_NET_T1Gd_ZSV* | 0 | 0 | 0 | −2.34 | 0.01906 |
TEXTURE_GLRLM_NET_T1Gd_GLV* | 2.00E+42 | 2230.45 | 1.79E+81 | 2.13 | 0.0333 |
HISTO_ET_T1_Bin10 | 0.81 | 0.63 | 1.05 | −1.59 | 0.11219 |
TEXTURE_GLCM_ET_T2_SumAverage | 0 | 0 | 9.95E+83 | −1.57 | 0.11569 |
TEXTURE_GLRLM_NET_T1_LGRE | 0 | 0 | 2.73E+38 | −1.49 | 0.13526 |
TEXTURE_GLRLM_ED_T1_RLV | inf | 0 | inf | 1.4 | 0.16185 |
HISTO_ED_T2_Bin4 | 0.96 | 0.88 | 1.05 | −0.91 | 0.36241 |
TEXTURE_GLCM_ED_FLAIR_Energy | inf | 0 | inf | 0.78 | 0.43387 |
TEXTURE_GLSZM_NET_T1_LZLGE | 0.99 | 0.96 | 1.03 | −0.39 | 0.69976 |
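The redundancy filter applied before the Cox model (for any two features with a Pearson correlation above 0.8, drop the one with the larger log-rank p value) can be sketched as a greedy pass over the features sorted by significance. The greedy ordering is an assumption about how ties between overlapping pairs are resolved, and the feature names and values below are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def drop_correlated(features, pvalues, threshold=0.8):
    """features: name -> values per patient; pvalues: name -> log-rank p value.
    Visits features from most to least significant and keeps a feature only if
    its correlation with every feature already kept is at most the threshold."""
    kept = []
    for name in sorted(features, key=lambda k: pvalues[k]):
        if all(pearson(features[name], features[k]) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy data: feat_b nearly duplicates feat_a and has the larger p value.
features = {
    "feat_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feat_b": [1.1, 2.0, 3.2, 3.9, 5.1],
    "feat_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pvalues = {"feat_a": 0.001, "feat_b": 0.005, "feat_c": 0.008}
kept = drop_correlated(features, pvalues)
```

The comparison uses the signed correlation, as stated in the text; using the absolute value instead would also remove strongly anti-correlated pairs.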
A gene coexpression network is constructed using gene expression data of 65 patients in the training dataset. We delete genes that express in less than 20% of the patients or have no expression values (n = 1875). Then, we retain genes that have the highest 25% variance (n = 4663). A pairwise correlation matrix is calculated, and then we adjust the matrix by raising it to the power of five using the R package WGCNA [19,20]. The minimum module size is set to 50 and the minimum height for merging modules is set to 0.25. Then, we get 12 gene modules. Detailed information on the modules is shown in Additional file 2: Table S2.
The Pearson correlation coefficients and their statistical significance were calculated between the 12 gene modules and the 8 image features. The result is shown in Figure 3. Four image features that show significant correlation (p<0.05) with at least one gene module were obtained. HISTO_ED_T2_Bin8 is the 8-bin histogram feature of the peritumoral edema in T2-weighted precontrast, TEXTURE_GLSZM_NET_T1Gd_ZSV is the zone size variance of the gray level size zone matrix (GLSZM) of the nonenhancing part of the tumor core in T1-weighted postcontrast, TEXTURE_GLRLM_NET_FLAIR_LRHGE is the long run high gray level emphasis of the gray level run length matrix (GLRLM) of the nonenhancing part of the tumor core in T2 Fluid-Attenuated Inversion Recovery, and TEXTURE_GLRLM_NET_T1Gd_GLV is the gray level variance of GLRLM of the nonenhancing part of the tumor core in T1-weighted postcontrast. Then, their corresponding gene modules were integrated. The statistical results are shown in Table 2 and the detailed list of genes is shown in Additional file 3: Table S3.
Image features | Associated gene modules | Number of associated genes |
HISTO_ED_T2_Bin8 | module2, module4, module5, module7 | 2794 |
TEXTURE_GLSZM_NET_T1Gd_ZSV | module6 | 506 |
TEXTURE_GLRLM_NET_FLAIR_LRHGE | module6 | 506 |
TEXTURE_GLRLM_NET_T1Gd_GLV | module2, module4, module6, module8, module10 | 1421 |
A further KEGG enrichment analysis was performed on the integrated gene modules using the Metascape website [30]; the result is shown in Figure 4. The complete list of biological annotations is shown in Additional file 4: Table S4. Among these, the neuroactive ligand-receptor interaction pathway is the most enriched pathway in all integrated gene modules, with a minimum p value of $1.259\times10^{-41}$, and it is reported to be associated with glioma [31,32].
Then, the Lasso method described in section 2.5 was used to select gene signatures from the related gene modules and establish a map from genes to image features. We determined the regularization coefficient λ by minimizing the MSE (mean squared error) of the model. The process is shown in Figure 6. The optimal coefficient λ and the corresponding RMSE (root mean squared error) of 65 patients are shown in Table 3. The number of selected gene signatures is also shown. The detailed list of gene signatures is shown in Additional file 5: Table S5.
Image feature | Number of genes in associated modules | Optimal λ | RMSE | Number of genes selected by Lasso |
HISTO_ED_T2_Bin8 | 2794 | 1.6627 | 6.0847 | 12 |
TEXTURE_GLSZM_NET_T1Gd_ZSV | 506 | 7.81E-06 | 2.0195E-5 | 3 |
TEXTURE_GLRLM_NET_FLAIR_LRHGE | 506 | 163.9677 | 528.16 | 6 |
TEXTURE_GLRLM_NET_T1Gd_GLV | 1421 | 3.01E-03 | 0.0120 | 18 |
We made a prediction on the 4 image features using Lasso with gene expression data of 455 patients in TCGA as the validation dataset. We then took the value of each image feature as a survival prediction index. We calculated the C-index and plotted the Kaplan-Meier curves on the validation dataset. The result is shown in Figure 7. The C-indexes of these four survival prediction indexes are 0.6945, 0.7321, 0.7926, and 0.7985. These results indicate that these four image features perform well in survival prediction.
From the selected 4663 genes with high variance, we fed gene expression data and DSS data of 65 patients in TCIA to SVM-RFE and obtained 43 gene signatures (shown in Additional file 6: Table S6). Then, we trained a classification SVM model with these selected genes. The variables were gene expression data of 65 patients, and the labels were set to 0 or 1 based on the patient prognostic outcome (survival or death). Penalty parameter C was set to 2, and 5-fold cross-validation was used to evaluate the error in the recursive feature elimination process. We trained the SVM model and took the predicted probability of survival as a survival prediction index. The C-index and survival curve are shown in Figure 8. The C-index is 0.7627.
We took a linear combination of four significant image features and the index calculated by SVM with gene signatures. A better integrated measure was obtained that represents patient survival situation. Set N=4 in formula (7). Four normalized image feature values were recorded as f1, f2, f3, and f4, and the index value from SVM was recorded as g. The integrated measure is recorded as f. Then, we get
$$f=\sum_{i=1}^{4}\alpha_i\cdot f_i+\beta\cdot g \tag{11}$$
We used the PSO algorithm to calculate the optimal coefficients that maximize the C-index of the 65 patients in the training dataset, with parameters ω, c1 and c2 set to 0.8, 0.5 and 0.5. The initial population size was set to 20, 25, 30, 35 and 40, and the corresponding iteration number was set to 30 to ensure the convergence of PSO. We repeated the numerical experiments 10 times and recorded the average result for each parameter setting. Detailed results of each experiment are shown in Additional file 7: Table S7. For each population size, we then substituted the coefficients into formula (11) and obtained the integrated measure f in different forms. The C-index was calculated using gene expression data on the validation dataset. The validation result is shown in Table 4.
Population size | 20 | 25 | 30 | 35 | 40 |
α1 | 0.2926 | 0.3187 | 0.2792 | 0.3303 | 0.276 |
α2 | 0.0663 | 0.0394 | 0.0739 | 0.0505 | 0.068 |
α3 | 0.2171 | 0.2329 | 0.2076 | 0.2214 | 0.2102 |
α4 | 0.0091 | 0.019 | 0.0107 | 0.0298 | 0.0159 |
β | 0.4149 | 0.39 | 0.4288 | 0.368 | 0.4298 |
C-index | 0.8065 | 0.807 | 0.8061 | 0.807 | 0.8057 |
From Table 4, we observe that β is approximately 0.4 across the different population sizes. Therefore, the proportion of gene signatures in the integration is approximately 40%. α1 is approximately 0.3, α2 is approximately 0.06 and α3 is approximately 0.24. α4 is nearly 0, indicating that the gray level variance of GLRLM of the nonenhancing part of the tumor core in T1-weighted postcontrast can be removed from the integration. We then set parameters α1, α2, α3, α4 and β to 0.3, 0.06, 0.24, 0 and 0.4. We substituted these coefficients into formula (11) and calculated the integrated measure f on the validation dataset. The Kaplan-Meier curve is shown in Figure 9. The C-indexes of the four independent image features, the gene signatures and the integrated measure are shown in Table 5.
Image features | f1 | f2 | f3 | f4 | g | f |
C-index | 0.6945 | 0.7321 | 0.7926 | 0.7985 | 0.7627 | 0.8071 |
The C-index of the integrated measure f is 0.8071 and is higher than any other measure based on image signatures or gene signature. This result indicates that the integrated measure can improve the prediction accuracy. The integrated measure is recorded as follows.
$$f=0.3f_1+0.06f_2+0.24f_3+0.4g \tag{12}$$
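As a worked example of Eq (12), the sketch below computes the integrated measure for a few patients and splits them at the median. The weights are those reported above. The patient values are made up, the inputs are assumed to be already normalized to a common scale, and treating a higher score as higher risk is an assumption made only for this illustration.

```python
from statistics import median

# Coefficients from Eq (12); f4 is omitted because its weight is 0.
WEIGHTS = {"f1": 0.30, "f2": 0.06, "f3": 0.24, "g": 0.40}

def integrated_measure(patient):
    """Weighted sum of the normalized image features and the gene index g."""
    return sum(WEIGHTS[k] * patient[k] for k in WEIGHTS)

# Hypothetical, already-normalized values for three patients.
patients = [
    {"f1": 0.2, "f2": 0.5, "f3": 0.1, "g": 0.3},
    {"f1": 1.0, "f2": 1.0, "f3": 1.0, "g": 1.0},
    {"f1": 0.0, "f2": 0.0, "f3": 0.0, "g": 0.0},
]
scores = [integrated_measure(p) for p in patients]

# Median split into two risk groups (direction of risk is an assumption).
cut = median(scores)
groups = ["high" if s > cut else "low" for s in scores]
```

For the first patient this gives 0.3·0.2 + 0.06·0.5 + 0.24·0.1 + 0.4·0.3 = 0.234.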
Furthermore, we use the time dependent Receiver Operating Characteristic (ROC) [33] to further assess the predictive power and compare different prediction models. Time-dependent ROC analysis showed that the integrated measure improved our ability to predict prognosis [AUC, 0.79; and 95% confidence intervals (CI), 0.71 to 0.87] (see Figure 10), when compared with other measures based on image signatures or gene signatures.
Patients are divided into two groups, a high-risk group and a low-risk group, based on their prognosis (the DSS value in this study), by taking the median DSS of the 65 patients in the training dataset as a threshold. Then, classification is conducted on the 455 patients in the validation dataset by taking the median value of the integrated measure in the training dataset as the threshold. The accuracy is 72.1%, which is higher than the accuracy reported in the published studies [7,8,9,10,11].
The primary goal of phenotyping and classifying a human tumor is to capture tumor heterogeneity and realize personalized precision diagnosis and therapy. In clinical practice, large volumes of medical data of many types have become available with the rapid development of biomedical engineering and computer application technology. However, one of the biggest challenges in clinical applications is how to integrate these different types of data to extract accurate information.
In this study, we attempted to integrate both MRI data and gene expression data to propose a new feature measure that could be used to identify subsets of LGG patients at low and high risk in terms of DSS. Based on gene expression data, we first used the WGCNA method to construct the network and identify twelve network modules. With MRI data, eight image biomarkers were obtained by using the Cox regression model. Furthermore, through correlation analysis between gene modules and image features, four radiomic biomarkers were identified. Because MRI data are not available in our validation dataset, the Lasso method was applied to build a map from gene expression data to these image features. In addition, we also independently used gene expression data to derive a survival prediction index through the SVM method. Finally, an integrated measure (IM) combining image and gene signatures was obtained through the PSO algorithm. We validated IM with gene expression data and DSS data on 455 patients in the validation dataset. The C-index of IM is 0.8071 and its Area Under Curve (AUC) of the ROC curve is 0.79, higher than any other single measure. The accuracy of classification of patients is 72.1%, which is higher than the accuracy of the published work using only radiomic data [7,8,9,10,11]. The results demonstrate that the proposed IM enhances the prediction accuracy for lower grade gliomas.
In summary, the accuracy of DSS prediction of LGG patients is successfully improved by integrating radiomic features at the macro level with gene expression data at the micro level. The method proposed in this study can also be extended to analyze different data sources of other tumors.
This work was supported by the National Key Research and Development Program of China (No. 2018YFC1314600), the Key Program of the National Natural Science Foundation of China (No. 11831015) and the Chinese National Natural Science Foundation (No. 61672388).
All authors declare no conflicts of interest in this paper.
[1] | D. F. Anderson, G. Craciun and T. G. Kurtz, Product-form stationary distributions for deficiency zero chemical reaction networks, B. Math. Biol., 72 (2010), 1947–1970. |
[2] | D. F. Anderson, G. Craciun, M. Gopalkrishnan, et al., Lyapunov functions, stationary distributions, and non-equilibrium potential for reaction networks, B. Math. Biol., 77 (2015), 1744–1767. |
[3] | D. Cappelletti and C.Wiuf, Product-form poisson-like distributions and complex balanced reaction systems, SIAM . Appl. Math., 76 (2016), 411–432. |
[4] | D. F. Anderson and S. L. Cotter, Product form stationary distributions for deficiency zero networks with non-mass action kinetics, B. Math. Biol., 78(2016), 2390–2407. |
[5] | D. F. Anderson, D. Cappelletti, M. Koyama, et al., Non-explosivity of stochastically modeled reaction networks that are complex balanced, B. Math. Biol., 80 (2018), 2561–2579. |
[6] | A. Agazzi, A. Dembo and J. P. Eckmann, Large deviations theory for Markov jump models of chemical reaction networks, Ann. Appl. Probab., 28 (2018), 1821–1855. |
[7] | H. Ge and H. Qian, Mathematical formalism of nonequilibrium thermodynamics for nonlinear chemical reaction systems with general rate law, J. Stat. Phys., 166 (2017), 190–209. |
[8] | T. G. Kurtz, Representations of markov processes as multiparameter time changes, Ann. Prob., 8 (1980), 682–715. |
[9] | C. Chan, X. Liu, L. Wang, et al., Protein scaffolds can enhance the bistability of multisite phosphorylation systems, PLoS Comput. Biol., 8 (2012), 1–9. |
[10] | G. Gnacadja, Univalent positive polynomial maps and the equilibrium state of chemical networks of reversible binding reactions, Adv. Appl. Math., 43 (2009), 394–414. |
[11] | H. W. Kang, L. Zheng and H. G. Othmer, A new method for choosing the computational cell in stochastic reaction–diffusion systems, J. Mathe. Biol., 65 (2012), 1017–1099. |
[12] | E. D. Sontag, Structure and stability of certain chemical networks and applications to the kinetic proofreading of t-cell receptor signal transduction, IEEE Trans. Auto. Cont., 46 (2001), 1028– 1047. |
[13] | F. J. M. Horn and R. Jackson, General mass action kinetics, Arch. Rat. Mech. Anal, 47 (1972), 81–116. |
[14] | M. Feinberg, Chemical reaction network structure and the stability of complex isothermal reactors - I. the deficiency zero and deficiency one theorems, review article 25, Chem. Eng. Sci., 42 (1987), 2229–2268. |
[15] | M. Feinberg, Lectures on chemical reaction networks, Delivered at the Mathematics Research Center, Univ. Wisc.-Madison, (1979). Available from http://www.che.eng.ohio-state. edu/~feinberg/LecturesOnReactionNetworks. |
[16] | J. Gunawardena, Chemical reaction network theory for in-silico biologists. Available from http: //vcp.med.harvard.edu/papers/crnt.pdf, (2003). |
[17] | F. P. Kelly, Reversibility and stochastic networks, J. Wiley, 1979. |
[18] | P. Whittle, Systems in stochastic equilibrium, J. Wiley, 1986. |
[19] | H. G. Othmer, Y. Kim and M. A. Stolarska, The role of the microenvironment in tumor growth and invasion, Prog. Biophys. Mol. Bio., 106 (2011), 353–379. |
[20] | D. F. Anderson and T. G. Kurtz, Continuous time Markov chain models for chemical reaction networks, in Design and Analysis of Biomolecular Circuits: Engineering Approaches to Systems and Synthetic Biology, Springer, (2011), 3–42. |
[21] | D. F. Anderson and T. G. Kurtz, Stochastic analysis of biochemical systems, Springer, 2015. |
[22] | T. G. Kurtz, Strong approximation theorems for density dependent Markov chains, Stoch. Proc. Appl., 6 (1977/78), 223–240. |
[23] | R. B. Paris and A. D. Wood, Asymptotics of high order differential equations, Pitman Research Notes in Mathematics Series, 1986. |
[24] | G. Craciun, Toric differential inclusions and a proof of the global attractor conjecture, preprint, arXiv:1501.02860. |
[25] | F. J. M. Horn, Necessary and sufficient conditions for complex balancing in chemical kinetics, Arch. Rat. Mech. Anal, 49 (1972), 172–186. |
[26] | V. Kazeev, M. Khammash, M. Nip, et al., Direct solution of the chemical master equation using quantized tensor trains, PLoS Comput. Biol., 10 (2014) ,e1003359. Available from: https:// doi.org/10.1371/journal.pcbi.1003359. |
Image features | exp(coef) | exp(coef) lower 95% | exp(coef) upper 95% | Wald test | p value |
TEXTURE_GLSZM_ET_T1Gd_SZLGE* | 0 | 0 | 0 | −3.12 | 0.00178 |
HISTO_ED_T2_Bin8* | 0.7 | 0.55 | 0.88 | −3.02 | 0.00254 |
TEXTURE_GLOBAL_ET_T1Gd_Skewness* | 3.03E+05 | 47.29 | 1.94E+09 | 2.82 | 0.00477 |
TEXTURE_GLRLM_NET_FLAIR_LRHGE* | 1 | 0.99 | 1 | −2.66 | 0.0078 |
HISTO_NET_T1_Bin4* | 0.89 | 0.8 | 0.98 | −2.42 | 0.01559 |
HISTO_ET_T1Gd_Bin10* | 1.19 | 1.03 | 1.38 | 2.35 | 0.01877 |
TEXTURE_GLSZM_NET_T1Gd_ZSV* | 0 | 0 | 0 | −2.34 | 0.01906 |
TEXTURE_GLRLM_NET_T1Gd_GLV* | 2.00E+42 | 2230.45 | 1.79E+81 | 2.13 | 0.0333 |
HISTO_ET_T1_Bin10 | 0.81 | 0.63 | 1.05 | −1.59 | 0.11219 |
TEXTURE_GLCM_ET_T2_SumAverage | 0 | 0 | 9.95E+83 | −1.57 | 0.11569 |
TEXTURE_GLRLM_NET_T1_LGRE | 0 | 0 | 2.73E+38 | −1.49 | 0.13526 |
TEXTURE_GLRLM_ED_T1_RLV | inf | 0 | inf | 1.4 | 0.16185 |
HISTO_ED_T2_Bin4 | 0.96 | 0.88 | 1.05 | −0.91 | 0.36241 |
TEXTURE_GLCM_ED_FLAIR_Energy | inf | 0 | inf | 0.78 | 0.43387 |
TEXTURE_GLSZM_NET_T1_LZLGE | 0.99 | 0.96 | 1.03 | −0.39 | 0.69976 |

Image features | Associated gene modules | Number of associated genes |
HISTO_ED_T2_Bin8 | module2, module4, module5, module7 | 2794 |
TEXTURE_GLSZM_NET_T1Gd_ZSV | module6 | 506 |
TEXTURE_GLRLM_NET_FLAIR_LRHGE | module6 | 506 |
TEXTURE_GLRLM_NET_T1Gd_GLV | module2, module4, module6, module8, module10 | 1421 |

Image feature | Number of genes in associated modules | Optimal λ | RMSE | Number of genes selected by Lasso |
HISTO_ED_T2_Bin8 | 2794 | 1.6627 | 6.0847 | 12 |
TEXTURE_GLSZM_NET_T1Gd_ZSV | 506 | 7.81E-06 | 2.0195E-05 | 3 |
TEXTURE_GLRLM_NET_FLAIR_LRHGE | 506 | 163.9677 | 528.16 | 6 |
TEXTURE_GLRLM_NET_T1Gd_GLV | 1421 | 3.01E-03 | 0.0120 | 18 |

Population size | 20 | 25 | 30 | 35 | 40 |
α1 | 0.2926 | 0.3187 | 0.2792 | 0.3303 | 0.276 |
α2 | 0.0663 | 0.0394 | 0.0739 | 0.0505 | 0.068 |
α3 | 0.2171 | 0.2329 | 0.2076 | 0.2214 | 0.2102 |
α4 | 0.0091 | 0.019 | 0.0107 | 0.0298 | 0.0159 |
β | 0.4149 | 0.39 | 0.4288 | 0.368 | 0.4298 |
C-index | 0.8065 | 0.807 | 0.8061 | 0.807 | 0.8057 |

Image features | f1 | f2 | f3 | f4 | g | f |
C-index | 0.6945 | 0.7321 | 0.7926 | 0.7985 | 0.7627 | 0.8071 |