Ovarian cancer is a tumor with diverse clinicopathological and molecular features, and the vast majority of patients already have local or extensive spread at the time of diagnosis. Early diagnosis and prognostic prediction can contribute to understanding the underlying pathogenesis of ovarian cancer and to improving therapeutic outcomes. The occurrence of ovarian cancer is influenced by multiple complex mechanisms at the genome, transcriptome and proteome levels, and different types of omics analysis help predict the survival of ovarian cancer patients. Multi-omics data of ovarian cancer exhibit high-dimensional heterogeneity, and existing methods for integrating multi-omics data have not taken into account the variability of, and inter-correlation between, different omics data. In this paper, we propose a deep learning model, MDCADON, which utilizes multi-omics data and a cross-modal view correlation discovery network. We introduce random forest into LASSO regression for feature selection on mRNA expression, DNA methylation, miRNA expression and copy number variation (CNV), aiming to select important features highly correlated with ovarian cancer prognosis. A multi-modal deep neural network is used to learn feature representations of each omics data type together with clinical data, and the cross-modal view correlation discovery network is employed to construct a multi-omics discovery tensor that explores the inter-relationships between different omics data. The experimental results demonstrate that MDCADON outperforms existing methods in predicting ovarian cancer prognosis, which enables survival analysis for patients and facilitates the determination of follow-up treatment plans. Finally, we perform Gene Ontology (GO) term analysis and biological pathway analysis on the genes identified by MDCADON, revealing underlying mechanisms of ovarian cancer and providing support for guiding ovarian cancer treatment.
Citation: Huiqing Wang, Xiao Han, Jianxue Ren, Hao Cheng, Haolin Li, Ying Li, Xue Li. A prognostic prediction model for ovarian cancer using a cross-modal view correlation discovery network[J]. Mathematical Biosciences and Engineering, 2024, 21(1): 736-764. doi: 10.3934/mbe.2024031
Ovarian cancer is a tumor with a range of different clinicopathological and molecular features [1]. Because early symptoms are often obscure and reliable screening methods are lacking, the majority of patients already have local or distant spread at the time of diagnosis [2,3]. The occurrence of ovarian cancer is influenced by complex mechanisms at multiple levels, including the genome, transcriptome and proteome [4,5,6]. Different types of omics analysis contribute to predicting treatment outcomes and survival rates for ovarian cancer patients [4,6,7]. Renaud Sabatier et al. [8] pointed out that DNA microarrays have facilitated the exploration of molecular phenotype correlations in ovarian cancer, shedding light on tumor occurrence, diagnosis, prognostic classification and the development of new anticancer drugs, all of which benefit the treatment and prognosis of ovarian cancer patients. Aruni Ghose et al. [9] emphasized that proteomics can identify protein targets and signaling pathways associated with ovarian cancer cell growth and metastasis; by enabling new treatment strategies based on adaptive patient responses, proteomics can reduce the emergence of drug resistance and improve patient outcomes. These studies indicate that omics data at various molecular levels are crucial for predicting the prognosis of ovarian cancer. Using multi-omics data together to explore the regulatory network mechanisms within the biological system allows in-depth investigation of how different molecular levels affect the prognosis of ovarian cancer [10]. Therefore, integrating omics data across multiple levels of cellular function is conducive to improving the accuracy of ovarian cancer prognosis prediction, which can help to understand the underlying pathogenesis of ovarian cancer and improve therapeutic outcomes [11]. Accurate prognosis prediction plays a significant role in guiding clinical targeted therapies.
With the advancement of high-throughput sequencing technology, numerous studies aim to assess cancer prognosis risk based on multi-omics data [12], encompassing diverse data types such as mRNA expression [13], DNA methylation [14], miRNA expression [15] and copy number variation (CNV) [16]. Because cancer multi-omics data are high-dimensional and the number of features varies substantially among omics types [17], feature selection is essential [18,19,20]. Hu et al. [21] employed random forest to select important features from gastric cancer multi-omics data, improving gastric cancer prediction accuracy. Random forest can capture nonlinear relationships among features during feature selection. However, when constructing multiple decision trees, it tends to select similar or overlapping feature subsets, which leads to information redundancy and can overlook some crucial features [22]. Mohammed et al. [23] introduced LASSO regression for feature selection on RNA-Seq gene expression data in a pan-cancer subtype classification study, improving the model's classification performance. Xie et al. [24] applied Group LASSO regularization to incorporate gene-level group prior knowledge into model training, retaining features associated with cancer prognosis for survival analysis. Although LASSO regression [25] and its variants can extract effective, sparse key features from high-dimensional data, owing to the properties of their convex optimization they tend to retain only one gene from a group of strongly correlated genes. This leads to the loss of other cancer-relevant genes [26]. Therefore, we propose a feature selection algorithm named RLASSO, which combines the advantages of LASSO regression and random forest. RLASSO uses LASSO regression to sparsify features and, in conjunction with random forest [27], ranks feature importance on ovarian cancer multi-omics data. The two methods supplement the feature information that each one misses, allowing a more comprehensive acquisition of features relevant to ovarian cancer prognosis. This helps the discovery of potential biomarkers for ovarian cancer and provides more reliable information for the prognostic assessment of ovarian cancer patients.
Because different omics data provide different views of ovarian cancer, integrating and analyzing multi-omics data can compensate for missing or unreliable information in single-omics data [21]. Integrating multiple types of omics data can therefore reveal the heterogeneity of ovarian cancer at different molecular levels, providing a more comprehensive understanding of the biological processes underlying ovarian cancer development [28,29]. Existing methods for integrating multi-omics data fall mainly into early fusion and late fusion methods. Early fusion methods combine different types of omics data into a single dataset and then use deep learning models to learn a latent low-dimensional representation, compensating for missing information in single-omics data and helping to improve the performance of cancer prognosis prediction [21,30]. However, because omics data are high-dimensional and the number of features differs greatly across omics modalities, early fusion increases the dimensionality of the model input [17] and ignores both the heterogeneity among different omics data and the distinct distribution of each omics data type. Late fusion methods independently learn high-level feature representations for each omics data type and then combine the resulting predictions, which share a uniform form; this avoids the problem of data inconsistency and the mutual interference between multi-omics data [31,32]. Tong et al. [33] proposed a concatenation autoencoder (ConcatAE) that concatenates the hidden features learned from each modality, improving overall survival prediction in breast cancer. Hossein et al. [34] proposed MOLI, a deep neural network based multi-omics late fusion method that learns features from each omics data type separately and concatenates the learned features for drug response prediction. Zhou et al. [35] employed a residual neural network for the late fusion of multi-omics data to classify Nottingham Prognostic Index score levels. Zhang et al. [36] introduced a deep learning method called multiGATAE, which feeds patient similarity graphs and the feature matrices of different omics data into separate autoencoders to learn an embedded representation for each omics data type; a multi-omics attention mechanism then fuses these embeddings into a unified representation used to identify cancer subtypes. Although these late fusion methods perform well, they overlook the importance of incorporating patients' clinical features when learning omics characteristics. Previous research has successfully predicted the risk of acute kidney injury (AKI) using only patient electronic health record data, including clinical information [37]. Therefore, this study adopts a late fusion approach for multi-omics data, using a multi-modal deep neural network to independently learn high-level feature representations of the various omics data and clinical data. This strategy avoids interference between different omics data during feature learning and makes full use of the patient information in clinical data, improving the performance of ovarian cancer prognosis prediction.
Multi-omics data provide complementary information from the different perspectives of the various omics types. In most studies, the prediction results from each omics data type are averaged, weighted or linearly fused to extract this complementary information. Sun et al. [38] constructed three neural network models for gene expression, copy number alteration (CNA) and clinical data, respectively, and combined their prediction results by weighted linear summation, which improved breast cancer prognosis prediction. Carrillo-Perez et al. [39] used five types of modal data, namely whole slide images (WSIs), RNA-Seq, miRNA-Seq, copy number variation (CNV) and DNA methylation, to produce initial predictions for non-small-cell lung cancer, and fused the initial probabilities through a weighted sum to obtain the final probability. All of the above studies integrate initial prediction results from different data types, but they overlook the inter-correlations between different types of omics data in the feature space, which limits predictive performance [12,20,38]. The view correlation discovery network (VCDN) [40] explores the feature representations of different views in the feature space and uncovers their latent correlations, synthesizing the initial view-specific results and cross-view label-correlation knowledge into a comprehensive prediction. We therefore build on VCDN and propose the cross-modal view correlation discovery network (MACODN), which treats mRNA, DNA methylation, miRNA and CNV as four distinct views of omics data. MACODN comprehensively learns the inter-correlations among these omics views in the feature space, enabling a more thorough exploration of feature representations with the ultimate goal of improving predictive performance.
In this study, we propose a deep learning model, MDCADON, which employs a multi-modal deep neural network and a cross-modal view correlation discovery network to predict ovarian cancer prognosis by integrating multi-modal data. MDCADON uses the feature selection method RLASSO, which removes redundant and noisy features so as to adequately select genes associated with ovarian cancer prognosis. Clinical data are combined separately with each of the four types of omics data (mRNA, DNA methylation, miRNA and CNV) after RLASSO feature selection, collectively constructing a multi-modal feature space. A multi-modal deep neural network learns high-level feature representations of each omics data type in parallel to produce initial prognosis predictions for ovarian cancer. Moreover, the cross-modal view correlation discovery network (MACODN) constructs a discovery tensor from the initial prediction results, exploring the interrelationships between cross-modal data in the feature space, and produces the final ovarian cancer prognosis prediction. The experimental results demonstrate that the prognostic prediction performance of MDCADON surpasses that of existing methods for ovarian cancer. This indicates that MDCADON effectively learns high-level feature representations from multi-omics and clinical data, and that MACODN adequately integrates multi-omics data, ultimately enhancing prognostic prediction performance for ovarian cancer.
In this paper, the omics data and clinical data for ovarian cancer were downloaded from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). The omics data include mRNA expression, DNA methylation (Methy), miRNA expression and copy number variation (CNV), with 46,610, 24,923, 1874 and 24,740 features, respectively. Detailed information is presented in Table 1. The clinical data describe the clinical information of 587 ovarian cancer patients. We selected six critical clinical features: age, race, FIGO stage, ovarian cancer subtype, survival time and survival status [41,42].
| Omics type | Number of samples | Number of features | Platform / data type |
| --- | --- | --- | --- |
| mRNA | 367 | 46,610 | HTSeq-FPKM |
| DNA methylation | 363 | 24,923 | Illumina Human Methylation 27k |
| miRNA | 499 | 1874 | BCGSC Illumina HiSeq |
| CNV | 606 | 24,740 | Affymetrix SNP Array 6.0 |
We then preprocessed the downloaded data. In the ovarian cancer dataset, 367 samples have mRNA expression data, 363 have DNA methylation data, 499 have miRNA expression data and 606 have CNV data. Of these, 325 samples have both clinical data and all four omics data types, and these form the ovarian cancer dataset used in this study. Next, we filtered out features with more than 20% missing values. Gene expression values of '0' were converted to 'NA', and the R package "ImputeMissings" [43] was used to impute missing values with the median.
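The missing-value handling above can be sketched in Python with NumPy. This is an illustrative re-implementation, not the authors' R pipeline: the helper names `common_samples` and `preprocess_omics` are hypothetical, the matrix is assumed to hold samples in rows and features in columns with missing entries already encoded as NaN, and imputation uses the median of each feature, as the text describes.

```python
import numpy as np


def common_samples(*sample_id_lists):
    """Sample IDs present in every data source (e.g. clinical + four omics)."""
    shared = set(sample_id_lists[0])
    for ids in sample_id_lists[1:]:
        shared &= set(ids)
    return sorted(shared)


def preprocess_omics(X, max_missing=0.2):
    """Hypothetical helper: filter sparse features, treat zeros as missing, median-impute.

    X: array of shape (n_samples, n_features), missing entries encoded as NaN.
    Returns the processed matrix and a boolean mask of the kept features.
    """
    X = np.asarray(X, dtype=float)
    # 1. Drop features whose fraction of missing values exceeds max_missing (20%).
    keep = np.isnan(X).mean(axis=0) <= max_missing
    X = X[:, keep].copy()
    # 2. Expression values equal to 0 are converted to missing (NaN).
    X[X == 0] = np.nan
    # 3. Replace each NaN with the median of its feature, ignoring NaNs.
    #    (A column that is entirely NaN after step 2 would remain NaN.)
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X, keep
```

In this setting, `common_samples` would be called with the sample IDs of the clinical table and of the four omics tables to obtain the shared patients (325 in this study) before `preprocess_omics` is applied to each omics matrix.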
Because the omics data contain a large number of features, the variance threshold method was applied to keep features whose variance, computed over all patients, exceeds a given threshold [44,45]. The variance thresholds for mRNA, DNA methylation and CNV were set to 7, 0.02 and 0.1, respectively. After preprocessing, mRNA, DNA methylation, miRNA and CNV retained 8492, 6125, 454 and 2274 features, respectively, with a consistent sample size of 325 across all data types; this information is summarized in Table 2. For the clinical data, race, FIGO stage and ovarian cancer subtype were encoded as categorical variables. Finally, using the average survival time of patients whose survival status is 'Alive' as the cut-off, we assigned patients with survival times below the average to the high-risk subgroup and those with survival times above the average to the low-risk subgroup.
| Omics type | Number of samples | Number of features | Platform / data type |
| --- | --- | --- | --- |
| mRNA | 325 | 8492 | HTSeq-FPKM |
| DNA methylation | 325 | 6125 | Illumina Human Methylation 27k |
| miRNA | 325 | 454 | BCGSC Illumina HiSeq |
| CNV | 325 | 2274 | Affymetrix SNP Array 6.0 |
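The variance filter and the risk labelling described before Table 2 can be sketched as follows. This NumPy version is illustrative: the helper names are hypothetical, the use of population variance (`ddof=0`) is an assumption because the text does not specify it, and patients whose survival time equals the cut-off are placed in the low-risk group, a case the text leaves open.

```python
import numpy as np

# Thresholds reported in the text (none is given for miRNA).
VARIANCE_THRESHOLDS = {"mRNA": 7.0, "DNA methylation": 0.02, "CNV": 0.1}


def variance_filter(X, threshold):
    """Keep features whose variance across all patients is above `threshold`."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)  # population variance per feature (ddof=0, assumed)
    keep = variances > threshold
    return X[:, keep], keep


def risk_labels(survival_time, is_alive):
    """Return (labels, cutoff); label 1 = high risk, 0 = low risk.

    The cut-off is the mean survival time of patients whose status is 'Alive'.
    Survival below the cut-off -> high risk; otherwise -> low risk (ties included).
    """
    t = np.asarray(survival_time, dtype=float)
    alive = np.asarray(is_alive, dtype=bool)
    cutoff = t[alive].mean()
    return (t < cutoff).astype(int), cutoff
```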
As shown in Figure 1, the proposed model MDCADON consists of four components: the feature selection method RLASSO, a multi-modal deep neural network, the cross-modal view correlation discovery network (MACODN), and survival and enrichment analysis. MDCADON employs RLASSO to choose genes highly correlated with ovarian cancer prognosis. The selected omics data, combined with clinical features, serve as inputs to the multi-modal deep neural network, which learns high-level feature representations of the omics data, and initial prediction results are obtained through fully connected layers. In MACODN, the initial prediction labels of mRNA, DNA methylation, miRNA and CNV are fused using a cross-modal view correlation discovery tensor, which yields the final prediction of each patient's prognostic risk. Finally, survival analysis is conducted on the patients based on the predicted results, and Gene Ontology (GO) term enrichment analysis and biological pathway analysis are performed on the genes identified by MDCADON.
Cancer multi-omics data are characterized by small sample sizes and high dimensionality, and feature selection can effectively capture important features from high-dimensional information, thereby improving model predictive performance [21]. As shown in Table 2, after preprocessing the mRNA, DNA methylation, miRNA and CNV data contain 8492, 6125, 454 and 2274 features, respectively. To obtain genes highly correlated with ovarian cancer prognosis, we introduced random forest and proposed the feature selection method RLASSO, in which the important features selected by random forest supplement the features dropped by LASSO regression. The resulting final feature numbers for the four types of omics data are shown in Table 3.
| Omics type | Number of samples | Number of features |
| --- | --- | --- |
| mRNA | 325 | 143 |
| DNA methylation | 325 | 142 |
| miRNA | 325 | 128 |
| CNV | 325 | 136 |
In RLASSO, LASSO regression is applied first. It adds an L1-regularization penalty on the feature coefficients to the least-squares error term of the objective function; this drives some coefficients toward zero and shrinks others exactly to zero, so that only features with nonzero coefficients are retained. Feature selection using LASSO regression is formulated as follows:
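$$\min_{\omega}\ \sum_{j=1}^{N}\Big(y_{j}-\sum_{k=1}^{d_i}x_{jk}\,\omega_{k}\Big)^{2}+\lambda\sum_{k=1}^{d_i}\lvert\omega_{k}\rvert \tag{2.1}$$

$$\text{subject to}\quad \sum_{k=1}^{d_i}\lvert\omega_{k}\rvert<c \tag{2.2}$$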
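where i indexes the omics data type, $N$ is the number of samples, $d_i$ is the total number of features in the i-th omics data, $x_{jk}$ is the value of the k-th feature of the j-th sample, $\omega_k$ is the coefficient of the k-th feature, $y_j$ is the label of the j-th sample, $\lambda$ is the regularization parameter and $c$ is the bound on the sum of absolute coefficients.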
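The penalized form (2.1) can be minimized by cyclic coordinate descent with soft-thresholding. The NumPy sketch below is one standard solver, offered for illustration; the authors do not state which solver they used. It assumes centred features and no intercept and, like (2.1), treats the class label as a numeric response. The helper names are hypothetical.

```python
import numpy as np


def soft_threshold(rho, gamma):
    """S(rho, gamma) = sign(rho) * max(|rho| - gamma, 0)."""
    return np.sign(rho) * max(abs(rho) - gamma, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200, tol=1e-8):
    """Minimise sum_j (y_j - x_j . w)^2 + lam * sum_k |w_k|, i.e. Eq. (2.1).

    Assumes the columns of X are centred (no intercept term).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)  # x_k . x_k for every feature k
    residual = y - X @ w
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(d):
            if col_sq[k] == 0:
                continue  # all-zero feature: coefficient stays 0
            residual += X[:, k] * w[k]  # partial residual without feature k
            rho = X[:, k] @ residual
            # The squared-error term has no 1/2 factor, hence the lam / 2 threshold.
            new_wk = soft_threshold(rho, lam / 2.0) / col_sq[k]
            residual -= X[:, k] * new_wk
            max_change = max(max_change, abs(new_wk - w[k]))
            w[k] = new_wk
        if max_change < tol:
            break
    return w


def lasso_selected(w):
    """Indices of features with nonzero coefficients (the LASSO-retained features)."""
    return np.flatnonzero(w != 0)
```

For orthogonal columns the solution reduces to soft-thresholding each feature's inner product with the response, which shows directly why weakly associated features end up with coefficients of exactly zero.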
Next, random forest is used to construct decision trees and rank all features by importance, and the top K features are selected according to this importance criterion. For the feature set $F=\{f_1,f_2,\dots,f_{d_i}\}$ of an omics data type, where $d_i$ is the total number of features of the i-th omics data, random forest outputs the feature importance set $I=\{I_1,I_2,\dots,I_{d_i}\}$, in which the importance $I_x$ of feature $f_x$ is calculated as follows:
$$I_x=\frac{1}{N}\sum_{n=1}^{N}\left(R_{\mathrm{oob}}^{n}-R_{\mathrm{oob}}^{n,x}\right) \tag{2.3}$$

where $N$ is the number of decision trees, and $R_{\mathrm{oob}}^{n}$ and $R_{\mathrm{oob}}^{n,x}$ are the numbers of correctly classified out-of-bag samples of the n-th tree (the samples not drawn when resampling the data for that tree) before and after feature $f_x$ is randomly perturbed. Features are sorted in descending order of importance, and the K most important features are selected. In this study, the best predictive performance was achieved with K = 100.
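To make (2.3) concrete, the sketch below computes feature importance from per-tree counts of correctly classified out-of-bag samples and then picks the top K features. It assumes those counts have already been recorded (for example by a random forest implementation that permutes each feature in the out-of-bag samples) and that the average runs over trees, as in the standard permutation importance; the function names are hypothetical.

```python
import numpy as np


def oob_importance(correct_before, correct_after):
    """Mean drop in correctly classified OOB samples after perturbing each feature.

    correct_before: shape (n_trees,), R_oob^n for every tree n.
    correct_after:  shape (n_trees, n_features), R_oob^{n,x} after perturbing
                    feature x in tree n's out-of-bag samples.
    """
    before = np.asarray(correct_before, dtype=float)
    after = np.asarray(correct_after, dtype=float)
    return (before[:, None] - after).mean(axis=0)


def top_k(importance, k):
    """Indices of the k most important features, highest importance first."""
    # Sorting the negated scores gives descending order; "stable" keeps ties by index.
    return np.argsort(-np.asarray(importance, dtype=float), kind="stable")[:k]
```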
Finally, the union of the important features selected by random forest and the features retained by LASSO regression forms the final feature set of each omics data type, providing richer information for subsequent genetic analysis and biological studies.
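The RLASSO combination step amounts to a set union of the two selections. A minimal sketch, assuming the LASSO coefficients and random forest importances have already been computed for one omics data type (the function names `rlasso_union` and `select_columns` are hypothetical):

```python
import numpy as np


def rlasso_union(lasso_coef, rf_importance, k=100):
    """Indices of features kept by LASSO (nonzero coefficient) or ranked top-k by random forest."""
    lasso_idx = set(np.flatnonzero(np.asarray(lasso_coef) != 0).tolist())
    rf_order = np.argsort(-np.asarray(rf_importance, dtype=float), kind="stable")
    rf_idx = set(rf_order[:k].tolist())
    return sorted(lasso_idx | rf_idx)


def select_columns(X, feature_idx):
    """Restrict an omics matrix (samples x features) to the selected features."""
    return np.asarray(X)[:, feature_idx]
```

With K = 100 the union always contains at least 100 features, which is consistent with the 128 to 143 features per omics type reported in Table 3.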
Because of the heterogeneity between different omics data, early fusion methods cause mutual interference among multi-omics data [34], weakening feature quality and in turn the effectiveness of feature learning. Accordingly, we adopt a late fusion method for multi-modal data, in which clinical features are concatenated with each of the four types of omics data (mRNA, DNA methylation, miRNA and CNV) after RLASSO feature selection to form new inputs: mRNA+clinical, DNA methylation+clinical, miRNA+clinical and CNV+clinical. A multi-modal deep neural network learns from each of these channels independently: each channel has its own deep neural network dedicated to learning high-level feature representations for one omics data type, which avoids mutual interference among the different omics data types. Each deep neural network is composed of multiple hidden layers, enabling thorough exploration of the information specific to each omics data type and yielding accurate initial label predictions. The input to one of the deep neural networks is:
$$X_i=\mathrm{Concat}\left(x_{i1},x_{i2},\dots,x_{ip},c_1,\dots,c_4\right) \tag{2.4}$$
where $X_i$ is the fused feature set obtained by integrating the i-th omics data with the clinical features, $x_{ip}$ is the p-th feature of the i-th omics data (i = 1, 2, 3, 4) and $c_1,\dots,c_4$ are the clinical features. Each omics data type is processed by its own deep neural network to learn its high-level feature representation. The deep neural network (DNN) applies non-linear transformations to its inputs through activation functions and passes the results to the next layer. Taking the i-th omics data as an example, a DNN with l hidden layers is represented as follows:
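$$Z_{1}^{i}=\sigma\left(X_{i}W_{1}+b_{1}\right) \tag{2.5}$$

$$Z_{k}^{i}=\sigma\left(Z_{k-1}^{i}W_{k}+b_{k}\right),\quad k=2,\dots,l \tag{2.6}$$

$$P\left(y\mid X_{i},\theta\right)=g\left(Z_{l}^{i}W_{l+1}+b_{l+1}\right) \tag{2.7}$$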
where y denotes the sample labels, θ denotes the parameters of all the networks, $Z_k$ (k = 1, 2, …, l) are the outputs of the hidden layers, W are weight matrices and b are bias vectors. The dimensions of Z and W depend on the input dimension, the number of hidden neurons and the number of classes. σ(⋅) is the LeakyReLU activation function, and g(⋅) is the softmax function, which transforms the output-layer values into prediction probabilities. The prediction for the i-th type of omics data $X_i\in\mathbb{R}^{n\times d_i}$ can therefore be expressed as:
$$\hat{Y}_{i}=P\left(y\mid X_{i},\theta\right) \tag{2.8}$$
where $\hat{Y}_i\in\mathbb{R}^{n\times 2}$. We use $\hat{y}_{ij}\in\mathbb{R}^{2}$ to denote the j-th row of $\hat{Y}_i$, which is the predicted label distribution of the j-th training sample from the i-th omics data type.
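The forward pass in (2.4) to (2.8) can be sketched in NumPy for a single omics channel. This is an inference-only illustration, not the authors' PyTorch model: the hidden widths default to 200, 100 and 100 (the configuration the authors report as best), while the LeakyReLU slope of 0.01, the He-style random initialisation and the helper names are assumptions. Dropout is omitted because it is only active during training.

```python
import numpy as np


def leaky_relu(z, slope=0.01):
    """LeakyReLU; the 0.01 negative slope is an assumed default."""
    return np.where(z > 0, z, slope * z)


def softmax(z):
    """Row-wise softmax g(.) turning output-layer values into class probabilities."""
    z = z - z.max(axis=1, keepdims=True)  # subtract row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(in_dim, hidden=(200, 100, 100), n_classes=2, seed=0):
    """Random (W, b) pairs for the hidden layers and the output layer."""
    rng = np.random.default_rng(seed)
    sizes = [in_dim, *hidden, n_classes]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
        for a, b in zip(sizes[:-1], sizes[1:])
    ]


def channel_forward(omics, clinical, params):
    """Predicted label distribution Y_hat_i for one omics channel, shape (n, 2)."""
    X = np.concatenate([omics, clinical], axis=1)  # Eq. (2.4): omics + clinical features
    Z = X
    for W, b in params[:-1]:
        Z = leaky_relu(Z @ W + b)  # hidden layers, Eqs. (2.5)-(2.6)
    W_out, b_out = params[-1]
    return softmax(Z @ W_out + b_out)  # output layer, Eq. (2.7)
```

Four such channels, one per omics type and each with its own parameters, make up the multi-modal network.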
In this paper, each DNN consists of five layers: one input layer, three hidden layers and one output layer. The number of hidden layers was chosen experimentally: after extensive experiments, the best prediction performance was achieved with three hidden layers of 200, 100 and 100 neurons, respectively. The loss function of the multi-modal deep neural network can be written as:
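$$L_{\text{Multi-modal DNN}}=\sum_{i=1}^{v}L_{\mathrm{DNN}}^{i} \tag{2.9}$$

$$L_{\mathrm{DNN}}^{i}=\frac{1}{N}\sum_{j=1}^{N}L_{\mathrm{CE}}\left(\hat{y}_{ij},y_{j}\right) \tag{2.10}$$

$$=-\frac{1}{N}\sum_{j=1}^{N}\sum_{k=1}^{2}y_{j,k}\log\hat{y}_{ij,k} \tag{2.11}$$

where v is the number of omics types, N is the number of samples, $L_{\mathrm{CE}}(\cdot)$ is the cross-entropy loss function, $y_j\in\mathbb{R}^{2}$ is the one-hot encoded label of the j-th sample, and $y_{j,k}$ and $\hat{y}_{ij,k}$ are the k-th entries of $y_j$ and $\hat{y}_{ij}$. For two classes, (2.11) is equivalent to the binary cross-entropy written in terms of the positive-class probability.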
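Equations (2.9) to (2.11) can be checked numerically with a small NumPy sketch. It averages over samples (the 1/N factor in (2.11)), clips probabilities away from zero to avoid log(0), and uses hypothetical helper names; in a PyTorch implementation a built-in cross-entropy loss would normally play this role.

```python
import numpy as np


def cross_entropy(probs, onehot, eps=1e-12):
    """Mean cross-entropy between predicted distributions and one-hot labels."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)  # avoid log(0)
    onehot = np.asarray(onehot, dtype=float)
    return float(-(onehot * np.log(probs)).sum(axis=1).mean())


def multimodal_dnn_loss(prob_list, onehot):
    """Eq. (2.9): sum of the per-omics losses L^i_DNN over the v omics channels."""
    return sum(cross_entropy(p, onehot) for p in prob_list)
```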
In existing deep learning methods that integrate multi-omics data, the primary approach is to concatenate features from different omics data in the input space or feature space [46,47], which ignores their inter-correlations. Research has demonstrated that VCDN [40] is a simple and effective module for learning cross-view correlations in the label space, particularly in human action recognition tasks. In this paper, we introduce VCDN and propose the cross-modal view correlation discovery network (MACODN) to learn label-level knowledge from multi-omics data. MACODN constructs a spatial discovery tensor to explore the inter-correlations among different omics data and integrates the initial prediction labels from multi-omics data to improve the accuracy of the final prediction. MACODN is a deep learning module composed of a cross-modal view correlation discovery tensor and a fully connected neural network. The discovery tensor is constructed by spatially fusing the initial prediction labels from the four omics data types, aiming to comprehensively learn the inter-correlations among different omics data and explore their latent associations. The fully connected neural network then learns from the fused label features, extracting meaningful characteristics and generating the final prediction results.
In this study, after RLASSO feature selection on the multi-omics data, the extracted low-dimensional feature representations are fed into the multi-modal deep neural network to generate initial prediction labels. The initial prediction label of the j-th sample from the i-th omics data source is $\hat{y}_{ij}\in\mathbb{R}^{2}$, where i = 1, 2, 3, 4, and $\hat{Y}_i$ denotes the initial prediction matrix of the i-th omics data. MACODN models the correlation among mRNA, DNA methylation, miRNA and CNV by defining $P_j\in\mathbb{R}^{2\times2\times2\times2}$ as the cross-modal view correlation discovery tensor of the j-th sample, calculated as:
$$P_{j,abcd}=\hat{y}_{1j,a}\,\hat{y}_{2j,b}\,\hat{y}_{3j,c}\,\hat{y}_{4j,d} \tag{2.12}$$
where $\hat{y}_{ij,x}$ is the x-th entry of $\hat{y}_{ij}$ and a, b, c, d ∈ {1, 2}. The discovery tensor $P_j$ is then reshaped into a one-dimensional vector $p_j$ of length $2^4=16$, which is input into a fully connected network for the final label prediction. Experimental results indicate that the best prediction performance is achieved when this fully connected network has two hidden layers of 100 neurons each. MACODN is trained with the cross-entropy loss function, defined as follows:
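$$L_{\mathrm{MACODN}}=\sum_{j=1}^{N}L_{\mathrm{CE}}\left(\mathrm{MACODN}(p_j),y_j\right) \tag{2.13}$$

$$=\sum_{j=1}^{N}-\log\left(\frac{e^{\mathrm{MACODN}(p_j)\cdot y_j}}{\sum_{k=1}^{2}e^{\mathrm{MACODN}(p_j)_k}}\right) \tag{2.14}$$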
where $\mathrm{MACODN}(p_j)_k$ denotes the k-th element of the vector $\mathrm{MACODN}(p_j)\in\mathbb{R}^{2}$. The activation functions in the fully connected network are LeakyReLU and softmax, and the final output dimension is 2. In summary, the total loss of MDCADON can be expressed as:
$$L=\sum_{i=1}^{v}L_{\mathrm{DNN}}^{i}+\beta L_{\mathrm{MACODN}} \tag{2.15}$$
where β is the weighting parameter that balances the prediction loss of specific omics data with the final prediction loss. In this paper, we set β = 1.
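The label-level fusion in MACODN, Eqs. (2.12) to (2.15), can be sketched in NumPy. The discovery tensor is the outer product of the four per-omics label distributions, so each sample yields 2 × 2 × 2 × 2 = 16 values after flattening; `np.einsum` builds it for all samples at once. The fully connected network that maps $p_j$ to logits is left out, the helper names are hypothetical, and β defaults to 1 as in the text.

```python
import numpy as np


def discovery_vectors(y1, y2, y3, y4):
    """Eq. (2.12) for every sample, flattened to shape (n_samples, 16).

    y1..y4: initial label distributions from mRNA, DNA methylation, miRNA and CNV,
    each of shape (n_samples, 2). Entry [j, a, b, c, d] = y1[j, a] * y2[j, b] * y3[j, c] * y4[j, d].
    """
    P = np.einsum("ja,jb,jc,jd->jabcd", y1, y2, y3, y4)  # shape (n, 2, 2, 2, 2)
    return P.reshape(P.shape[0], -1)


def macodn_loss(logits, onehot):
    """Eqs. (2.13)-(2.14): softmax cross-entropy on MACODN logits, summed over samples."""
    logits = np.asarray(logits, dtype=float)
    onehot = np.asarray(onehot, dtype=float)
    m = logits.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))  # log sum_k exp(logit_k)
    target = (logits * onehot).sum(axis=1)  # MACODN(p_j) . y_j
    return float((log_norm - target).sum())


def total_loss(per_omics_losses, l_macodn, beta=1.0):
    """Eq. (2.15): L = sum_i L^i_DNN + beta * L_MACODN."""
    return float(sum(per_omics_losses) + beta * l_macodn)
```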
In this paper, MDCADON was trained on an NVIDIA GeForce RTX 3090 with 24 GB of GPU memory and implemented with PyTorch 1.10.0 and Python 3.6.11. During training, the learning rate, number of epochs and batch size were set to 0.01, 1000 and 32, respectively, and the Adam algorithm was used to optimize the objective function. To prevent overfitting, dropout and weight decay (L2 regularization) were applied, with a dropout rate of 0.2 and a weight decay of 0.001.
The performance of MDCADON in predicting ovarian cancer prognosis was evaluated using accuracy (ACC), F1-score and the area under the receiver operating characteristic curve (AUC). ACC is defined as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2.16)$$
where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively. The F1-score is the harmonic mean of precision and recall, defined as follows:
$$\text{F1-score} = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \qquad (2.17)$$
where precision is the proportion of samples predicted as positive that are truly positive, and recall is the proportion of truly positive samples that are correctly predicted as positive. They are defined as follows:
$$\mathrm{precision} = \frac{TP}{TP + FP} \qquad (2.18)$$
$$\mathrm{recall} = \frac{TP}{TP + FN} \qquad (2.19)$$
The AUC is the area under the ROC curve; a larger area indicates better predictive performance, and an AUC of 0.5 corresponds to random guessing.
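The metrics in Eqs. (2.16) to (2.19) and the AUC can be computed directly from predicted labels and scores, as in this plain-Python sketch. The toy labels and scores are hypothetical. In practice a library such as scikit-learn (`accuracy_score`, `f1_score`, `roc_auc_score`) would typically be used; here the AUC is computed through the equivalent rank (Mann-Whitney) formulation rather than by integrating the ROC curve.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """ACC, precision, recall and F1 (Eqs. 2.16-2.19) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"acc": (tp + tn) / (tp + tn + fp + fn), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1]
print(classification_metrics(y_true, [1, 0, 0, 1, 1]))  # TP=2, TN=1, FP=1, FN=1
print(roc_auc(y_true, [0.9, 0.4, 0.3, 0.6, 0.8]))        # 5 of 6 pairs ranked correctly
```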
To validate the effectiveness of MDCADON in integrating multi-omics data for ovarian cancer prognosis prediction, we compared it with seven representative methods commonly used for multi-omics data classification: four machine learning methods, K-Nearest Neighbor (KNN) [48], Support Vector Machine (SVM) [49], random forest (RF) [50] and Gradient Boosting Tree (XGBoost) [51], and three deep learning methods, Fully-connected Neural Network (FNN) [52], MOGONET [53] and MOCSC [18]. KNN predicted labels by majority vote among the nearest training samples, with K set to 37 to minimize the error. RF combined multiple decision trees through ensemble learning, with the final prediction obtained by voting. SVM used a kernel method to classify the multi-omics data. XGBoost employed gradient boosting for early and late-stage cancer classification. For FNN, a four-layer fully connected network was trained with cross-entropy loss. For MOGONET, a three-layer GCN was trained with L2 parametric loss, and VCDN was trained with cross-entropy loss. For MOCSC, a six-layer sparse denoising autoencoder extracted latent variable features, which were fed into a single-layer neural network for initial predictions, and the final prediction was obtained through VCDN. KNN, SVM, XGBoost and FNN were trained on the concatenated multi-omics data. KNN, SVM and RF were implemented with the scikit-learn package, with all other parameters left at their default values. The parameters of XGBoost were determined by grid search. FNN, MOGONET and MOCSC were implemented in PyTorch and used the same parameters as our model. We randomly selected 70% of the samples as the training set and 30% as the test set, and all methods were evaluated on five different randomly generated training and test splits.
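The evaluation protocol (five random 70%/30% splits, summarized as mean ± standard deviation) can be sketched as follows. The `evaluate` callback is hypothetical and stands in for training and testing any of the compared models. The text does not say whether the splits were stratified or whether the reported deviation is the sample or population standard deviation; this sketch assumes unstratified splits and the sample standard deviation.

```python
import random
import statistics
from typing import Callable, List, Tuple


def repeated_holdout(n_samples: int,
                     evaluate: Callable[[List[int], List[int]], float],
                     n_repeats: int = 5, train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[float, float]:
    """Mean and sample standard deviation of a score over random train/test splits.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains a model
    on the training indices and returns a metric on the test indices.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                      # new random permutation per repeat
        n_train = round(train_frac * n_samples)
        scores.append(evaluate(idx[:n_train], idx[n_train:]))
    return statistics.mean(scores), statistics.stdev(scores)


# Toy callback: reports the size of each test set (30 of 100 samples).
mean, sd = repeated_holdout(100, lambda train, test: float(len(test)))
print(f"{mean:.2f} ± {sd:.2f}")
```

Fixing the seed makes the five splits reproducible, which matters when several methods must be compared on identical partitions.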
The experimental results are presented in Table 4.
Method | ACC (%) | F1-score (%) | AUC (%) |
KNN | 57.27±2.31 | 47.74±1.89 | 52.26±2.61 |
SVM | 54.28±1.87 | 53.59±1.66 | 54.43±2.25 |
RF | 55.16±3.49 | 54.17±2.36 | 53.12±2.11 |
XGBoost | 56.04±2.12 | 54.66±1.60 | 54.92±0.87 |
FNN | 59.21±1.69 | 65.84±2.56 | 53.62±2.14 |
MOGONET | 64.25±2.36 | 73.16±2.20 | 57.94±1.98 |
MOCSC | 65.48±1.83 | 73.45±2.31 | 59.25±2.17 |
MDCADON (ours) | 69.47±2.10 | 77.91±1.82 | 63.40±2.15 |
As Table 4 shows, MDCADON achieved an ACC, F1-score and AUC of 0.6947, 0.7791 and 0.6345, respectively, outperforming the other methods on all three metrics for ovarian cancer prognosis prediction. Given the small sample size and class imbalance of the ovarian cancer data, the F1-score is a particularly important evaluation metric. MDCADON and the deep learning methods FNN, MOGONET and MOCSC all outperformed the machine learning methods. This indicates that deep learning methods have strong feature learning capabilities: with multi-omics features numbering in the thousands, they can learn more abstract, high-level representations from high-dimensional data. Most notably, the F1-score of MDCADON is 12.07, 4.75 and 4.46 percentage points higher than those of FNN, MOGONET and MOCSC, respectively. FNN simply concatenated the predicted labels from the multi-omics data, whereas MDCADON used MACODN for label-level fusion in the initial prediction space, which allows it to learn the inter-correlations between different omics data. Compared with MOGONET, MDCADON performed feature selection on each type of omics data, incorporated clinical features and used a multi-modal deep neural network to extract important features from multiple channels, thereby addressing the heterogeneity among different types of omics data. Compared with MOCSC, MDCADON included clinical features in addition to omics data, leveraging patient clinical information to improve the overall prediction accuracy of the model. Therefore, the superiority of MDCADON in ovarian cancer prognosis prediction can be attributed primarily to the feature learning capability of the multi-modal deep neural network and the effective integration of multi-omics data by MACODN.
Additionally, the model MDCADON made full use of clinical feature information, enabling more accurate prognostic predictions for ovarian cancer.
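The F1 gains of MDCADON over FNN, MOGONET and MOCSC correspond to absolute differences between the Table 4 means, that is, percentage points rather than relative improvements. The short sketch below reproduces them from the mean values in Table 4 and, for comparison, computes the corresponding relative gains; it ignores the reported standard deviations, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# Mean F1-scores (%) taken from Table 4.
TABLE4_F1 = {"FNN": 65.84, "MOGONET": 73.16, "MOCSC": 73.45, "MDCADON": 77.91}


def f1_gains(f1: Dict[str, float], ours: str = "MDCADON") -> Dict[str, Tuple[float, float]]:
    """Absolute gain (percentage points) and relative gain (%) of `ours` over each baseline."""
    gains = {}
    for name, value in f1.items():
        if name == ours:
            continue
        diff = f1[ours] - value
        gains[name] = (round(diff, 2), round(100 * diff / value, 1))
    return gains


for name, (pp, rel) in f1_gains(TABLE4_F1).items():
    print(f"{name}: +{pp:.2f} percentage points ({rel:.1f}% relative)")
```

Reporting both forms avoids ambiguity: a 4.75-point F1 gain over MOGONET is about a 6.5% relative improvement.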
To assess the contributions of mRNA, DNA methylation, miRNA and CNV to ovarian cancer prognosis prediction, we investigated the predictive performance of different combinations of omics data. We compared the performance of MDCADON using all four types of omics data (mRNA+methy+miRNA+CNV, combining mRNA, DNA methylation, miRNA and copy number variation), three types, two types and a single type of omics data. The results are presented in Figure 2. Note that MACODN integrates multi-omics data through a cross-omics discovery tensor, so configurations using a single type of omics data were not processed by MACODN.
Figure 2 shows that integrating four types of omics data yielded better prognosis prediction than integrating three or two types. Likewise, integrating three types of omics data achieved higher ACC values than integrating two types. This indicates that different types of omics data carry complementary information, and that incorporating more types of omics data into the model improves prediction accuracy. In terms of F1-score, integrating three types of omics data outperformed integrating two types in most cases; the exception was mRNA+miRNA, which performed comparably to the three-omics combinations. This again supports the complementary nature of different types of omics data. Overall, the more types of omics data were integrated, the more accurate the predictions for ovarian cancer patients. Moreover, the combinations methy+miRNA, methy+CNV and miRNA+CNV had relatively low F1-scores. However, adding mRNA expression data to these combinations to form three-omics combinations improved the F1-scores by 29.60%, 21.24% and 20.91%, respectively. This suggests that mRNA expression is closely correlated with miRNA expression, DNA methylation and CNV data, and that adding mRNA substantially improves prediction performance [54,55,56]. When all types of omics data were integrated, analysis of the F1-scores showed that mRNA contributed the most among the four types of omics data, followed by miRNA, DNA methylation and, finally, CNV. This suggests that mRNA plays a pivotal role in ovarian cancer, while miRNA, DNA methylation and CNV data also make meaningful contributions to the prediction results [57].
Therefore, integrating multiple types of omics data positively impacts ovarian cancer prognosis prediction, with the complementary information between different types of omics data improving model performance. Simultaneously, the contribution of different types of omics data in the prediction process provides crucial guidance for effectively utilizing multiple types of omics data for ovarian cancer prognosis prediction.
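Assuming every combination at each level was evaluated (the text explicitly names the pairs methy+miRNA, methy+CNV, miRNA+CNV and mRNA+miRNA), the ablation covers all $2^4 - 1 = 15$ non-empty subsets of the four omics types. The sketch below enumerates them and flags which ones go through MACODN fusion; the function name is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

OMICS = ("mRNA", "methy", "miRNA", "CNV")


def omics_configurations(omics: Tuple[str, ...] = OMICS) -> List[Tuple[str, bool]]:
    """All non-empty omics subsets, largest first, with a flag for MACODN fusion.

    Single-omics configurations have nothing to fuse, so they bypass MACODN.
    """
    configs = []
    for size in range(len(omics), 0, -1):
        for subset in combinations(omics, size):
            configs.append(("+".join(subset), size > 1))
    return configs


for name, uses_macodn in omics_configurations():
    print(f"{name:<22} MACODN={'yes' if uses_macodn else 'no'}")
```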
Because multi-omics data in ovarian cancer combine a small sample size with high feature dimensionality, applying deep learning methods directly for prognosis prediction may lead to suboptimal results. To address this issue, researchers have applied feature selection or dimensionality reduction techniques [58], most commonly PCA [59], RF [27] and LASSO regression [25]. To validate the effectiveness of RLASSO, we replaced the RLASSO feature selection step in MDCADON with alternatives: no feature selection (NoFeaSelec), PCA, RF and LASSO regression. The experimental results are shown in Figure 3. Specifically, NoFeaSelec_MACODN, PCA_MACODN, RF_MACODN and LASSO_MACODN denote integrating the multi-omics data without feature selection, with PCA dimensionality reduction, with RF feature selection and with LASSO regression, respectively, each followed by cross-modal data fusion with MACODN for ovarian cancer prognosis prediction.
As shown in Figure 3, using RLASSO for feature selection yielded ACC, F1-score and AUC values of 69.47%, 77.91% and 63.45%, respectively, outperforming all other variants. Compared with NoFeaSelec_MACODN, MDCADON improved ACC, F1-score and AUC by 11.07%, 14.88% and 3.27%, respectively. Compared with PCA_MACODN, RF_MACODN and LASSO_MACODN, MDCADON improved the F1-score by 22.28%, 14.62% and 15.98%, respectively. This suggests that RLASSO selects genes relevant to ovarian cancer, contributing to the improved predictive performance of the model. In addition, the genes selected by RLASSO can be further analyzed for their biological functions, signaling pathways and expression in other cancers or diseases, which may reveal further biological functions and clinical applications of these genes.
To assess whether the MACODN module in MDCADON effectively captures the inter-correlations of different omics data in the feature space and improves ovarian cancer prognosis prediction, we removed it and instead fed the initial predicted label distributions directly into a fully connected network with the same number of layers as MACODN for the final prediction; we refer to this variant as RLASSO_FNN. To further verify the effectiveness of MACODN, we also combined no feature selection and the feature selection methods PCA, RF and LASSO regression with the FNN, yielding NoFeaSelec_FNN, PCA_FNN, RF_FNN and LASSO_FNN. The experimental results are shown in Figure 4.
Among the models compared in Figure 4, MDCADON achieved the highest ACC, F1-score and AUC values. Compared with RLASSO_FNN, MDCADON uses MACODN to perform spatial-level integration of multi-omics data through cross-modal discovery tensors, capturing the correlations among different omics data in the feature space. This yielded improvements of 6.26%, 8.13% and 2.82% in ACC, F1-score and AUC, respectively. These results indicate that MACODN helps capture relevant information across multi-omics data and learn useful features from the cross-modal discovery tensors, improving ovarian cancer prognosis prediction. Furthermore, compared with NoFeaSelec_FNN, PCA_FNN, RF_FNN and LASSO_FNN, RLASSO_FNN improved the F1-score by 9.96%, 10.73%, 6.55% and 10.72%, respectively, further supporting the effectiveness of RLASSO in selecting feature genes relevant to the prognosis assessment of ovarian cancer patients.
This experiment evaluates the value of incorporating clinical features into the multi-modal deep neural network for ovarian cancer prognosis prediction. We compared predictive performance with and without clinical features in the input data of the multi-modal deep neural network. Figure 5 shows the effect of including or excluding clinical features on prognosis prediction performance across the different models.
For all 10 models, adding clinical features to the input multi-omics data yielded higher ACC, F1-score and AUC. For NoFeaSelec_FNN, whose input omics data were not subjected to feature selection, adding clinical features produced no notable improvement. The likely reason is that omics data without feature selection contain many redundant features, among which the clinical features are effectively treated as redundant or irrelevant, so their importance is overlooked. In contrast, adding clinical features to MDCADON improved ACC, F1-score and AUC by 1.42%, 4.40% and 1.97%, respectively, which suggests that clinical data, as features complementary to the omics data, provide patient-specific information that improves prognosis prediction [38,60]. In summary, for ovarian cancer prognosis prediction, fusing clinical features with multi-omics data in the multi-modal deep neural network enables better learning of diverse feature representations across omics data and helps address the heterogeneity among them. Moreover, integrating multi-modal data provides comprehensive information that can support decision-making in patient prognosis assessment [61].
To further validate the effectiveness of MDCADON, the GEO datasets for ovarian cancer were downloaded, including GSE26712 [62], GSE32062 [63], GSE17260 [64] and GSE140082 [65], with detailed information shown in Table 5. We applied the same preprocessing procedures as the TCGA dataset and categorized samples into high-risk and low-risk subgroups based on the average survival time. The GEO datasets were randomly divided into a 70% training set and a 30% testing set, and the experiments were repeated 5 times. During the experiments, the model MDCADON was compared with other methods, and the results are presented in Figure 6.
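The dichotomization of samples into high- and low-risk subgroups by average survival time can be sketched as follows. The text does not specify which group a sample exactly at the mean falls into, or how censored patients are treated; both choices below are assumptions, as are the function name and the example survival times.

```python
import statistics
from typing import List, Sequence, Tuple


def risk_groups_by_mean(survival_times: Sequence[float]) -> Tuple[List[int], float]:
    """Label each sample 1 (high risk) if it survived shorter than the mean, else 0.

    Assumptions: samples exactly at the mean go to the low-risk group, and
    censoring status is ignored because the text does not describe its handling.
    """
    threshold = statistics.mean(survival_times)
    labels = [1 if t < threshold else 0 for t in survival_times]
    return labels, threshold


labels, cut = risk_groups_by_mean([300, 540, 1200, 2100, 860])  # hypothetical days
print(cut, labels)
```

Because the mean is sensitive to a few long survivors, the split can be unbalanced; this is one reason class imbalance makes F1-score an informative metric here.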
Figure 6 shows that on the GSE32062 and GSE140082 datasets, MDCADON performed markedly better than the traditional machine learning methods KNN, SVM, RF and XGBoost, and somewhat better than the deep learning methods FNN, MOGONET and MOCSC. The improvements in ACC, F1-score and AUC over the deep learning methods were modest, however, which can be attributed to the fact that the GEO datasets contain only one type of omics data; in this setting, MACODN, which relies on integrating multiple types of omics data, contributes little. On the GSE26712 and GSE17260 datasets, MDCADON also clearly outperformed the machine learning methods but performed comparably to the deep learning methods MOGONET and MOCSC. This can be attributed to the small numbers of ovarian cancer samples in GSE26712 and GSE17260, which likely lead to overfitting [66,67]. Overall, the experiments on the GEO datasets show that MDCADON achieves good predictive performance and generalizability: although it does not show a clear advantage on every dataset, it performs well on the others, supporting its effectiveness for ovarian cancer prognosis prediction.
Dataset | Number of samples | Data category | Platform |
GSE26712 | 185 | Gene expression | GPL96 Affymetrix |
GSE32062 | 260 | Gene expression | GPL6480 Agilent |
GSE17260 | 110 | Gene expression | GPL6480 Agilent |
GSE140082 | 380 | Gene expression | GPL14951 Illumina |
In ovarian cancer prognosis prediction, identifying genes associated with patient survival is crucial for treatment and prognosis. To identify key genes, we ranked and selected genes using the permutation importance method [68]. Because the input omics data were normalized to [-1, 1] during preprocessing, we set a gene's values to 0, the midpoint of this range, to assess its importance for predicting patient survival. Specifically, we set each gene's values in the test set to 0 and measured the resulting decrease in predictive performance relative to prediction with all genes; the genes whose removal caused the largest decrease were considered the most important. Because training is stochastic, we repeated the experiments five times and finally selected the top 20 genes most strongly associated with ovarian cancer prognosis, as listed in Table 6.
Omics data type | Genes |
mRNA | NOL7, PPP3CA, PUF60, CXCL9, ZNF561, CXCL14, USP14, ADH1B, UTP11, PARL, PBK, NRGN, SCNN1A, POLD2, POLR1C, SAR1A, RAB12, NRAS, ZNF826P |
CNV | ACTR3 |
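The gene-importance procedure described above can be sketched as follows. Note that it replaces a feature's values with 0 (an occlusion-style variant) rather than randomly shuffling them as classical permutation importance does; the sketch follows the zeroing described in the text. The toy model, data and function names are hypothetical, and in the paper the ranking would additionally be aggregated over the five repeated runs.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def zero_out_importance(predict: Callable[[Matrix], List[int]], X: Matrix,
                        y: Sequence[int],
                        metric: Callable[[Sequence[int], Sequence[int]], float]) -> List[float]:
    """Drop in `metric` on the test set when each feature column is set to 0."""
    baseline = metric(y, predict(X))
    drops = []
    for j in range(len(X[0])):
        X_zeroed = [row[:j] + [0.0] + row[j + 1:] for row in X]  # copy with column j = 0
        drops.append(baseline - metric(y, predict(X_zeroed)))
    return drops


def top_k_features(drops: Sequence[float], k: int) -> List[int]:
    """Indices of the k features whose removal hurts performance most."""
    return sorted(range(len(drops)), key=lambda j: drops[j], reverse=True)[:k]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_predict(X: Matrix) -> List[int]:
    """Hypothetical model that only looks at feature 0 (values scaled to [-1, 1])."""
    return [1 if row[0] > 0.5 else 0 for row in X]


X_test = [[0.9, -0.3], [0.8, 0.5], [-0.7, 0.2], [-0.6, -0.9]]
y_test = [1, 1, 0, 0]
drops = zero_out_importance(toy_predict, X_test, y_test, accuracy)
print(drops, top_k_features(drops, 1))
```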
Through a literature review, we found that several of the genes identified by MDCADON are closely related to ovarian cancer. CXCL9 has been confirmed to be associated with ovarian cancer prognosis: it is a prognostic marker for patients with late-stage high-grade serous ovarian cancer (HGSOC) and is associated with improved survival [69]. ADH1B encodes a member of the alcohol dehydrogenase family and has been shown to promote mesothelial clearance and ovarian cancer infiltration [70]. In addition, CXCL14 is abnormally highly expressed in the serum and ovarian tissues of ovarian cancer patients, correlates with poor prognosis and has been identified as a potential new adjuvant marker for early diagnosis [71]. MDCADON selected the important genes associated with ovarian cancer prognosis by ranking gene importance across all omics data. Previous research has shown that mRNA expression is the most critical component of ovarian cancer multi-omics data [57]. Consistent with this, 19 of the top 20 selected genes come from mRNA expression data, further supporting the importance of mRNA in ovarian cancer prognosis prediction.
We used the "survival" package in R to generate Kaplan-Meier survival curves for the high-risk and low-risk subgroups of ovarian cancer patients predicted by MDCADON, on the TCGA dataset and the GEO datasets (GSE26712, GSE32062, GSE17260 and GSE140082), as shown in Figure 7 and Figure 8. The two risk subgroups predicted by MDCADON differ significantly in survival (p < 0.05). Specifically, for the TCGA dataset (Figure 7), the p-value of the survival curve comparison was 0.0071, demonstrating a significant difference in survival between the predicted subgroups. Similarly, for the GSE26712, GSE32062, GSE17260 and GSE140082 datasets (Figure 8), the p-values were 0.0055, 0.025, 0.017 and 0.028, respectively, indicating that MDCADON generalizes well to the GEO datasets and that the predicted subgroups differ significantly in survival risk.
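The paper used the R `survival` package; as an illustration, the sketch below implements the standard Kaplan-Meier estimator and the two-group log-rank test in plain Python. That the reported p-values come from a log-rank test (the default of `survdiff` in R's `survival` package) is our assumption, since the text does not name the test; the function names and survival data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (event time, survival probability) pairs.

    events[i] is 1 if death was observed at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:    # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving                           # deaths and censored both leave
    return curve


def logrank_p(times_a: Sequence[float], events_a: Sequence[int],
              times_b: Sequence[float], events_b: Sequence[int]) -> float:
    """Two-group log-rank test p-value (chi-square with 1 degree of freedom)."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    o_minus_e, var = 0.0, 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n               # observed minus expected in group a
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))            # P(chi2 with 1 df > chi2)


# Hypothetical survival data (months; 1 = death observed, 0 = censored).
high_t, high_e = [5, 8, 12, 14, 20], [1, 1, 1, 0, 1]
low_t, low_e = [18, 25, 30, 41, 50], [1, 0, 1, 1, 0]
print(kaplan_meier(high_t, high_e))
print(f"log-rank p = {logrank_p(high_t, high_e, low_t, low_e):.4f}")
```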
To further validate the relevance of the selected genes to ovarian cancer, we conducted survival analysis using the clinical data of ovarian cancer patients. For each gene, the median expression value served as the cut-off: patients with expression above the median were assigned to the high-risk subgroup and those below the median to the low-risk subgroup [72]. Kaplan-Meier survival curves were plotted to visualize survival over time, as shown in Figure 9. As expected, the survival rate of ovarian cancer patients decreased over time. Figure 9 includes eighteen of the important genes; for these genes, survival time in the high-risk subgroup is generally shorter than in the low-risk subgroup, with all p-values for the survival differences below 0.05. These results indicate the clinical significance of the selected genes in predicting the survival of ovarian cancer patients. The identified genes may serve as potential prognostic biomarkers for ovarian cancer, offering insights for further research into the mechanisms of ovarian cancer pathogenesis, prognostic factors and personalized treatment approaches.
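The median-based grouping for a single gene can be sketched as below; the resulting two groups would then be compared with Kaplan-Meier curves and a significance test. The text does not say where values exactly equal to the median go, so placing them in the low-risk group is an assumption, as are the function name and the example values.

```python
import statistics
from typing import List, Sequence


def median_risk_split(expression: Sequence[float]) -> List[str]:
    """Assign patients to risk subgroups by one gene's median expression.

    Above the median -> "high-risk", otherwise "low-risk". The text does not say
    where values equal to the median go; placing them in "low-risk" is an assumption.
    """
    cut = statistics.median(expression)
    return ["high-risk" if x > cut else "low-risk" for x in expression]


# Hypothetical normalized expression values of one gene in six patients.
print(median_risk_split([0.12, -0.40, 0.85, 0.33, -0.05, 0.60]))
```

As written, this rule labels high expression as high risk for every gene; for a gene whose high expression is protective, such as CXCL9 [69], the labels would be reversed in meaning.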
In this paper, the identified genes were subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using the online tool Metascape, as shown in Figure 10. Additionally, a functional enrichment clustering network of the genes was generated, as depicted in Figure 11.
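Enrichment tools such as Metascape score GO and KEGG terms with an over-representation test; to our understanding, Metascape bases its p-values on the cumulative hypergeometric distribution and then applies multiple-testing correction and term clustering, which are not shown here. The sketch below computes only the one-sided hypergeometric p-value for a single term, with hypothetical gene counts and a hypothetical function name. It uses `math.comb`, which requires Python 3.8 or later.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the input list, k: input genes annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 4 of 20 input genes fall in a term of 150 genes,
# against a background of 20,000 genes.
print(f"{hypergeom_enrichment_p(4, 20, 150, 20000):.2e}")
```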
These analyses revealed several important pathways related to ovarian cancer, offering insights into the mechanisms underlying ovarian cancer initiation, progression and treatment. Among them, regulation of vesicle-mediated transport and human papillomavirus infection are associated with ovarian cancer development and metastasis. Extracellular vesicles are known to play a significant role in cell-to-cell communication and have been implicated in tumor formation and metastatic disease [73]. In addition, the human papillomavirus infection pathway has been reported to be highly associated with ovarian cancer [74]. The Hippo signaling pathway, a highly conserved regulator of organ size, also plays a crucial role in ovarian physiology. Dysregulation of the Hippo pathway has been linked to loss of follicular homeostasis and to reproductive disorders, including polycystic ovary syndrome (PCOS), premature ovarian insufficiency and ovarian cancer [75]. In addition to pathways specifically related to ovarian cancer, the analysis identified pathways associated with other cancers and diseases, such as liver hepatocellular carcinoma and Parkinson's disease [57]. This finding points to potential interconnections between ovarian cancer and other diseases and provides clues for further research into the pathophysiology of ovarian cancer.
In this paper, we proposed MDCADON, a deep learning model that uses multi-omics data and a cross-modal view correlation discovery network for prognostic prediction of ovarian cancer patients. MDCADON employs the feature selection method RLASSO to select relevant features from multi-omics data. Clinical features are fused with the RLASSO-selected omics data as inputs to a multi-modal deep neural network, which learns high-level feature representations of the multi-omics data for the initial prediction of survival subgroups of ovarian cancer patients. Finally, the cross-modal view correlation discovery network spatially integrates the predicted labels from the different omics data, learning the inter-correlations between diverse data types for the final prediction. The results demonstrated that MDCADON outperformed existing methods in ovarian cancer prognosis prediction. Kaplan-Meier survival curves also showed significant differences between the survival subgroups predicted by MDCADON. GO/KEGG enrichment analysis further supported the relevance of the genes identified by MDCADON to ovarian cancer prognosis, providing support and guidance for a deeper understanding of ovarian cancer pathogenesis and the exploration of new treatment strategies.
Although the proposed model shows promising performance in ovarian cancer prognosis prediction, it has certain limitations. Because of phenomena such as gene mutation and genetic recombination, it is not always apparent whether the identified genes are clinically relevant. To be useful in clinical decision-making, the model still requires extensive testing, interpretability analysis and uncertainty quantification. In addition, MDCADON is limited by the number of available ovarian cancer samples, which reduces the reliability of its prognosis predictions. In future work, we will explore more visualization methods to interpret the model's predictions from multiple perspectives, with the aim of elucidating the potential pathogenesis of ovarian cancer and further supporting its clinical application. We will also consider incorporating histopathological images of ovarian cancer as input; by leveraging the information in image data, we aim to conduct a more comprehensive study of ovarian cancer patients and offer a new research approach for ovarian cancer prognosis prediction.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This research was funded by the National Natural Science Foundation of China, grant number 62176177; and the Natural Science Foundation of Shanxi Province, grant number 202203021211121.
The authors declare that there are no conflicts of interest.
[1] M. Kossai, A. Leary, J. Y. Scoazec, C. Genestie, Ovarian cancer: A heterogeneous disease, Pathobiology, 85 (2018), 41–49. https://doi.org/10.1159/000479006
[2] Y. Xiao, M. Bi, H. Guo, M. Li, Multi-omics approaches for biomarker discovery in early ovarian cancer diagnosis, EBioMedicine, 79 (2022), 104001. https://doi.org/10.1016/j.ebiom.2022.104001
[3] P. E. Colombo, M. Fabbro, C. Theillet, F. Bibeau, P. Rouanet, I. Ray-Coquard, Sensitivity and resistance to treatment in the primary management of epithelial ovarian cancer, Crit. Rev. Oncol. Hematol., 89 (2014), 207–216. https://doi.org/10.1016/j.critrevonc.2013.08.017
[4] R. Hu, X. Wang, X. Zhan, Multi-parameter systematic strategies for predictive, preventive and personalised medicine in cancer, EPMA J., 4 (2013), 1–12. https://doi.org/10.1186/1878-5085-4-2
[5] T. Cheng, X. Zhan, Pattern recognition for predictive, preventive, and personalized medicine in cancer, EPMA J., 8 (2017), 51–60. https://doi.org/10.1007/s13167-017-0083-9
[6] X. Zhan, Y. Long, M. Lu, Exploration of variations in proteome and metabolome for predictive diagnostics and personalized treatment algorithms: Innovative approach and examples for potential clinical application, J. Proteomics, 188 (2018), 30–40. https://doi.org/10.1016/j.jprot.2017.08.020
[7] C. Denkert, J. Budczies, T. Kind, W. Weichert, P. Tablack, J. Sehouli, et al., Mass spectrometry-based metabolic profiling reveals different metabolite patterns in invasive ovarian carcinomas and ovarian borderline tumors, Cancer Res., 66 (2006), 10795–10804. https://doi.org/10.1158/0008-5472.CAN-06-0755
[8] R. Sabatier, P. Finetti, N. Cervera, D. Birnbaum, F. Bertucci, Gene expression profiling and prediction of clinical outcome in ovarian cancer, Crit. Rev. Oncol. Hematol., 72 (2009), 98–109. https://doi.org/10.1016/j.critrevonc.2009.01.007
[9] A. Ghose, S. V. N. Gullapalli, N. Chohan, A. Bolina, M. Moschetta, E. Rassy, et al., Applications of proteomics in ovarian cancer: Dawn of a new era, Proteomes, 10 (2022), 16. https://doi.org/10.3390/proteomes10020016
[10] B. Arjmand, S. K. Hamidpour, A. Tayanloo-Beik, P. Goodarzi, H. R. Aghayan, H. Adibi, et al., Machine learning: A new prospect in multi-omics data analysis of cancer, Front. Genet., 13 (2022), 824451. https://doi.org/10.3389/fgene.2022.824451
[11] H. Feng, Z. Y. Gu, Q. Li, Q. H. Liu, X. Y. Yang, J. J. Zhang, Identification of significant genes with poor prognosis in ovarian cancer via bioinformatical analysis, J. Ovarian Res., 12 (2019), 1–9. https://doi.org/10.1186/s13048-019-0508-2
[12] K. Kourou, T. P. Exarchos, K. P. Exarchos, M. V. Karamouzis, D. I. Fotiadis, Machine learning applications in cancer prognosis and prediction, Comput. Struct. Biotechnol. J., 13 (2015), 8–17. https://doi.org/10.1016/j.csbj.2014.11.005
[13] L. Wang, Y. Li, J. Zhou, D. Zhu, J. Ye, Multi-task survival analysis, in 2017 IEEE International Conference on Data Mining (ICDM), (2017), 485–494. https://doi.org/10.1109/ICDM.2017.58
[14] C. Stirzaker, E. Zotenko, J. Z. Song, W. Qu, S. S. Nair, W. J. Locke, et al., Methylome sequencing in triple-negative breast cancer reveals distinct methylation clusters with prognostic value, Nat. Commun., 6 (2015), 5899. https://doi.org/10.1038/ncomms6899
[15] S. Volinia, C. M. Croce, Prognostic microRNA/mRNA signature from the integrated analysis of patients with invasive breast cancer, Proc. Natl. Acad. Sci., 110 (2013), 7413–7417. https://doi.org/10.1073/pnas.1304977110
[16] Y. Wu, H. Chen, G. Jiang, Z. Mo, D. Ye, M. Wang, et al., Genome-wide association study (GWAS) of germline copy number variations (CNVs) reveal genetic risks of prostate cancer in Chinese population, J. Cancer, 9 (2018), 923–928. https://doi.org/10.7150/jca.22802
[17] P. Gong, L. Cheng, Z. Zhang, A. Meng, E. Li, J. Chen, et al., Multi-omics integration method based on attention deep learning network for biomedical data classification, Comput. Methods Programs Biomed., 231 (2023), 107377. https://doi.org/10.1016/j.cmpb.2023.107377
[18] Y. Ma, J. Guan, MOCSC: A multi-omics data based framework for cancer subtype classification, in 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), (2022), 2853–2859. https://doi.org/10.1109/BIBM55620.2022.9995564
[19] S. Moon, H. Lee, MOMA: A multi-task attention learning algorithm for multi-omics data interpretation and classification, Bioinformatics, 38 (2022), 2287–2296. https://doi.org/10.1093/bioinformatics/btac080
[20] H. Yang, R. Chen, D. Li, Z. Wang, Subtype-GAN: A deep learning approach for integrative cancer subtyping of multi-omics data, Bioinformatics, 37 (2021), 2231–2237. https://doi.org/10.1093/bioinformatics/btab109
[21] Y. Hu, L. Zhao, Z. Li, X. Dong, T. Xu, Y. Zhao, Classifying the multi-omics data of gastric cancer using a deep feature selection method, Expert Syst. Appl., 200 (2022), 116813. https://doi.org/10.1016/j.eswa.2022.116813
[22] B. W. Yuan, Z. L. Zhang, X. G. Luo, Y. Yu, X. H. Zou, X. D. Zou, OIS-RF: A novel overlap and imbalance sensitive random forest, Eng. Appl. Artif. Intell., 104 (2021), 104355. https://doi.org/10.1016/j.engappai.2021.104355
[23] M. Mohammed, H. Mwambi, I. B. Mboya, M. K. Elbashir, B. Omolo, A stacking ensemble deep learning approach to cancer type classification based on TCGA data, Sci. Rep., 11 (2021), 15626. https://doi.org/10.1038/s41598-021-95128-x
[24] |
G. Xie, C. Dong, Y. Kong, J. F. Zhong, M. Li, K. Wang, Group lasso regularized deep learning for cancer prognosis from multi-omics and clinical features, Genes, 10 (2019), 240. https://doi.org/10.3390/genes10030240 doi: 10.3390/genes10030240
![]() |
[25] |
R. Jain, W. Xu, HDSI: High dimensional selection with interactions algorithm on feature selection and testing, PLoS One, 16 (2021), e0246159. https://doi.org/10.1371/journal.pone.0246159 doi: 10.1371/journal.pone.0246159
![]() |
[26] Z. Y. Algamal, M. H. Lee, Penalized logistic regression with the adaptive LASSO for gene selection in high-dimensional cancer classification, Expert Syst. Appl., 42 (2015), 9326–9332. https://doi.org/10.1016/j.eswa.2015.08.016
[27] M. T. Uddin, M. A. Uddiny, A guided random forest based feature selection approach for activity recognition, in 2015 International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), (2015), 1–6. https://doi.org/10.1109/ICEEICT.2015.7307376
[28] The Cancer Genome Atlas Research Network, Integrated genomic analyses of ovarian carcinoma, Nature, 474 (2011), 609–615. https://doi.org/10.1038/nature10166
[29] L. Geistlinger, S. Oh, M. Ramos, L. Schiffer, R. S. LaRue, C. M. Henzler, et al., Multiomic analysis of subtype evolution and heterogeneity in high-grade serous ovarian carcinoma, Cancer Res., 80 (2020), 4335–4345. https://doi.org/10.1158/0008-5472.CAN-20-0521
[30] H. Chai, X. Zhou, Z. Y. Zhang, J. H. Rao, H. Y. Zhao, Y. D. Yang, Integrating multi-omics data through deep learning for accurate cancer prognosis prediction, Comput. Biol. Med., 134 (2021), 104481. https://doi.org/10.1016/j.compbiomed.2021.104481
[31] M. Picard, M. P. Scott-Boyer, A. Bodein, O. Perin, A. Droit, Integration strategies of multi-omics data for machine learning analysis, Comput. Struct. Biotechnol. J., 19 (2021), 3735–3746. https://doi.org/10.1016/j.csbj.2021.06.030
[32] N. Adossa, S. Khan, K. T. Rytkonen, L. L. Elo, Computational strategies for single-cell multi-omics integration, Comput. Struct. Biotechnol. J., 19 (2021), 2588–2596. https://doi.org/10.1016/j.csbj.2021.04.060
[33] L. Tong, J. Mitchel, K. Chatlin, M. D. Wang, Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis, BMC Med. Inf. Decis. Making, 20 (2020), 1–12. https://doi.org/10.1186/s12911-020-01225-8
[34] H. Sharifi-Noghabi, O. Zolotareva, C. C. Collins, M. Ester, MOLI: Multi-omics late integration with deep neural networks for drug response prediction, Bioinformatics, 35 (2019), i501–i509. https://doi.org/10.1093/bioinformatics/btz318
[35] L. Zhou, M. Rueda, A. Alkhateeb, Classification of breast cancer Nottingham prognostic index using high-dimensional embedding and residual neural network, Cancers, 14 (2022), 934. https://doi.org/10.3390/cancers14040934
[36] G. Zhang, Z. Peng, C. Yan, J. Wang, J. Luo, H. Luo, MultiGATAE: A novel cancer subtype identification method based on multi-omics and attention mechanism, Front. Genet., 13 (2022), 855629. https://doi.org/10.3389/fgene.2022.855629
[37] Y. Hu, K. Liu, K. Ho, D. Riviello, J. Brown, A. R. Chang, et al., A simpler machine learning model for acute kidney injury risk stratification in hospitalized patients, J. Clin. Med., 11 (2022), 5688. https://doi.org/10.3390/jcm11195688
[38] D. Sun, M. Wang, A. Li, A multimodal deep neural network for human breast cancer prognosis prediction by integrating multi-dimensional data, IEEE/ACM Trans. Comput. Biol. Bioinf., 16 (2018), 841–850. https://doi.org/10.1109/TCBB.2018.2806438
[39] F. Carrillo-Perez, J. C. Morales, D. Castillo-Secilla, O. Gevaert, I. Rojas, L. J. Herrera, Machine-learning-based late fusion on multi-omics and multi-scale data for non-small-cell lung cancer diagnosis, J. Pers. Med., 12 (2022), 601. https://doi.org/10.3390/jpm12040601
[40] L. Wang, Z. Ding, Z. Tao, Y. Liu, Y. Fu, Generative multi-view human action recognition, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 6221–6230. https://doi.org/10.1109/ICCV.2019.00631
[41] L. A. V. Silva, K. Rohr, Pan-cancer prognosis prediction using multimodal deep learning, in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), (2020), 568–571. https://doi.org/10.1109/ISBI45749.2020.9098665
[42] Z. Fan, Z. Jiang, H. Liang, C. Han, Pancancer survival prediction using a deep learning architecture with multimodal representation and integration, Bioinf. Adv., 3 (2023), vbad006. https://doi.org/10.1093/bioadv/vbad006
[43] N. Bokde, F. Martinez-Alvarez, M. W. Beck, K. Kulat, A novel imputation methodology for time series based on pattern sequence forecasting, Pattern Recognit. Lett., 116 (2018), 88–96. https://doi.org/10.1016/j.patrec.2018.09.020
[44] M. Al Fatih Abil Fida, T. Ahmad, M. Ntahobari, Variance threshold as early screening to Boruta feature selection for intrusion detection system, in 2021 13th International Conference on Information & Communication Technology and System (ICTS), (2021), 46–50. https://doi.org/10.1109/ICTS52701.2021.9608852
[45] L. A. V. Silva, K. Rohr, Pan-cancer prognosis prediction using multimodal deep learning, in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), (2020), 568–571. https://doi.org/10.1109/ISBI45749.2020.9098665
[46] L. Zhou, L. Wang, Q. Wang, Y. Shi, Machine Learning in Medical Imaging, Springer, Cham, 2015. https://doi.org/10.1007/978-3-319-24888-2
[47] X. Zhang, Y. Yang, T. Li, Y. Zhang, H. Wang, H. Fujita, CMC: A consensus multi-view clustering model for predicting Alzheimer's disease progression, Comput. Methods Programs Biomed., 199 (2021), 105895. https://doi.org/10.1016/j.cmpb.2020.105895
[48] O. Kramer, K-nearest neighbors, in Dimensionality Reduction with Unsupervised Nearest Neighbors, Springer, Berlin, Heidelberg, (2013), 13–23. https://doi.org/10.1007/978-3-642-38652-7_2
[49] Z. Huang, X. Zhan, S. Xiang, T. S. Johnson, B. Helm, C. Y. Yu, et al., SALMON: Survival analysis learning with multi-omics neural networks on breast cancer, Front. Genet., 10 (2019), 166. https://doi.org/10.3389/fgene.2019.00166
[50] S. J. Rigatti, Random forest, J. Insur. Med., 47 (2017), 31–39. https://doi.org/10.17849/insm-47-01-31-39.1
[51] B. Ma, F. Meng, G. Yan, H. Yan, B. Chai, F. Song, Diagnostic classification of cancers using extreme gradient boosting algorithm and multi-omics data, Comput. Biol. Med., 121 (2020), 103761. https://doi.org/10.1016/j.compbiomed.2020.103761
[52] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors, Nature, 323 (1986), 533–536. https://doi.org/10.1038/323533a0
[53] T. Wang, W. Shao, Z. Huang, H. Tang, J. Zhang, Z. Ding, et al., MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification, Nat. Commun., 12 (2021), 3445. https://doi.org/10.1038/s41467-021-23774-w
[54] D. B. Seal, V. Das, S. Goswami, R. K. De, Estimating gene expression from DNA methylation and copy number variation: A deep learning regression model for multi-omics integration, Genomics, 112 (2020), 2833–2841. https://doi.org/10.1016/j.ygeno.2020.03.021
[55] Z. Ali Syeda, S. S. S. Langden, C. Munkhzul, M. Lee, S. J. Song, Regulatory mechanism of microRNA expression in cancer, Int. J. Mol. Sci., 21 (2020), 1723. https://doi.org/10.3390/ijms21051723
[56] S. Ghafouri-Fard, H. Shoorei, M. Taheri, miRNA profile in ovarian cancer, Exp. Mol. Pathol., 113 (2020), 104381. https://doi.org/10.1016/j.yexmp.2020.104381
[57] L. Y. Guo, A. H. Wu, Y. X. Wang, L. P. Zhang, H. Chai, X. F. Liang, Deep learning-based ovarian cancer subtypes identification using multi-omics data, BioData Min., 13 (2020), 1–12. https://doi.org/10.1186/s13040-020-00222-x
[58] S. Huang, N. Cai, P. P. Pacheco, S. Narrandes, Y. Wang, W. Xu, Applications of support vector machine (SVM) learning in cancer genomics, Cancer Genomics Proteomics, 15 (2018), 41–51. https://doi.org/10.21873/cgp.20063
[59] H. Abdi, L. J. Williams, Principal component analysis, WIREs Comput. Stat., 2 (2010), 433–459. https://doi.org/10.1002/wics.101
[60] T. H. Vo, G. S. Lee, H. J. Yang, I. J. Oh, S. H. Kim, S. R. Kang, Survival prediction of lung cancer using small-size clinical data with a multiple task variational autoencoder, Electronics, 10 (2021), 1396. https://doi.org/10.3390/electronics10121396
[61] S. R. Choi, M. Lee, Estimating the prognosis of low-grade glioma with gene attention using multi-omics and multi-modal schemes, Biology, 11 (2022), 1462. https://doi.org/10.3390/biology11101462
[62] T. Bonome, D. A. Levine, J. Shih, M. Randonovich, C. A. Pise-Masison, F. Bogomolniy, et al., A gene signature predicting for survival in suboptimally debulked patients with ovarian cancer, Cancer Res., 68 (2008), 5478–5486. https://doi.org/10.1158/0008-5472.CAN-07-6595
[63] K. Yoshihara, T. Tsunoda, D. Shigemizu, H. Fujiwara, M. Hatae, H. Fujiwara, et al., High-risk ovarian cancer based on 126-gene expression signature is uniquely characterized by downregulation of antigen presentation pathway, Clin. Cancer Res., 18 (2012), 1374–1385. https://doi.org/10.1158/1078-0432.CCR-11-2725
[64] K. Yoshihara, A. Tajima, T. Yahata, S. Kodama, H. Fujiwara, M. Suzuki, et al., Gene expression profile for predicting survival in advanced-stage serous ovarian cancer across two independent datasets, PLoS One, 5 (2010), e9615. https://doi.org/10.1371/journal.pone.0009615
[65] S. Kommoss, B. Winterhoff, A. L. Oberg, G. E. Konecny, C. Wang, S. M. Riska, et al., Bevacizumab may differentially improve ovarian cancer outcome in patients with proliferative and mesenchymal molecular subtypes, Clin. Cancer Res., 23 (2017), 3794–3801. https://doi.org/10.1158/1078-0432.CCR-16-2196
[66] S. D. McCabe, D. Y. Lin, M. I. Love, Consistency and overfitting of multi-omics methods on experimental data, Briefings Bioinf., 21 (2020), 1277–1284. https://doi.org/10.1093/bib/bbz070
[67] J. Yeomans, S. Thwaites, W. S. P. Robertson, D. Booth, B. Ng, D. Thewlis, Simulating time-series data for improved deep neural network performance, IEEE Access, 7 (2019), 131248–131255. https://doi.org/10.1109/ACCESS.2019.2940701
[68] S. L. Ma, N. L. S. Tang, C. W. C. Tam, V. W. C. Lui, E. S. S. Lau, Y. P. Zhang, Polymorphisms of the estrogen receptor α (ESR1) gene and the risk of Alzheimer's disease in a southern Chinese community, Int. Psychogeriatrics, 21 (2009), 977–986. https://doi.org/10.1017/S1041610209990068
[69] H. Bronger, J. Singer, C. Windmuller, U. Reuning, D. Zech, C. Delbridge, et al., CXCL9 and CXCL10 predict survival and are regulated by cyclooxygenase inhibition in advanced serous ovarian cancer, Br. J. Cancer, 115 (2016), 553–563. https://doi.org/10.1038/bjc.2016.172
[70] K. M. Gharpure, O. D. Lara, Y. Wen, S. Pradeep, C. LaFargue, C. Ivan, et al., ADH1B promotes mesothelial clearance and ovarian cancer infiltration, Oncotarget, 9 (2018), 25115. https://doi.org/10.18632/oncotarget.25344
[71] X. Li, L. Zhao, T. Meng, Upregulated CXCL14 is associated with poor survival outcomes and promotes ovarian cancer cells proliferation, Cell Biochem. Funct., 38 (2020), 613–620. https://doi.org/10.1002/cbf.3516
[72] X. Li, Y. Shi, Z. Yin, X. Xue, B. Zhou, An eight-miRNA signature as a potential biomarker for predicting survival in lung adenocarcinoma, J. Transl. Med., 12 (2014), 1–12. https://doi.org/10.1186/1479-5876-12-159
[73] P. K. Croft, S. Sharma, N. Godbole, G. E. Rice, C. Salomon, Ovarian-cancer-associated extracellular vesicles: Microenvironmental regulation and potential clinical applications, Cells, 10 (2021), 2272. https://doi.org/10.3390/cells10092272
[74] Q. J. Wu, M. Guo, Z. M. Lu, T. Li, H. Z. Qiao, Y. Ke, Detection of human papillomavirus-16 in ovarian malignancy, Br. J. Cancer, 89 (2003), 672–675. https://doi.org/10.1038/sj.bjc.6601172
[75] K. L. Clark, J. W. George, E. Przygrodzka, M. R. Plewes, G. Hua, C. Wang, et al., Hippo signaling in the ovary: Emerging roles in development, fertility, and disease, Endocr. Rev., 43 (2022), 1074–1096. https://doi.org/10.1210/endrev/bnac013
Omics type | Number of samples | Number of features | Summary |
mRNA | 367 | 46,610 | HTSeq-FPKM |
DNA methylation | 363 | 24,923 | Illumina Human Methylation 27k |
miRNA | 499 | 1874 | BCGSC Illumina HiSeq |
CNV | 606 | 24,740 | Affymetrix SNP Array 6.0 |
Omics type | Number of samples | Number of features | Summary |
mRNA | 325 | 8492 | HTSeq-FPKM |
DNA methylation | 325 | 6125 | Illumina Human Methylation 27k |
miRNA | 325 | 454 | BCGSC Illumina HiSeq |
CNV | 325 | 2274 | Affymetrix SNP Array 6.0 |
Omics type | Number of samples | Number of features |
mRNA | 325 | 143 |
DNA methylation | 325 | 142 |
miRNA | 325 | 128 |
CNV | 325 | 136 |
Method | ACC (%) | F1-score (%) | AUC (%) |
KNN | 57.27±2.31 | 47.74±1.89 | 52.26±2.61 |
SVM | 54.28±1.87 | 53.59±1.66 | 54.43±2.25 |
RF | 55.16±3.49 | 54.17±2.36 | 53.12±2.11 |
XGBoost | 56.04±2.12 | 54.66±1.60 | 54.92±0.87 |
FNN | 59.21±1.69 | 65.84±2.56 | 53.62±2.14 |
MOGONET | 64.25±2.36 | 73.16±2.20 | 57.94±1.98 |
MOCSC | 65.48±1.83 | 73.45±2.31 | 59.25±2.17 |
MDCADON (ours) | 69.47±2.10 | 77.91±1.82 | 63.40±2.15 |
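The results table reports ACC, F1-score and AUC as percentages with a ± spread, but this section does not state how those figures were computed. The following is a minimal sketch under several assumptions: a binary prognosis label (1 = positive class), F1 computed for the positive class, hard predictions obtained by thresholding scores at 0.5, and the ± value taken as the sample standard deviation over repeated evaluation runs. The function names and the toy labels and scores are hypothetical and are not data from the paper.

```python
import statistics

def accuracy(y_true, y_pred):
    """Fraction of predicted labels that equal the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores, positive=1):
    """AUC via the rank formulation: probability that a random positive
    scores higher than a random negative, counting ties as 0.5.
    Requires at least one positive and one negative sample."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize(values):
    """Report fractions as 'mean±sd' in percent, mirroring the table format."""
    pct = [100 * v for v in values]
    return f"{statistics.mean(pct):.2f}±{statistics.stdev(pct):.2f}"

# Hypothetical (labels, scores) for two evaluation runs; not data from the paper.
runs = [
    ([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.7, 0.6]),
    ([0, 1, 1, 0, 1], [0.3, 0.8, 0.55, 0.1, 0.45]),
]

acc, f1, auc = [], [], []
for y, s in runs:
    y_hat = [1 if score >= 0.5 else 0 for score in s]  # assumed 0.5 decision threshold
    acc.append(accuracy(y, y_hat))
    f1.append(f1_score(y, y_hat))
    auc.append(roc_auc(y, s))

print("ACC", summarize(acc), "F1", summarize(f1), "AUC", summarize(auc))
```

AUC is computed from the raw scores rather than the thresholded labels, which is why it can rank models differently from ACC and F1. With the two hypothetical runs above, the run accuracies are 0.6 and 0.8, which summarize as 70.00±14.14.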
Dataset | Number of samples | Data category | Gene annotation platform |
GSE26712 | 185 | Gene expression | GPL96 Affymetrix |
GSE32062 | 260 | Gene expression | GPL6480 Agilent |
GSE17260 | 110 | Gene expression | GPL6480 Agilent |
GSE140082 | 380 | Gene expression | GPL14951 Illumina |
Omics data type | Genes |
mRNA | NOL7, PPP3CA, PUF60, CXCL9, ZNF561, CXCL14, USP14, ADH1B, UTP11, PARL, PBK, NRGN, SCNN1A, POLD2, POLR1C, SAR1A, RAB12, NRAS, ZNF826P |
CNV | ACTR3 |