
In recent studies, the tumourigenesis and development of endometrial carcinoma (EC) have been significantly associated with redox processes. We aimed to develop and validate a redox-related prognostic model for patients with EC to predict prognosis and the efficacy of immunotherapy. We downloaded gene expression profiles and clinical information of patients with EC from The Cancer Genome Atlas (TCGA), together with redox-related gene annotations from the Gene Ontology (GO) dataset. We identified two key differentially expressed redox genes (CYBA and SMPD3) by univariate Cox regression and used them to calculate a risk score for every sample. Using the median risk score as the cut-off, we divided patients into low- and high-risk groups and analysed the correlation of the risk groups with immune cell infiltration and immune checkpoints. Finally, we constructed a nomogram based on clinical factors and the risk score, and verified its predictive performance using receiver operating characteristic (ROC) and calibration curves. CYBA and SMPD3 were significantly related to the prognosis of patients with EC and were used to construct the risk model. Survival, immune cell infiltration and immune checkpoints differed significantly between the low- and high-risk groups. The nomogram combining clinical indicators and the risk score was effective in predicting the prognosis of patients with EC. In summary, the risk score of a prognostic model built on two redox-related genes (CYBA and SMPD3) was shown to be an independent prognostic factor for EC and to be associated with the tumour immune microenvironment. These redox signature genes have the potential to predict prognosis and immunotherapy efficacy in patients with EC.
Citation: Yan He, Nannan Cao, Yanan Tian, Xuelin Wang, Qiaohong Xiao, Xiaojuan Tang, Jiaolong Huang, Tingting Zhu, Chunhui Hu, Ying Zhang, Jie Deng, Han Yu, Peng Duan. Development and validation of two redox-related genes associated with prognosis and immune microenvironment in endometrial carcinoma[J]. Mathematical Biosciences and Engineering, 2023, 20(6): 10339-10357. doi: 10.3934/mbe.2023453
Various chemical modifications, including cytosine modification, uridine isomerization and adenosine methylation, have been found in cellular RNA [1] and have been linked to important biological and physiological functions in cells [2]. Pseudouridine (Ψ) is a common post-transcriptional RNA modification, known as the fifth base of RNA [3]. It is present in a wide variety of species, and tRNA and rRNA have been found to contain large amounts of it [4]. Ψ has been shown to be crucial in numerous biological processes, and Ψ modifications at different sites serve different purposes [5,6,7]. Identifying Ψ sites in RNA sequences is therefore important for both fundamental and applied biological research.
Initially, researchers identified Ψ modification sites by biochemical experiments. The earliest work located Ψ sites in yeast RNA by paper chromatography: RNA was digested with ribonucleases, and the resulting fragments were separated on paper by chromatography and electrophoresis [3,4,5,6,7,8]. Researchers later used high-performance liquid chromatography and mass spectrometry to detect modification sites [9]. With growing interest in this field, a variety of high-throughput sequencing technologies were proposed, including Ψ-seq [10,11], PseudoU-seq [12] and CeU-Seq [13], and were successfully used to detect Ψ sites. However, these methods rely on time-consuming, expensive and technically demanding biochemical experiments that are susceptible to environmental factors, and sequencing becomes increasingly difficult as sequence length increases. Robust, fast and inexpensive computational methods are therefore needed to predict Ψ sites in RNA sequences.
Panwar and colleagues first proposed the tRNAmod model to predict Ψ sites in tRNA [14]. Li et al. then proposed a web server (PPUS) based on a support vector machine (SVM) to identify Ψ sites in S.cerevisiae and H.sapiens [15]. Chen et al. created the iRNA-PseU model, which merged nucleotide frequency composition and pseudo K-tuple nucleotide composition (PseKNC) for feature representation [16]. Subsequently, He et al. developed an SVM model (PseUI) to identify Ψ sites in H.sapiens, S.cerevisiae and M.musculus, combining a variety of feature extraction techniques including position-specific dinucleotide propensity (PSDP) [17]. Tahir et al. later built a predictor based on convolutional neural networks (iPseU-CNN) [18]. Liu et al. used extreme gradient boosting (XGBoost) to create a model known as XG-PseU [19]. Lv et al. proposed RF-PseU, which uses the LightGBM algorithm for feature selection and a random forest for classification [20]. Saad et al. proposed MU-PseUDeep, a convolutional neural network model that combines sequence and secondary-structure features to predict Ψ sites [21]. Li et al. built Porpoise by feeding multiple types of features into a stacked ensemble learning framework [22]. Although these techniques identify Ψ sites in RNA sequences successfully, their performance still leaves room for improvement compared with high-performance predictors [23,24,25,26,27,28].
In this study, we build a Ψ site identification model, iPseU-TWSVM, based on the twin support vector machine (TWSVM). Figure 1 depicts the model construction process. The model combines multiple feature representation methods, including Kmer, ENAC and EIIP, and uses the mRMR approach to obtain the best feature subset. It is then evaluated using 10-fold cross-validation (10-CV) and independent testing (Ⅰ-testing). The average Ⅰ-testing accuracy of iPseU-TWSVM is 3.4 percentage points higher than that of the best existing predictor, RF-PseU, demonstrating the better generalization performance of our model. iPseU-TWSVM may therefore become an effective tool for Ψ site identification.
In this work, we train and evaluate our models using the datasets created by Chen et al. [29]. The benchmark dataset was constructed as follows. First, 1307 positive samples and 33,280 negative samples were collected. A subset-balancing treatment based on Euclidean distance was then applied to reduce the number of negative samples: the distance values were sorted in ascending order, and the first 1307 negative samples were selected to form the negative subset. The training datasets cover three species, H.sapiens, S.cerevisiae and M.musculus. The H.sapiens dataset contains 495 positive and 495 negative samples, the S.cerevisiae dataset contains 314 positive and 314 negative samples, and the M.musculus dataset contains 944 samples, half of which are positive. The Ⅰ-testing datasets cover only two species, H.sapiens and S.cerevisiae. Each contains 200 samples, half positive and half negative.
Different types of features capture biologically meaningful information from different perspectives, such as sequence composition and physicochemical properties. In this work, several types of features are used to describe the composition, distribution and physicochemical properties of the nucleotides in a sequence, with the aim of improving the prediction performance of the downstream classifier.
Kmer is an effective technique for extracting RNA sequence characteristics. It counts the occurrence frequencies of all substrings of k adjacent nucleotides (k-mers) in the sequence and uses these frequencies as the feature vector [30]. The method is provided by the web server Pse-in-One2.0 (http://bioinformatics.hitsz.edu.cn/Pse-in-One2.0/) [31].
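A minimal sketch of a k-mer frequency vector follows. It assumes that frequencies are counts divided by the number of overlapping k-mers (L − k + 1) and that the 4^k features are ordered lexicographically over ACGU. Pse-in-One2.0 may normalize or order them differently. With k = 3, a sample yields 4^3 = 64 features, the Kmer dimension listed in Tables 4 and 6. The sequence in the usage example is a made-up toy.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Return normalized frequencies of all 4**k k-mers over the alphabet ACGU."""
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        raise ValueError("sequence shorter than k")
    # Fixed lexicographic order so every sequence maps to the same feature layout.
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    for start in range(n_windows):
        sub = seq[start:start + k]
        if sub in index:  # skip windows containing non-ACGU symbols
            counts[index[sub]] += 1
    return [c / n_windows for c in counts]


if __name__ == "__main__":
    features = kmer_frequencies("GGACUUCAGCAUGCCAUUACG", k=3)  # toy 21-nt sample
    print(len(features))
```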
The PC-PseDNC-General approach offers 22 different physicochemical properties for constructing the pseudo dinucleotide composition [32,33,34]. It incorporates both local and global sequence-order information into the feature vector. The relevant features are expressed in the following form:
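$$\text{Vector} = (m_1, m_2, \cdots, m_{16}, m_{16+1}, \cdots, m_{16+\lambda})^{T} \tag{1}$$
with
$$m_i = \begin{cases} \dfrac{q_i}{\sum_{j=1}^{16} q_j + \alpha \sum_{k=1}^{\lambda} \rho_k}, & 1 \le i \le 16 \\[2ex] \dfrac{\alpha \rho_{i-16}}{\sum_{j=1}^{16} q_j + \alpha \sum_{k=1}^{\lambda} \rho_k}, & 16+1 \le i \le 16+\lambda \end{cases} \tag{2}$$
Here $q_i\ (i = 1, 2, \cdots, 16)$ is the normalized occurrence frequency of the $i$th of the 16 dinucleotides, and $\alpha\ (0 \le \alpha \le 1)$ is the weight factor. $\lambda$ is the highest counted rank (tier) of correlation, and $\rho_k$ is the $k$-tier correlation factor.
$\rho_k$ reflects the sequence-order correlation between all dinucleotides that are $k$ positions apart along an RNA sequence of length $l$. It can be written as
$$\rho_k = \frac{1}{l-k-1} \sum_{j=1}^{l-k-1} C(R_j, R_{j+k}) \quad (k = 1, 2, \cdots, \lambda;\ \lambda < l-1) \tag{3}$$
where $C(R_j, R_{j+k})$ is the correlation function
$$C(R_j, R_{j+k}) = \frac{1}{\sigma} \sum_{g=1}^{\sigma} \left[ P_g(R_j) - P_g(R_{j+k}) \right]^2 \tag{4}$$
Here $\sigma$ is the number of physicochemical properties considered. $P_g(R_j)$ and $P_g(R_{j+k})$ are the values of the $g$th property for the dinucleotide $R_j$ at position $j$ and the dinucleotide $R_{j+k}$ at position $j+k$.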
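A minimal Python sketch of Eqs. (1)–(4) follows. The property table is an assumption. Real PC-PseDNC-General uses physicochemical indices supplied by Pse-in-One2.0, usually standardized. The usage example instead plugs in two hypothetical toy properties (the GC count and purine count of each dinucleotide) purely to exercise the code. Because all 16 + λ components share the denominator of Eq. (2), they sum to 1. With λ = 6, the vector has 22 dimensions, matching the PC-PseDNC-General dimension in Tables 4 and 6.

```python
from itertools import product


def pc_psednc(seq: str, props: dict, lam: int = 2, weight: float = 0.5) -> list[float]:
    """Return the (16 + lam)-dimensional vector of Eqs. (1)-(4).

    props maps every dinucleotide (e.g. "AC") to a tuple of sigma property
    values, assumed to be standardized already; weight plays the role of alpha.
    """
    dinucs = ["".join(p) for p in product("ACGU", repeat=2)]
    seq_len = len(seq)
    steps = [seq[j:j + 2] for j in range(seq_len - 1)]      # R_1 ... R_{l-1}
    q = [steps.count(d) / len(steps) for d in dinucs]        # normalized frequencies q_i
    sigma = len(next(iter(props.values())))

    def corr(a: str, b: str) -> float:                       # Eq. (4)
        return sum((pa - pb) ** 2 for pa, pb in zip(props[a], props[b])) / sigma

    rho = []
    for k in range(1, lam + 1):                              # Eq. (3), requires lam < l - 1
        total = sum(corr(steps[j], steps[j + k]) for j in range(seq_len - k - 1))
        rho.append(total / (seq_len - k - 1))

    denom = sum(q) + weight * sum(rho)                       # shared denominator of Eq. (2)
    return [qi / denom for qi in q] + [weight * r / denom for r in rho]


if __name__ == "__main__":
    # Hypothetical toy properties: (GC count, purine count) of each dinucleotide.
    toy = {a + b: (float((a in "GC") + (b in "GC")), float((a in "AG") + (b in "AG")))
           for a in "ACGU" for b in "ACGU"}
    vec = pc_psednc("GGACUUCAGCAUGCCAUUACG", toy, lam=6, weight=0.5)
    print(len(vec))
```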
The NCP (nucleotide chemical property) encoding reflects the fact that each nucleotide has a distinct chemical structure and distinct binding properties. The four RNA nucleotides (A, C, G, U) differ from one another in ring structure, hydrogen bonding and functional groups. Based on these differences, each nucleotide can be represented by a 3D coordinate [35].
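A minimal sketch of the 3D encoding follows. The text does not list the coordinates. The code assumes the widely used assignment A = (1, 1, 1), C = (0, 1, 0), G = (1, 0, 0), U = (0, 0, 1). The axes are ring structure (purine = 1), functional group (amino = 1) and hydrogen bonding (weak, two-bond pairing = 1). A 21-nt sample yields 63 features, consistent with the NCP dimension listed in Tables 4 and 6.

```python
# Assumed coordinates: (ring structure, functional group, hydrogen bond),
# with purine = 1, amino group = 1 and weak (two-bond) pairing = 1.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def ncp_encode(seq: str) -> list[int]:
    """Concatenate the 3-D chemical-property vector of every nucleotide."""
    return [v for nt in seq for v in NCP[nt]]


if __name__ == "__main__":
    print(ncp_encode("ACGU"))
    print(len(ncp_encode("GGACUUCAGCAUGCCAUUACG")))  # 21 nucleotides x 3 coordinates
```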
ANF (accumulated nucleotide frequency) incorporates both the frequency of each nucleotide and its distribution along the RNA sequence [35]. For the $i$th prefix of a sequence, the density $d_i$ of the nucleotide at position $i$ is defined as
$$d_i = \frac{1}{i} \sum_{j=1}^{i} f(x_j), \qquad f(x_j) = \begin{cases} 1, & \text{if } x_j = x_i \\ 0, & \text{otherwise} \end{cases} \tag{5}$$
where $i$ is the length of the prefix subsequence and $x_j$ is the nucleotide at position $j$.
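Eq. (5) can be computed in a single pass by keeping a running count of each nucleotide in the prefix. This is a minimal sketch, not the RF-PseU code. It returns one density per position, which gives 21 features for a 21-nt sample, as in the ANF rows of Tables 4 and 6.

```python
def anf_density(seq: str) -> list[float]:
    """Eq. (5): d_i = (occurrences of x_i among x_1..x_i) / i."""
    seen: dict[str, int] = {}
    densities = []
    for i, nt in enumerate(seq, start=1):
        seen[nt] = seen.get(nt, 0) + 1  # running count of this nucleotide in the prefix
        densities.append(seen[nt] / i)
    return densities


if __name__ == "__main__":
    print([round(d, 3) for d in anf_density("UCGUUCAUGG")])
```

For the toy string UCGUUCAUGG, the densities are 1/1, 1/2, 1/3, 2/4, 3/5, 2/6, 1/7, 4/8, 2/9 and 3/10.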
The electron-ion interaction pseudopotential (EIIP) value of a nucleotide represents the energy of its delocalized electrons. EIIP values of A, G, C and T have previously been used to represent the nucleotides of DNA sequences [36]. In the RF-PseU method [20], each nucleotide of an RNA sequence was likewise encoded by its EIIP value.
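A minimal sketch of EIIP encoding follows. The values are those commonly cited for DNA (A = 0.1260, C = 0.1340, G = 0.0806, T = 0.1335). The text does not list them, and giving U the value of T is an assumption. One value per nucleotide yields 21 features for a 21-nt sample, consistent with the EIIP dimension in Tables 4 and 6.

```python
# Commonly cited EIIP values for DNA nucleotides; U is assumed to take T's value.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}


def eiip_encode(seq: str) -> list[float]:
    """Replace each nucleotide by its EIIP value (one feature per position)."""
    return [EIIP[nt] for nt in seq]


if __name__ == "__main__":
    print(eiip_encode("ACGU"))
```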
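ENAC (enhanced nucleic acid composition) computes the nucleotide composition within a fixed-length window that slides along the sequence [20,35]. RNA sequences of equal length are thereby converted into feature vectors of equal length. The dimension of the ENAC encoding depends on both the sequence length and the sliding-window size.
$$E = (b_1, b_2, \cdots, b_n), \qquad b_i = \frac{N_i}{N}, \quad i \in \{A, C, G, U\} \tag{6}$$
Here $N_i$ is the number of occurrences of nucleotide $i$ in the current window, and $E$ concatenates these values over all windows. $N$ is the sliding-window size and $n$ is the coding dimension.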
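A minimal sketch of Eq. (6) follows. Each window of size N contributes the four ratios N_A/N, N_C/N, N_G/N and N_U/N, so a sequence of length L gives 4(L − N + 1) features. The default window size of 5 is an illustrative assumption. The code is not claimed to reproduce the ENAC dimensions in Tables 4–6, which depend on settings the text does not give.

```python
def enac(seq: str, window: int = 5) -> list[float]:
    """Eq. (6): composition N_i / N of A, C, G, U in every sliding window of size N."""
    if window > len(seq):
        raise ValueError("window longer than sequence")
    features = []
    for start in range(len(seq) - window + 1):
        win = seq[start:start + window]
        features.extend(win.count(nt) / window for nt in "ACGU")
    return features


if __name__ == "__main__":
    print(len(enac("GGACUUCAGCAUGCCAUUACG", window=5)))
```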
Binary profiles give the position-specific composition of nucleotides in RNA fragments [35,36]. Each nucleotide is encoded as a four-digit binary vector. Dibinary profiles differ from binary profiles in that the 16 dinucleotides are encoded instead; for example, AA is denoted by (0, 0, 0, 0).
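A minimal sketch of both profiles follows. For the binary profile, it assumes one-hot codes in the order A, C, G, U. For the dibinary profile, it assumes overlapping dinucleotides numbered AA, AC, ..., UU from 0 to 15, with each number written as a 4-bit code. The text fixes only AA = (0, 0, 0, 0); the rest of the numbering is an assumption. A 21-nt sample gives 84 binary-profile features.

```python
BINARY = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}
ORDER = "ACGU"


def binary_profile(seq: str) -> list[int]:
    """Four-digit one-hot vector per nucleotide, concatenated."""
    return [bit for nt in seq for bit in BINARY[nt]]


def dibinary_profile(seq: str) -> list[int]:
    """4-bit code per overlapping dinucleotide: AA -> 0000, AC -> 0001, ..., UU -> 1111."""
    bits = []
    for j in range(len(seq) - 1):
        code = 4 * ORDER.index(seq[j]) + ORDER.index(seq[j + 1])  # 0..15
        bits.extend((code >> shift) & 1 for shift in (3, 2, 1, 0))
    return bits


if __name__ == "__main__":
    print(binary_profile("AU"), dibinary_profile("AC"))
```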
mRMR (minimum redundancy maximum relevance) [37] is a commonly used feature selection method for compressing the feature space. Its goal is to identify, from the initial feature set, a subset of features that have the lowest correlation with one another and the highest correlation with the class label. It therefore considers both the relationships among features and the association between features and labels. The feature selection mechanism is as follows.
First, mutual information is used to find a feature subset $S$ of $m$ features with maximum relevance to the class $c$. The relevance between $S$ and $c$ is defined as the mean mutual information between each feature and the class, as shown in (7).
$$\max D(S, c), \qquad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c) \tag{7}$$
Here $I(x_i; c)$ is the mutual information, $S$ is a feature subset of size $m$, $x_i$ is the $i$th feature in $S$ and $c$ is the class variable.
Features selected by maximum relevance alone may be redundant, so (8) is used to minimize the redundancy among the $m$ features.
$$\min R(S), \qquad R = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j) \tag{8}$$
The final feature subset $S$ is obtained by combining maximum relevance $D$ with minimum redundancy $R$:
$$\text{mRMR} = \max \left[ \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c) - \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j) \right] \tag{9}$$
Compared with feature selection methods that consider relevance alone, mRMR also accounts for redundancy among features, which further optimizes the feature subset. It also avoids maximizing the joint dependency directly, which is difficult in practice. However, because the subset is usually searched incrementally, only approximately optimal solutions can be obtained in practical applications.
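A minimal sketch of the usual greedy (incremental) form of mRMR for discrete features follows. Rather than searching all subsets for the maximum of Eq. (9), it adds one feature at a time. At each step it picks the candidate with the largest relevance I(x; c) minus its mean mutual information with the features already selected. The plug-in mutual-information estimator is an assumption, not a detail from the paper. Continuous encodings such as EIIP or ENAC would first need discretizing.

```python
from collections import Counter
from math import log


def mutual_info(x, y) -> float:
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, labels, k):
    """Greedy mRMR: return the indices of k selected feature columns, in selection order."""
    relevance = [mutual_info(f, labels) for f in features]
    selected = [max(range(len(features)), key=relevance.__getitem__)]
    while len(selected) < k:
        remaining = [i for i in range(len(features)) if i not in selected]

        def score(i):
            # relevance minus mean redundancy with the already selected features
            redundancy = sum(mutual_info(features[i], features[s]) for s in selected)
            return relevance[i] - redundancy / len(selected)

        selected.append(max(remaining, key=score))
    return selected


if __name__ == "__main__":
    c = [0, 0, 0, 0, 1, 1, 1, 1]
    f0 = [0, 0, 0, 1, 1, 0, 1, 1]
    print(mrmr([f0, list(f0), [0, 1, 0, 1, 0, 1, 0, 1]], c, 2))
```

In the usage example, the exact duplicate of the first feature is passed over: its redundancy outweighs its relevance.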
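Consider the binary classification problem with the training dataset
$$D_{\text{train}} = \{(u_1, 1), (u_2, 1), \cdots, (u_m, 1), (u_{m+1}, -1), (u_{m+2}, -1), \cdots, (u_{m+n}, -1)\} \tag{10}$$
where $u_i \in \mathbb{R}^d$, $i = 1, 2, \cdots, m+n$, and $d$ is the feature dimension.
Let $T = (u_1, u_2, \cdots, u_m)^T \in \mathbb{R}^{m \times d}$ collect the positive samples and $F = (u_{m+1}, u_{m+2}, \cdots, u_{m+n})^T \in \mathbb{R}^{n \times d}$ collect the negative samples, with $l = m + n$.
In the linear case, TWSVM [38] seeks a pair of nonparallel hyperplanes
$$w_+^T u + b_+ = 0 \quad \text{and} \quad w_-^T u + b_- = 0 \tag{11}$$
where $w_+, w_- \in \mathbb{R}^d$ and $b_+, b_- \in \mathbb{R}$. They are found by solving the following pair of quadratic programming problems (QPPs):
$$\min_{w_+, b_+, \xi_-} \ \frac{1}{2} (T w_+ + e_+ b_+)^T (T w_+ + e_+ b_+) + c_1 e_-^T \xi_- \quad \text{s.t.} \ -(F w_+ + e_- b_+) + \xi_- \ge e_-, \ \xi_- \ge 0 \tag{12}$$
and
$$\min_{w_-, b_-, \xi_+} \ \frac{1}{2} (F w_- + e_- b_-)^T (F w_- + e_- b_-) + c_2 e_+^T \xi_+ \quad \text{s.t.} \ (T w_- + e_+ b_-) + \xi_+ \ge e_+, \ \xi_+ \ge 0 \tag{13}$$
Here $c_1, c_2 > 0$ are the penalty parameters, and $e_+ \in \mathbb{R}^m$ and $e_- \in \mathbb{R}^n$ are vectors of ones whose dimensions equal the numbers of positive and negative samples, respectively. $\xi_+$ and $\xi_-$ are slack vectors of appropriate dimension.
Minimizing each objective keeps one hyperplane as close as possible to one class. The constraints push the samples of the other class to a (functional) distance of at least 1 from that hyperplane, up to the slack variables. The corresponding Lagrange dual problems are
$$\max_{\alpha} \ e_-^T \alpha - \frac{1}{2} \alpha^T J (K^T K)^{-1} J^T \alpha \quad \text{s.t.} \ 0 \le \alpha \le c_1 e_- \tag{14}$$
and
$$\max_{\gamma} \ e_+^T \gamma - \frac{1}{2} \gamma^T K (J^T J)^{-1} K^T \gamma \quad \text{s.t.} \ 0 \le \gamma \le c_2 e_+ \tag{15}$$
where
$$K = [T \ \ e_+] \in \mathbb{R}^{m \times (d+1)}, \qquad J = [F \ \ e_-] \in \mathbb{R}^{n \times (d+1)}. \tag{16}$$
The solutions of the primal problems are obtained from the dual solutions as
$$(w_+^T, b_+)^T = -(K^T K)^{-1} J^T \alpha, \tag{17}$$
$$(w_-^T, b_-)^T = (J^T J)^{-1} K^T \gamma. \tag{18}$$
An unknown point $u \in \mathbb{R}^d$ is then assigned to the class
$$\text{Class} = \arg\min_{s \in \{-, +\}} \frac{|w_s^T u + b_s|}{\|w_s\|}, \tag{19}$$
where $|w_s^T u + b_s| / \|w_s\|$ is the perpendicular distance of $u$ from the plane $w_s^T u + b_s = 0$, $s \in \{-, +\}$.
This method splits one large quadratic programming problem into two smaller ones, which speeds up training, and it is also relatively insensitive to noise.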
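Below is a minimal, dependency-free Python sketch of linear TWSVM following Eqs. (12)–(19). It is a teaching sketch on toy 2D data, not the iPseU-TWSVM implementation. It makes two assumptions beyond the text. First, a small ridge term εI is added to $K^TK$ and $J^TJ$ so that the inverses always exist, as is common in TWSVM implementations. Second, the box-constrained duals (14)–(15) are solved by projected coordinate ascent instead of a dedicated QP solver. The negative-class plane is recovered as $+(J^TJ)^{-1}K^T\gamma$, which follows from the stationarity condition of (13).

```python
import math


def solve(A, B):
    """Solve A X = B (A: d x d, B: d x k) by Gauss-Jordan elimination with partial pivoting."""
    d = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(d):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[d:] for row in M]


def fit_plane(own, other, c, sign, eps=1e-4, sweeps=500):
    """One TWSVM plane: close to `own`, functional distance >= 1 (up to slack) from `other`.

    Solves max e^T a - 1/2 a^T Q a s.t. 0 <= a <= c, where Q = G (H^T H + eps I)^{-1} G^T,
    H = [own e], G = [other e], and returns (w, b) = sign * (H^T H + eps I)^{-1} G^T a.
    """
    H = [list(x) + [1.0] for x in own]
    G = [list(x) + [1.0] for x in other]
    d, n = len(H[0]), len(G)
    HtH = [[sum(r[i] * r[j] for r in H) + (eps if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    Gt = [[G[r][i] for r in range(n)] for i in range(d)]
    P = solve(HtH, Gt)                                     # (H^T H + eps I)^{-1} G^T, d x n
    Q = [[sum(G[i][k] * P[k][j] for k in range(d)) for j in range(n)] for i in range(n)]
    a = [0.0] * n
    for _ in range(sweeps):                                # projected coordinate ascent
        for i in range(n):
            grad = 1.0 - sum(Q[i][j] * a[j] for j in range(n))
            a[i] = min(c, max(0.0, a[i] + grad / Q[i][i]))
    z = [sign * sum(P[k][j] * a[j] for j in range(n)) for k in range(d)]
    return z[:-1], z[-1]


def fit_twsvm(pos, neg, c1=1.0, c2=1.0):
    plane_pos = fit_plane(pos, neg, c1, sign=-1.0)         # Eqs. (12), (14), (17)
    plane_neg = fit_plane(neg, pos, c2, sign=+1.0)         # Eqs. (13), (15), (18)
    return plane_pos, plane_neg


def predict(u, planes):
    """Eq. (19): assign u to the class whose plane is nearer (perpendicular distance)."""
    def dist(w, b):
        return abs(sum(wi * ui for wi, ui in zip(w, u)) + b) / math.sqrt(sum(wi * wi for wi in w))
    (w_p, b_p), (w_n, b_n) = planes
    return 1 if dist(w_p, b_p) <= dist(w_n, b_n) else -1


if __name__ == "__main__":
    pos = [(0, 0), (1, 0.1), (2, -0.1), (3, 0)]            # toy positives near y = 0
    neg = [(0, 3), (1, 3.1), (2, 2.9), (3, 3)]             # toy negatives near y = 3
    planes = fit_twsvm(pos, neg)
    print(predict((1.5, 0.1), planes), predict((1.5, 2.9), planes))
```

Each plane needs only a (d + 1) × (d + 1) solve and a QP whose size equals the number of samples in the other class. This is the source of the training-speed advantage described above.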
Five widely used indicators were used to assess the performance of the models [39,40,41]. They are accuracy (ACC), sensitivity (SN), specificity (SP), the Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic curve (auROC, reported as AUC in the tables). The first four are calculated as follows.
$$\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{20}$$
$$\text{SN} = \frac{TP}{TP + FN} \tag{21}$$
$$\text{SP} = \frac{TN}{TN + FP} \tag{22}$$
$$\text{MCC} = \frac{TN \times TP - FN \times FP}{\sqrt{(TP + FP) \times (TP + FN) \times (TN + FN) \times (TN + FP)}} \tag{23}$$
where TP, TN, FP, and FN represent true positive, true negative, false positive and false negative, respectively.
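A small Python helper for Eqs. (20)–(23) follows. auROC is omitted because it needs ranked prediction scores rather than a confusion matrix. Returning 0.0 when the MCC denominator is zero is a common convention, not something the text specifies. As a sanity check, the usage example uses the confusion counts implied by the S.cerevisiae independent-test row for iPseU-TWSVM in Table 8. SN = 0.85 and SP = 0.80 on 100 positive and 100 negative samples give TP = 85, FN = 15, TN = 80 and FP = 20. These counts are an inference; the paper does not report them.

```python
import math


def metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """ACC, SN, SP and MCC as defined in Eqs. (20)-(23)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fn) * (tn + fp))
    mcc = (tn * tp - fn * fp) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


if __name__ == "__main__":
    m = metrics(tp=85, tn=80, fp=20, fn=15)
    print({name: round(value, 3) for name, value in m.items()})
```

This gives ACC = 0.825 and MCC ≈ 0.651, in line with the 0.825 and 0.65 listed in Table 8.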
We use 10-CV for comparison [42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62]. The training dataset is divided into ten equal subsets. In each round, the model is trained on nine subsets and tested on the remaining one. This is repeated ten times so that every subset is tested exactly once, and the average result is taken as the final performance. Finally, Ⅰ-testing on the testing datasets was used to evaluate the trained model.
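The 10-CV procedure can be sketched as follows. The random shuffle with a fixed seed is an assumption; the text does not say whether samples are shuffled or stratified. `evaluate` is a hypothetical callback, for example one that trains TWSVM on the training indices and returns accuracy on the test indices. For H_990, each test fold holds 99 samples.

```python
import random
from typing import Callable, Iterator, List, Tuple


def kfold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every sample appears in exactly one test fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)                  # shuffling is an assumption
    folds = [idx[f::k] for f in range(k)]             # k disjoint, near-equal subsets
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def cross_validate(n_samples: int, evaluate: Callable[[List[int], List[int]], float],
                   k: int = 10, seed: int = 0) -> float:
    """Average evaluate(train, test) over the k rounds."""
    scores = [evaluate(tr, te) for tr, te in kfold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(cross_validate(990, lambda train, test: len(test)))
```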
Feature extraction affects the results of the subsequent sequence classification. To obtain better performance, we studied seven different features: Kmer and PC-PseDNC-General, together with ANF, EIIP, ENAC, NCP and NBP from the RF-PseU method [20]. These features were first tested separately. Well-performing features were then combined in different ways, according to the test results, to further improve performance.
Table 1 lists the results of feature combinations on the H_990 dataset using TWSVM. The first six rows show the performance of single features. Kmer, EIIP, ENAC and NCP + NBP give the best results, with ACC roughly in the range 0.57–0.59. Because the single-feature results were lower than those of the RF-PseU predictor, we combined several well-performing features and used the mRMR method to select the best feature subset for model construction. Table 1 lists four combinations for the H_990 dataset: Kmer + ENAC, Kmer + EIIP, Kmer + NCP + NBP, and Kmer + PC-PseDNC-General + ANF + EIIP + ENAC + NCP + NBP. With TWSVM, the combined features were usually about 1–3 percentage points more accurate than single features. The maximum ACC of 0.65 was obtained with Kmer + PC-PseDNC-General + ANF + EIIP + ENAC + NCP + NBP, 0.7 percentage points higher than the RF-PseU predictor. We therefore used this feature combination for 10-CV on the training set H_990 and applied it to Ⅰ-testing on the testing set H_200.
Feature subset (TWSVM) | ACC | MCC | SN | SP | AUC |
Kmer | 0.59 | 0.181 | 0.533 | 0.646 | 0.618 |
PC-PseDNC-General | 0.534 | 0.07 | 0.434 | 0.635 | 0.543 |
ANF | 0.526 | 0.053 | 0.568 | 0.485 | 0.518 |
EIIP | 0.572 | 0.144 | 0.525 | 0.618 | 0.6 |
ENAC | 0.587 | 0.178 | 0.472 | 0.7 | 0.59 |
NCP + NBP | 0.584 | 0.172 | 0.53 | 0.639 | 0.582 |
Kmer + NCP + NBP | 0.59 | 0.182 | 0.532 | 0.648 | 0.587 |
Kmer + ENAC | 0.603 | 0.208 | 0.582 | 0.624 | 0.62 |
Kmer + EIIP | 0.606 | 0.212 | 0.585 | 0.626 | 0.625 |
Kmer + PC-PseDNC-General + ANF + EIIP + ENAC + NCP + NBP | 0.65 | 0.301 | 0.697 | 0.602 | 0.682 |
Table 2 lists the results of different feature combinations on the S_628 dataset using TWSVM. As in Table 1, Kmer, EIIP, ENAC and NCP + NBP performed better, and we combined these features. Except for Kmer + NCP + NBP, every combination improved on its constituent single features. The combination Kmer + PC-PseDNC-General + ANF + EIIP + ENAC + NCP + NBP performed best, with ACC about 6 percentage points higher than the other combinations. We also tested this feature combination on the independent test set S_200.
Feature subset (TWSVM) | ACC | MCC | SN | SP | AUC |
Kmer | 0.627 | 0.259 | 0.625 | 0.63 | 0.67 |
PC-PseDNC-General | 0.574 | 0.153 | 0.666 | 0.483 | 0.589 |
ANF | 0.584 | 0.175 | 0.716 | 0.452 | 0.584 |
EIIP | 0.614 | 0.238 | 0.473 | 0.755 | 0.632 |
ENAC | 0.634 | 0.326 | 0.435 | 0.836 | 0.693 |
NCP + NBP | 0.669 | 0.34 | 0.659 | 0.678 | 0.711 |
Kmer + NCP + NBP | 0.664 | 0.33 | 0.653 | 0.675 | 0.683 |
Kmer + ENAC | 0.653 | 0.308 | 0.631 | 0.675 | 0.683 |
Kmer + EIIP | 0.631 | 0.263 | 0.628 | 0.634 | 0.662 |
Kmer + PC-PseDNC-General + ANF + EIIP + ENAC + NCP + NBP | 0.722 | 0.45 | 0.656 | 0.786 | 0.758 |
Table 3 shows the results on the M_944 dataset. The combination Kmer + PC-PseDNC-General + ANF + EIIP + ENAC + NCP + NBP performed best, with the highest ACC, MCC and AUC of all feature sets. Compared with the next-best set (Kmer + NCP + NBP), ACC increased by about 5 percentage points (0.728 vs. 0.677) and MCC by about 10 points (0.462 vs. 0.36). AUC also improved markedly, from 0.73 to 0.775.
Feature subset (TWSVM) | ACC | MCC | SN | SP | AUC |
Kmer | 0.584 | 0.17 | 0.676 | 0.49 | 0.614 |
PC-PseDNC-General | 0.541 | 0.084 | 0.517 | 0.566 | 0.539 |
ANF | 0.553 | 0.113 | 0.71 | 0.396 | 0.527 |
EIIP | 0.625 | 0.276 | 0.826 | 0.424 | 0.632 |
ENAC | 0.664 | 0.329 | 0.667 | 0.661 | 0.7 |
NCP + NBP | 0.662 | 0.326 | 0.636 | 0.688 | 0.703 |
Kmer + NCP + NBP | 0.677 | 0.36 | 0.623 | 0.731 | 0.73 |
Kmer + ENAC | 0.662 | 0.326 | 0.657 | 0.667 | 0.704 |
Kmer + EIIP | 0.636 | 0.274 | 0.697 | 0.574 | 0.667 |
Kmer + PC-PseDNC-General + ANF + EIIP + ENAC + NCP + NBP | 0.728 | 0.462 | 0.795 | 0.661 | 0.775 |
Because feature selection is a crucial component of model construction, we compared the mRMR approach with the LightGBM method [64]. Figure 2 shows their accuracy on the training datasets of the three species. mRMR achieves higher accuracy than LightGBM on all three species, further improving the classification accuracy of the model.
Feature selection can effectively increase classification accuracy. We first used mRMR to rank features by high relevance to the class labels and low redundancy. We then applied incremental feature selection to find the feature dimension with the best accuracy. Over repeated experiments, the accuracy of both Ⅰ-testing and 10-CV fluctuated as the number of features increased, and the highest accuracy mostly appeared within the first 100 or 150 dimensions, as illustrated in Figure 3. For each species, accuracy first rises rapidly with the feature dimension and then fluctuates. For H.sapiens, the highest 10-CV accuracy of 0.65 was obtained at 33 dimensions, while the highest independent-test accuracy of 0.763 was obtained at a relatively low dimension. For S.cerevisiae, the highest 10-CV accuracy (0.722) and the highest independent-test accuracy (0.825) were obtained at 72 and 76 dimensions, respectively. M.musculus has no independent test set, so only 10-CV results are shown; its highest value was reached at 62 dimensions.
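A generic sketch of incremental feature selection (IFS) over an mRMR ranking follows. It evaluates the top-1, top-2, ... ranked features and keeps the dimension with the best score. `evaluate` is a hypothetical callback, for instance the 10-CV accuracy of TWSVM restricted to the given feature indices. Ties are broken in favour of the smaller dimension, a design choice not stated in the text.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def incremental_feature_selection(ranked: Sequence[int],
                                  evaluate: Callable[[Sequence[int]], float],
                                  max_dim: Optional[int] = None) -> Tuple[int, float, List[float]]:
    """Score the top-1, top-2, ... features; return (best_dim, best_score, curve)."""
    if max_dim is None:
        max_dim = len(ranked)
    curve = [evaluate(ranked[:dim]) for dim in range(1, max_dim + 1)]
    best = max(range(len(curve)), key=curve.__getitem__)  # first maximum = smallest dimension
    return best + 1, curve[best], curve


if __name__ == "__main__":
    # Toy score that peaks when exactly 33 features are used.
    print(incremental_feature_selection(list(range(100)), lambda f: -(len(f) - 33) ** 2)[:2])
```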
Tables 4–6 show the feature dimensions before and after feature selection and the composition of the optimized feature subsets for the three species. ENAC and EIIP account for the largest share of the optimized subsets in all three species, followed by NCP and NBP. ANF and Kmer contribute only a few features, and PC-PseDNC-General contributes none. This indicates that the features contribute differently to the model, with ENAC and EIIP playing an important role.
Feature | The original dimension | The dimension after feature selection |
NCP | 63 | 14 |
EIIP | 21 | 21 |
NBP | 84 | 15 |
ENAC | 105 | 41 |
ANF | 21 | 8 |
Kmer | 64 | 1 |
PC-PseDNC-General | 22 | 0 |
Feature | The original dimension | The dimension after feature selection |
NCP | 93 | 22 |
EIIP | 31 | 31 |
NBP | 124 | 25 |
ENAC | 155 | 53 |
ANF | 31 | 19 |
Kmer | 256 | 0 |
PC-PseDNC-General | 18 | 0 |
Feature | The original dimension | The dimension after feature selection |
NCP | 63 | 11 |
EIIP | 21 | 21 |
NBP | 84 | 17 |
ENAC | 105 | 43 |
ANF | 21 | 7 |
Kmer | 64 | 1 |
PC-PseDNC-General | 22 | 0 |
Many previous Ψ site recognition models were built on support vector machines, so we used SVM [65] as a classifier in the same feature space to compare TWSVM with SVM. Figure 4 displays the results. In 10-CV, the ACC, MCC and AUC of the TWSVM model were higher than those of the SVM model for all three species. The difference was even clearer in the independent tests, where TWSVM outperformed SVM on every evaluation metric. We therefore conclude that TWSVM performs better than SVM here and may be better suited to identifying Ψ sites in RNA sequences.
We also compared iPseU-TWSVM with other advanced predictors, namely iRNA-PseU [16], PseUI [17], iPseU-CNN [18], XG-PseU [19] and RF-PseU [20]. Tables 7 and 8 compare the 10-CV and Ⅰ-testing results, respectively. In 10-CV, the accuracy of iPseU-TWSVM on H.sapiens is 1.7 percentage points lower than that of iPseU-CNN, the best predictor on this species. Its accuracies on S.cerevisiae (0.722) and M.musculus (0.728) are 2.6 and 2.0 percentage points lower than those of the best predictor, RF-PseU. Although iPseU-TWSVM is not the best in cross-validation on the training sets, it has higher Ⅰ-testing accuracy than all other predictors on both species with independent test sets. On H.sapiens and S.cerevisiae it reaches 0.763 and 0.825, which are 1.3 and 5.5 percentage points higher than the best competing predictor, RF-PseU. To compare the predictors more thoroughly, we also calculated the average accuracy over species. As shown in Table 9, the average 10-CV accuracy of iPseU-TWSVM is 1.3 percentage points lower than that of RF-PseU, whereas its average Ⅰ-testing accuracy is 3.4 percentage points higher. These findings suggest that iPseU-TWSVM generalizes better and is well suited to recognizing Ψ sites in RNA sequences.
Species | Classifier (cross-validation) | ACC | MCC | SN | SP | AUC |
H.sapiens | iRNA-PseU | 0.604 | 0.21 | 0.61 | 0.598 | 0.64 |
H.sapiens | PseUI | 0.642 | 0.28 | 0.649 | 0.636 | 0.68 |
H.sapiens | iPseU-CNN | 0.667 | 0.34 | 0.65 | 0.688 | / |
H.sapiens | XG-PseU | 0.661 | 0.32 | 0.635 | 0.687 | 0.7 |
H.sapiens | RF-PseU | 0.643 | 0.29 | 0.661 | 0.626 | 0.7 |
H.sapiens | iPseU-TWSVM | 0.65 | 0.301 | 0.697 | 0.602 | 0.682 |
S.cerevisiae | iRNA-PseU | 0.645 | 0.29 | 0.647 | 0.643 | 0.81 |
S.cerevisiae | PseUI | 0.641 | 0.3 | 0.647 | 0.675 | 0.69 |
S.cerevisiae | iPseU-CNN | 0.682 | 0.37 | 0.664 | 0.705 | / |
S.cerevisiae | XG-PseU | 0.682 | 0.37 | 0.668 | 0.695 | 0.77 |
S.cerevisiae | RF-PseU | 0.748 | 0.49 | 0.772 | 0.724 | 0.81 |
S.cerevisiae | iPseU-TWSVM | 0.722 | 0.45 | 0.656 | 0.786 | 0.758 |
M.musculus | iRNA-PseU | 0.691 | 0.38 | 0.733 | 0.648 | 0.75 |
M.musculus | PseUI | 0.704 | 0.41 | 0.799 | 0.703 | 0.71 |
M.musculus | iPseU-CNN | 0.718 | 0.44 | 0.748 | 0.691 | / |
M.musculus | XG-PseU | 0.72 | 0.45 | 0.765 | 0.676 | 0.74 |
M.musculus | RF-PseU | 0.748 | 0.5 | 0.731 | 0.765 | 0.796 |
M.musculus | iPseU-TWSVM | 0.728 | 0.462 | 0.795 | 0.661 | 0.775 |
Species | Classifier (independent testing) | ACC | MCC | SN | SP | AUC |
H.sapiens | iRNA-PseU | 0.65 | 0.3 | 0.6 | 0.7 | / |
H.sapiens | PseUI | 0.655 | 0.31 | 0.63 | 0.7 | / |
H.sapiens | iPseU-CNN | 0.69 | 0.4 | 0.777 | 0.68 | / |
H.sapiens | XG-PseU | 0.675 | / | / | 0.608 | / |
H.sapiens | RF-PseU | 0.75 | 0.5 | 0.78 | 0.72 | 0.8 |
H.sapiens | iPseU-TWSVM | 0.763 | 0.529 | 0.825 | 0.7 | 0.786 |
S.cerevisiae | iRNA-PseU | 0.6 | 0.2 | 0.63 | 0.57 | / |
S.cerevisiae | PseUI | 0.685 | 0.37 | 0.65 | 0.72 | / |
S.cerevisiae | iPseU-CNN | 0.735 | 0.47 | 0.688 | 0.778 | / |
S.cerevisiae | XG-PseU | 0.71 | / | / | / | / |
S.cerevisiae | RF-PseU | 0.77 | 0.54 | 0.75 | 0.79 | 0.838 |
S.cerevisiae | iPseU-TWSVM | 0.825 | 0.65 | 0.85 | 0.8 | 0.905 |
Average accuracy | iPseU-TWSVM | RF-PseU | XG-PseU | iPseU-CNN | PseUI | iRNA-PseU |
Cross-validation | 0.7 | 0.713 | 0.687 | 0.689 | 0.662 | 0.647 |
Independent testing | 0.794 | 0.76 | 0.693 | 0.713 | 0.7 | 0.625 |
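The averages in Table 9 appear to be unweighted means of the per-species accuracies in Tables 7 and 8: three species for cross-validation and two for independent testing. The text does not state the averaging rule, so this is an inference, but it reproduces the iPseU-TWSVM and RF-PseU columns exactly:

$$
\begin{aligned}
\text{10-CV (iPseU-TWSVM)} &= \tfrac{0.650 + 0.722 + 0.728}{3} = 0.700, &
\text{10-CV (RF-PseU)} &= \tfrac{0.643 + 0.748 + 0.748}{3} = 0.713,\\
\text{I-testing (iPseU-TWSVM)} &= \tfrac{0.763 + 0.825}{2} = 0.794, &
\text{I-testing (RF-PseU)} &= \tfrac{0.750 + 0.770}{2} = 0.760.
\end{aligned}
$$

The differences, 0.794 − 0.760 = 0.034 and 0.713 − 0.700 = 0.013, are the 3.4- and 1.3-point gaps quoted in the text.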
This work proposes a novel model, iPseU-TWSVM, to identify RNA Ψ sites across several species. We used an efficient feature selection method to obtain the best feature subset and selected TWSVM as the classifier to increase recognition accuracy. Compared with advanced predictors, iPseU-TWSVM improved the average independent-test accuracy by 3.4 percentage points, while its average cross-validation accuracy was 1.3 percentage points lower. We attribute the relatively weaker performance on the training datasets to two differences from the best predictor: the features used and the classifier. Overall, these results indicate that iPseU-TWSVM generalizes better and can identify Ψ sites in RNA sequences more accurately on independent data. iPseU-TWSVM is therefore expected to be an effective tool for identifying RNA Ψ sites.
The contributions of this work are threefold: (ⅰ) the model uses TWSVM as the classifier, which improves both accuracy and training speed; (ⅱ) the model generalizes well and could potentially be applied to the prediction of other sites in sequences; (ⅲ) more accurate identification of Ψ sites lays a foundation for disease control and related drug development. This work also has the following shortcomings: (ⅰ) only two feature selection algorithms were compared, and subsequent research could try other algorithms to further improve the feature subset; (ⅱ) the model uses TWSVM as the classifier, and the original TWSVM problem minimizes only the empirical risk, not the structural risk; moreover, the algorithm obtains only approximate solutions. Subsequent research could improve TWSVM or use other classification algorithms as the classifier to improve prediction performance. Future work will study emerging methods [66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81] to further improve the accuracy of the model.
This paper is supported by the National Natural Science Foundation of China (NSFC 62172076, 62072385) and the Municipal Government of Quzhou (Grant Numbers 2020D003 and 2021D004).
The authors declare there is no conflict of interest.
[1] | W. Chen, R. Zheng, P. D. Baade, S. Zhang, H. Zeng, F. Bray, et al., Cancer statistics in China, 2015, CA: Cancer J. Clin., 66 (2016), 115–132. https://doi.org/10.3322/caac.21338 |
[2] |
R. L. Siegel, K. D. Miller, H. E. Fuchs, A. Jemal, Cancer statistics, 2022, CA: Cancer J. Clin., 71 (2021), 7–33. https://doi.org/10.3322/caac.21708 doi: 10.3322/caac.21708
![]() |
[3] |
M. E. Urick, D. W. Bell, Clinical actionability of molecular targets in endometrial cancer, Nat. Rev. Cancer, 19 (2019), 510–521. https://doi.org/10.1038/s41568-019-0177-x doi: 10.1038/s41568-019-0177-x
![]() |
[4] |
L. Mutlu, J. Harold, J. Tymon-Rosario, A. D. Santin, Immune checkpoint inhibitors for recurrent endometrial cancer, Expert Rev. Anticancer Ther., 22 (2022), 249–258. https://doi.org/10.1080/14737140.2022.2044311 doi: 10.1080/14737140.2022.2044311
![]() |
[5] |
J. Ventriglia, I. Paciolla, C. Pisano, S. C. Cecere, M. Di Napoli, R. Tambaro, et al., Immunotherapy in ovarian, endometrial and cervical cancer: State of the art and future perspectives, Cancer Treat. Rev., 59 (2017), 109–116. https://doi.org/10.1016/j.ctrv.2017.07.008 doi: 10.1016/j.ctrv.2017.07.008
![]() |
[6] |
D. Xian, J. Song, L. Yang, X. Xiong, R. Lai, J. Zhong, Emerging roles of redox-mediated angiogenesis and oxidative stress in dermatoses, Oxid. Med. Cell. Longevity, 2019 (2019), 2304018. https://doi.org/10.1155/2019/2304018 doi: 10.1155/2019/2304018
![]() |
[7] |
H. Lan, Y. Gao, Z. Zhao, Z. Mei, F. Wang, Ferroptosis: Redox imbalance and hematological tumorigenesis, Front. Oncol., 12 (2022), 834681. https://doi.org/10.3389/fonc.2022.834681 doi: 10.3389/fonc.2022.834681
![]() |
[8] |
R. Camarda, A. Y. Zhou, R. A. Kohnz, S. Balakrishnan, C. Mahieu, B. Anderton, et al., Inhibition of fatty acid oxidation as a therapy for MYC-overexpressing triple-negative breast cancer, Nat. Med., 22 (2016), 427–432. https://doi.org/10.1038/nm.4055 doi: 10.1038/nm.4055
![]() |
[9] |
T. Poplawski, D. Pytel, J. Dziadek, I. Majsterek, Interplay between redox signaling, oxidative stress, and unfolded protein response (UPR) in pathogenesis of human diseases, Oxid. Med. Cell. Longevity, 2019 (2019), 6949347. https://doi.org/10.1155/2019/6949347 doi: 10.1155/2019/6949347
![]() |
[10] |
S. E. Eriksson, S. Ceder, V. J. N. Bykov, K. G. Wiman, p53 as a hub in cellular redox regulation and therapeutic target in cancer, J. Mol. Cell Biol., 11 (2019), 330–341. https://doi.org/10.1093/jmcb/mjz005 doi: 10.1093/jmcb/mjz005
![]() |
[11] |
S. K. Joseph, D. M. Booth, M. P. Young, G. Hajnóczky, Redox regulation of ER and mitochondrial Ca2+ signaling in cell survival and death, Cell Calcium, 79 (2019), 89–97. https://doi.org/10.1016/j.ceca.2019.02.006 doi: 10.1016/j.ceca.2019.02.006
![]() |
[12] |
E. Balta, J. Kramer, Y. Samstag, Redox regulation of the actin cytoskeleton in cell migration and adhesion: on the way to a spatiotemporal view, Front. Cell Dev. Biol., 8 (2020), 618261. https://doi.org/10.3389/fcell.2020.618261 doi: 10.3389/fcell.2020.618261
![]() |
[13] |
J. Pravda, Systemic lupus erythematosus: Pathogenesis at the functional limit of redox homeostasis, Oxid. Med. Cell. Longevity, 2019 (2019), 1651724. https://doi.org/10.1155/2019/1651724 doi: 10.1155/2019/1651724
![]() |
[14] |
K. Mattes, E. Vellenga, H. Schepers, Differential redox-regulation and mitochondrial dynamics in normal and leukemic hematopoietic stem cells: A potential window for leukemia therapy, Crit. Rev. Oncol. Hematol., 144 (2019), 102814. https://doi.org/10.1016/j.critrevonc.2019.102814 doi: 10.1016/j.critrevonc.2019.102814
![]() |
[15] |
A. Cruz-Gregorio, A. K. Aranda-Rivera, J. Pedraza-Chaverri, J. D. Solano, M. E. Ibarra-Rubio, Redox-sensitive signaling pathways in renal cell carcinoma, BioFactors, 48 (2022), 342–358. https://doi.org/10.1002/biof.1784 doi: 10.1002/biof.1784
![]() |
[16] |
Q. Xia, X. Yang, J. L. Lu, C. Q. Liu, J. X. Sun, C. Li, et al., Development and validation of a nine-redox-related long noncoding RNA signature in renal clear cell carcinoma, Oxid. Med. Cell. Longevity, 2020 (2020), 6634247. https://doi.org/10.1155/2020/6634247 doi: 10.1155/2020/6634247
![]() |
[17] |
J. Ren, A. Wang, J. Liu, Q. Yuan, Identification and validation of a novel redox-related lncRNA prognostic signature in lung adenocarcinoma, Bioengineered, 12 (2021), 4331–4348. https://doi.org/10.1080/21655979.2021.1951522 doi: 10.1080/21655979.2021.1951522
![]() |
[18] |
K. Tu, J. Li, H. Mo, Y. Xian, Q. Xu, X. Xiao, Identification and validation of redox-immune based prognostic signature for hepatocellular carcinoma, Int. J. Med. Sci., 18 (2021), 2030–2041. https://doi.org/10.7150/ijms.56289 doi: 10.7150/ijms.56289
![]() |
[19] |
Y. Wu, X. Wei, H. Feng, B. Hu, B. Liu, Y. Luan, et al., Integrated analysis to identify a redox-related prognostic signature for clear cell renal cell carcinoma, Oxid. Med. Cell. Longevity, 2021 (2021), 6648093. https://doi.org/10.1155/2021/6648093 doi: 10.1155/2021/6648093
![]() |
[20] |
Y. Y. Zhang, Z. J. Ni, E. Elam, F. Zhang, K. Thakur, S. Wang, et al., Juglone, a novel activator of ferroptosis, induces cell death in endometrial carcinoma Ishikawa cells, Food Funct., 12 (2021), 4947–4959. https://doi.org/10.1039/D1FO00790D doi: 10.1039/D1FO00790D
![]() |
[21] |
S. Hä nzelmann, R. Castelo, J. Guinney, GSVA: gene set variation analysis for microarray and RNA-seq data, BMC Bioinf., 14 (2013), 7. https://doi.org/10.1186/1471-2105-14-7 doi: 10.1186/1471-2105-14-7
![]() |
[22] |
D. Aran, Z. Hu, A. J. Butte, xCell: digitally portraying the tissue cellular heterogeneity landscape, Genome Biol., 18 (2017), 220. https://doi.org/10.1186/s13059-017-1349-1 doi: 10.1186/s13059-017-1349-1
![]() |
[23] |
H. Sung, J. Ferlay, R. L. Siegel, M. Laversanne, I. Soerjomataram, A. Jemal, et al., Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA: Cancer J. Clin., 71 (2021), 209–249. https://doi.org/10.3322/caac.21660 doi: 10.3322/caac.21660
![]() |
[24] |
Y. Cai, B. Wang, W. Xu, K. Liu, Y. Gao, C. Guo, et al., Endometrial cancer: Genetic, metabolic characteristics, therapeutic strategies and nanomedicine, Curr. Med. Chem., 28 (2021), 8755–8781. https://doi.org/10.2174/0929867328666210705144456 doi: 10.2174/0929867328666210705144456
![]() |
[25] |
P. A. Ott, Y. J. Bang, D. Berton-Rigaud, E. Elez, M. J. Pishvaian, H. S. Rugo, et al., Safety and antitumor activity of pembrolizumab in advanced programmed death ligand 1-positive endometrial cancer: Results from the KEYNOTE-028 study, J. Clin. Oncol., 35 (2017), 2535–2541. https://doi.org/10.1200/JCO.2017.72.5952 doi: 10.1200/JCO.2017.72.5952
![]() |
[26] |
S. B. Crist, T. Nemkov, R. F. Dumpit, J. Dai, S. J. Tapscott, L. D. True, et al., Unchecked oxidative stress in skeletal muscle prevents outgrowth of disseminated tumour cells, Nat. Cell Biol., 24 (2022), 538–553. https://doi.org/10.1038/s41556-022-00881-4 doi: 10.1038/s41556-022-00881-4
![]() |
[27] |
B. Jiang, J. Zhang, G. Zhao, M. Liu, J. Hu, F. Lin, et al., Filamentous GLS1 promotes ROS-induced apoptosis upon glutamine deprivation via insufficient asparagine synthesis, Mol. Cell, 82 (2022), 1821–1835.e6. https://doi.org/10.1016/j.molcel.2022.03.016 doi: 10.1016/j.molcel.2022.03.016
![]() |
[28] |
D. G. Franchina, H. Kurniawan, M. Grusdat, C. Binsfeld, L. Guerra, L. Bonetti, et al., Glutathione-dependent redox balance characterizes the distinct metabolic properties of follicular and marginal zone B cells, Nat. Commun., 13 (2022), 1789. https://doi.org/10.1038/s41467-022-29426-x doi: 10.1038/s41467-022-29426-x
![]() |
[29] |
D. W. Killilea, A. N. Killilea, Mineral requirements for mitochondrial function: A connection to redox balance and cellular differentiation, Free Radical Biol. Med., 182 (2022), 182–191. https://doi.org/10.1016/j.freeradbiomed.2022.02.022 doi: 10.1016/j.freeradbiomed.2022.02.022
![]() |
[30] |
H. Shyam, N. Singh, S. Kaushik, R. Sharma, A. K. Balapure, Centchroman induces redox-dependent apoptosis and cell-cycle arrest in human endometrial cancer cells, Apoptosis, 22 (2017), 570–584. https://doi.org/10.1007/s10495-017-1346-6 doi: 10.1007/s10495-017-1346-6
![]() |
[31] |
F. Heidari, S. Rabizadeh, M. A. Mansournia, H. Mirmiranpoor, S. S. Salehi, S. Akhavan, et al., Inflammatory, oxidative stress and anti-oxidative markers in patients with endometrial carcinoma and diabetes, Cytokine, 120 (2019), 186–190. https://doi.org/10.1016/j.cyto.2019.05.007 doi: 10.1016/j.cyto.2019.05.007
![]() |
[32] |
Q. Chen, X. Zhong, X. Li, J. Wang, Research advances on the pathogenesis of endometrial serous carcinoma, Chin. J. Obstet. Gynecol., 2 (2020), 142–144. https://doi.org/10.3760/cma.j.issn.0529-567X.2020.02.017 doi: 10.3760/cma.j.issn.0529-567X.2020.02.017
![]() |
[33] |
M. C. Ochoa, C. Razquin, G. Zalba, M. A. Martínez-González, J. A. Martínez, A. Marti, G allele of the -930A > G polymorphism of the CYBA gene is associated with insulin resistance in obese subjects, J. Physiol. Biochem., 64 (2008), 127–133. https://doi.org/10.1007/bf03168240 doi: 10.1007/bf03168240
![]() |
[34] |
A. H. Janneh, B. Ogretmen, Targeting sphingolipid metabolism as a therapeutic strategy in cancer treatment, Cancers, 14 (2022), 2183. https://doi.org/10.3390/cancers14092183 doi: 10.3390/cancers14092183
![]() |
[35] |
E. Tarazona-Santos, M. Machado, W. C. Magalhães, R. Chen, F. Lyon, L. Burdett, et al., Evolutionary dynamics of the human NADPH oxidase genes CYBB, CYBA, NCF2, and NCF4: functional implications, Mol. Biol. Evol., 30 (2013), 2157–2167. https://doi.org/10.1093/molbev/mst119 doi: 10.1093/molbev/mst119
![]() |
[36] |
L. Zhu, B. Miao, D. Dymerska, M. Kuswik, E. Bueno-Martínez, L. Sanoguera-Miralles, et al., Germline variants of CYBA and TRPM4 predispose to familial colorectal cancer, Cancers, 14 (2022), 670. https://doi.org/10.3390/cancers14030670 doi: 10.3390/cancers14030670
![]() |
[37] |
R. Paolillo, M. Boulanger, P. Gâtel, L. Gabellier, M. De Toledo, D. Tempé, et al., The NADPH oxidase NOX2 is a marker of adverse prognosis involved in chemoresistance of acute myeloid leukemias, Haematologica, 107 (2022). https://doi.org/10.3324/haematol.2021.279889 doi: 10.3324/haematol.2021.279889
![]() |
[38] |
M. Rose, T. Cardon, S. Aboulouard, N. Hajjaji, F. Kobeissy, M. Duhamel, et al., Surfaceome proteomic of glioblastoma revealed potential targets for immunotherapy, Front. Immunol., 12 (2021), 746168. https://doi.org/10.3389/fimmu.2021.746168 doi: 10.3389/fimmu.2021.746168
![]() |
[39] |
J. Wang, J. Li, J. Gu, J. Yu, S. Guo, Y. Zhu, et al., Abnormal methylation status of FBXW10 and SMPD3, and associations with clinical characteristics in clear cell renal cell carcinoma, Oncol. Lett., 10 (2015), 3073–3080. https://doi.org/10.3892/ol.2015.3707 doi: 10.3892/ol.2015.3707
![]() |
[40] |
A. Montfort, F. Bertrand, J. Rochotte, J. Gilhodes, T. Filleron, J. Milhès, et al., Neutral sphingomyelinase 2 heightens anti-melanoma immune responses and Anti-PD-1 therapy efficacy, Cancer Immunol. Res., 9 (2021), 568–582. https://doi.org/10.1158/2326-6066.CIR-20-0342 doi: 10.1158/2326-6066.CIR-20-0342
![]() |
[41] |
K. Revill, T. Wang, A. Lachenmayer, K. Kojima, A. Harrington, J. Li, et al., Genome-wide methylation analysis and epigenetic unmasking identify tumor suppressor genes in hepatocellular carcinoma, Gastroenterology, 145 (2013), 1424–1435. https://doi.org/10.1053/j.gastro.2013.08.055 doi: 10.1053/j.gastro.2013.08.055
![]() |
[42] |
X. Liu, J. Wu, D. Zhang, Z. Bing, J. Tian, M. Ni, et al., Identification of potential key genes associated with the pathogenesis and prognosis of gastric cancer based on integrated bioinformatics analysis, Front. Genet., 9 (2018), 265. https://doi.org/10.3389/fgene.2018.00265 doi: 10.3389/fgene.2018.00265
![]() |
[43] |
Y. H. Lee, C. W. Tan, A. Venkatratnam, C. S. Tan, L. Cui, S. F. Loh, et al., Dysregulated sphingolipid metabolism in endometriosis, J. Clin. Endocrinol. Metab., 99 (2014), E1913–1921. https://doi.org/10.1210/jc.2014-1340 doi: 10.1210/jc.2014-1340
![]() |
[44] |
C. Zhang, Z. Li, F. Qi, X. Hu, J. Luo, Exploration of the relationships between tumor mutation burden with immune infiltrates in clear cell renal cell carcinoma, Ann. Transl. Med., 7 (2019), 648. https://doi.org/10.21037/atm.2019.10.84 doi: 10.21037/atm.2019.10.84
![]() |
[45] |
J. Lu, P. Wilfred, D. Korbie, M. Trau, Regulation of canonical oncogenic signaling pathways in cancer via DNA methylation, Cancers, 12 (2020), 3199. https://doi.org/10.3390/cancers12113199 doi: 10.3390/cancers12113199
![]() |
[46] |
Y. Shen, M. Takahashi, H. M. Byun, A. Link, N. Sharma, F. Balaguer, et al., Boswellic acid induces epigenetic alterations by modulating DNA methylation in colorectal cancer cells, Cancer Biol. Ther., 13 (2012), 542–552. https://doi.org/10.4161/cbt.19604 doi: 10.4161/cbt.19604
![]() |
[47] |
K. Revill, T. Wang, A. Lachenmayer, K. Kojima, A. Harrington, J. Li, et al., Genome-wide methylation analysis and epigenetic unmasking identify tumor suppressor genes in hepatocellular carcinoma, Gastroenterology, 145 (2013), 1424–1435.e25. https://doi.org/10.1053/j.gastro.2013.08.055 doi: 10.1053/j.gastro.2013.08.055
![]() |
[48] |
Q. Song, X. Zhu, L. Jin, M. Chen, W. Zhang, J. Su, SMGR: a joint statistical method for integrative analysis of single-cell multi-omics data, NAR Genomics Bioinf., 4 (2022), lqac056. https://doi.org/10.1093/nargab/lqac056 doi: 10.1093/nargab/lqac056
![]() |
[49] |
Z. Tang, T. Zhang, B. Yang, J. Su, Q. Song, spaCI: deciphering spatial cellular communications through adaptive graph model, Briefings Bioinf., 24 (2023), bbac563.50. https://doi.org/10.1093/bib/bbac563 doi: 10.1093/bib/bbac563
![]() |
[50] |
M. Zheng, Y. Hu, R. Gou, S. Li, X. Nie, X. Li, et al., Development of a seven-gene tumor immune microenvironment prognostic signature for high-risk grade III endometrial cancer, Mol. Ther. Oncolytics, 22 (2021), 294–306. https://doi.org/10.1016/j.omto.2021.07.002 doi: 10.1016/j.omto.2021.07.002
![]() |
[51] |
Y. Fan, X. Li, L. Tian, J. Wang, Identification of a metabolism-related signature for the prediction of survival in endometrial cancer patients, Front. Oncol., 11 (2021), 630905. https://doi.org/10.3389/fonc.2021.630905 doi: 10.3389/fonc.2021.630905
![]() |
[52] |
S. Singh, X. H. F. Zhang, J. M. Rosen, TIME is a great healer-targeting myeloid cells in the tumor immune microenvironment to improve triple-negative breast cancer outcomes, Cells, 10 (2020), 11. https://doi.org/10.3390/cells10010011 doi: 10.3390/cells10010011
![]() |
[53] |
I. Mito, H. Takahashi, R. Kawabata-Iwakawa, S. Ida, H. Tada, K. Chikamatsu, Comprehensive analysis of immune cell enrichment in the tumor microenvironment of head and neck squamous cell carcinoma, Sci. Rep., 11 (2021), 16134. https://doi.org/10.1038/s41598-021-95718-9 doi: 10.1038/s41598-021-95718-9
![]() |
[54] |
Z. Abdulrahman, S. J. Santegoets, G. Sturm, P. Charoentong, M. E. Ijsselsteijn, A. Somarakis, et al., Tumor-specific T cells support chemokine-driven spatial organization of intratumoral immune microaggregates needed for long survival, J. ImmunoTher. Cancer, 10 (2022), e004346. http://dx.doi.org/10.1136/jitc-2021-004346 doi: 10.1136/jitc-2021-004346
![]() |
[55] |
C. F. Friedman, J. D. Hainsworth, R. Kurzrock, D. R. Spigel, H. A. Burris, C. J. Sweeney, et al., Atezolizumab treatment of tumors with high tumor mutational burden from mypathway, a multicenter, open-label, phase IIa multiple basket study, Cancer Discov., 12 (2022), 654–669. https://doi.org/10.1158/2159-8290.CD-21-0450 doi: 10.1158/2159-8290.CD-21-0450
![]() |
[56] |
M. J. Riggs, N. Lin, C. Wang, D. W. Piecoro, R. W. Miller, O. A. Hampton, et al., DACH1 mutation frequency in endometrial cancer is associated with high tumor mutation burden, PLoS One, 15 (2020), e0244558. https://doi.org/10.1371/journal.pone.0244558 doi: 10.1371/journal.pone.0244558
![]() |
[57] |
Y. Zhang, J. Zhang, Z. Shao, L. Zhao, Y. Zhang, S. Zhang, et al., Mutational landscapes and tumour mutational burden expression in endometrial cancer, Ann. Onco., 30 (2019), v424–v425. https://doi.org/10.1093/annonc/mdz250.048 doi: 10.1093/annonc/mdz250.048
![]() |
[58] |
M. Collin, Immune checkpoint inhibitors: a patent review (2010–2015), Expert Opin. Ther. Pat., 26 (2016), 555–564. https://doi.org/10.1080/13543776.2016.1176150 doi: 10.1080/13543776.2016.1176150
![]() |
![]() |
![]() |

| Feature subset (TWSVM) | ACC | MCC | SN | SP | AUC |
|---|---|---|---|---|---|
| Kmer | 0.59 | 0.181 | 0.533 | 0.646 | 0.618 |
| PC-PseDNC-General | 0.534 | 0.07 | 0.434 | 0.635 | 0.543 |
| ANF | 0.526 | 0.053 | 0.568 | 0.485 | 0.518 |
| EIIP | 0.572 | 0.144 | 0.525 | 0.618 | 0.6 |
| ENAC | 0.587 | 0.178 | 0.472 | 0.7 | 0.59 |
| NCP + NBP | 0.584 | 0.172 | 0.53 | 0.639 | 0.582 |
| Kmer + NCP + NBP | 0.59 | 0.182 | 0.532 | 0.648 | 0.587 |
| Kmer + ENAC | 0.603 | 0.208 | 0.582 | 0.624 | 0.62 |
| Kmer + EIIP | 0.606 | 0.212 | 0.585 | 0.626 | 0.625 |
| Kmer + PC-PseDNC-General + ANF + EIIP + ENAC + NCP + NBP | 0.65 | 0.301 | 0.697 | 0.602 | 0.682 |

| Feature subset (TWSVM) | ACC | MCC | SN | SP | AUC |
|---|---|---|---|---|---|
| Kmer | 0.627 | 0.259 | 0.625 | 0.63 | 0.67 |
| PC-PseDNC-General | 0.574 | 0.153 | 0.666 | 0.483 | 0.589 |
| ANF | 0.584 | 0.175 | 0.716 | 0.452 | 0.584 |
| EIIP | 0.614 | 0.238 | 0.473 | 0.755 | 0.632 |
| ENAC | 0.634 | 0.326 | 0.435 | 0.836 | 0.693 |
| NCP + NBP | 0.669 | 0.34 | 0.659 | 0.678 | 0.711 |
| Kmer + NCP + NBP | 0.664 | 0.33 | 0.653 | 0.675 | 0.683 |
| Kmer + ENAC | 0.653 | 0.308 | 0.631 | 0.675 | 0.683 |
| Kmer + EIIP | 0.631 | 0.263 | 0.628 | 0.634 | 0.662 |
| Kmer + PC-PseDNC-General + ANF + EIIP + ENAC + NCP + NBP | 0.722 | 0.45 | 0.656 | 0.786 | 0.758 |

| Feature subset (TWSVM) | ACC | MCC | SN | SP | AUC |
|---|---|---|---|---|---|
| Kmer | 0.584 | 0.17 | 0.676 | 0.49 | 0.614 |
| PC-PseDNC-General | 0.541 | 0.084 | 0.517 | 0.566 | 0.539 |
| ANF | 0.553 | 0.113 | 0.71 | 0.396 | 0.527 |
| EIIP | 0.625 | 0.276 | 0.826 | 0.424 | 0.632 |
| ENAC | 0.664 | 0.329 | 0.667 | 0.661 | 0.7 |
| NCP + NBP | 0.662 | 0.326 | 0.636 | 0.688 | 0.703 |
| Kmer + NCP + NBP | 0.677 | 0.36 | 0.623 | 0.731 | 0.73 |
| Kmer + ENAC | 0.662 | 0.326 | 0.657 | 0.667 | 0.704 |
| Kmer + EIIP | 0.636 | 0.274 | 0.697 | 0.574 | 0.667 |
| Kmer + PC-PseDNC-General + ANF + EIIP + ENAC + NCP + NBP | 0.728 | 0.462 | 0.795 | 0.661 | 0.775 |
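
The tables report accuracy (ACC), Matthews correlation coefficient (MCC), sensitivity (SN), specificity (SP) and area under the ROC curve (AUC). The sketch below implements the standard definitions of these metrics. It assumes a label convention of 1 for a Ψ site and 0 for a non-site; that convention and the function names are illustrative, not taken from the paper's code.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple

# Label convention (1 = Ψ site, 0 = non-site) is an assumption for this sketch.


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, TN, FP, FN with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "SN": tp / (tp + fn),  # sensitivity: fraction of true sites recovered
        "SP": tn / (tn + fp),  # specificity: fraction of non-sites rejected
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.6, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(threshold_metrics(y_true, y_pred), auc(y_true, scores))
# ACC = SN = SP = 0.75, MCC = 0.5, AUC = 0.875
```

When positives and negatives are equally numerous, ACC reduces to (SN + SP) / 2, which gives a quick consistency check on individual table rows.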

| Feature | Original dimension | Dimension after feature selection |
|---|---|---|
| NCP | 63 | 14 |
| EIIP | 21 | 21 |
| NBP | 84 | 15 |
| ENAC | 105 | 41 |
| ANF | 21 | 8 |
| Kmer | 64 | 1 |
| PC-PseDNC-General | 22 | 0 |

| Feature | Original dimension | Dimension after feature selection |
|---|---|---|
| NCP | 93 | 22 |
| EIIP | 31 | 31 |
| NBP | 124 | 25 |
| ENAC | 155 | 53 |
| ANF | 31 | 19 |
| Kmer | 256 | 0 |
| PC-PseDNC-General | 18 | 0 |

| Feature | Original dimension | Dimension after feature selection |
|---|---|---|
| NCP | 63 | 11 |
| EIIP | 21 | 21 |
| NBP | 84 | 17 |
| ENAC | 105 | 43 |
| ANF | 21 | 7 |
| Kmer | 64 | 1 |
| PC-PseDNC-General | 22 | 0 |
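
The feature names in these tables refer to standard nucleotide encodings. The sketch below implements four of them plus Kmer, using commonly published definitions that are assumed here rather than taken from the paper:

- NCP maps each nucleotide to three chemical properties (ring structure, functional group, hydrogen bonding).
- NBP is a four-bit one-hot code.
- EIIP uses each nucleotide's electron-ion interaction pseudopotential, with U given the value usually listed for T.
- ANF is the running density of each nucleotide.
- Kmer gives normalized k-mer frequencies.

The original dimensions in the first and third tables are consistent with 21-nt sequences and k = 3 (NCP 21 × 3 = 63, Kmer 4³ = 64). Those in the second table are consistent with 31-nt sequences and k = 4 (Kmer 4⁴ = 256). ENAC and PC-PseDNC-General are omitted because their parameters are not given here.

```python
from itertools import product
from typing import Dict, List, Tuple

# Encoding tables below are commonly used definitions, assumed here;
# the paper's exact settings may differ.
NCP: Dict[str, Tuple[int, int, int]] = {  # (ring structure, functional group, hydrogen bonding)
    "A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1),
}
NBP: Dict[str, Tuple[int, int, int, int]] = {  # one-hot nucleotide binary profile
    "A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1),
}
EIIP: Dict[str, float] = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}  # U uses T's value


def ncp(seq: str) -> List[int]:
    return [v for nt in seq for v in NCP[nt]]


def nbp(seq: str) -> List[int]:
    return [v for nt in seq for v in NBP[nt]]


def eiip(seq: str) -> List[float]:
    return [EIIP[nt] for nt in seq]


def anf(seq: str) -> List[float]:
    """Density of the nucleotide at position i within the prefix seq[:i+1]."""
    return [seq[: i + 1].count(nt) / (i + 1) for i, nt in enumerate(seq)]


def kmer(seq: str, k: int = 3) -> List[float]:
    """Normalized frequencies of all 4**k k-mers, in lexicographic ACGU order."""
    vocab = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        counts[seq[i : i + k]] += 1
    return [counts[w] / windows for w in vocab]


seq21 = "ACGU" * 5 + "A"  # hypothetical 21-nt sample
print(len(ncp(seq21)), len(nbp(seq21)), len(eiip(seq21)), len(anf(seq21)), len(kmer(seq21)))
# 63 84 21 21 64
```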

| Species | Classifier (cross-validation) | ACC | MCC | SN | SP | AUC |
|---|---|---|---|---|---|---|
| H. sapiens | iRNA-PseU | 0.604 | 0.21 | 0.61 | 0.598 | 0.64 |
| | PseUI | 0.642 | 0.28 | 0.649 | 0.636 | 0.68 |
| | iRNA-CNN | 0.667 | 0.34 | 0.65 | 0.688 | / |
| | XG-PseU | 0.661 | 0.32 | 0.635 | 0.687 | 0.7 |
| | RF-PseU | 0.643 | 0.29 | 0.661 | 0.626 | 0.7 |
| | iPseU-TWSVM | 0.65 | 0.301 | 0.697 | 0.602 | 0.682 |
| S. cerevisiae | iRNA-PseU | 0.645 | 0.29 | 0.647 | 0.643 | 0.81 |
| | PseUI | 0.641 | 0.3 | 0.647 | 0.675 | 0.69 |
| | iRNA-CNN | 0.682 | 0.37 | 0.664 | 0.705 | / |
| | XG-PseU | 0.682 | 0.37 | 0.668 | 0.695 | 0.77 |
| | RF-PseU | 0.748 | 0.49 | 0.772 | 0.724 | 0.81 |
| | iPseU-TWSVM | 0.722 | 0.45 | 0.656 | 0.786 | 0.758 |
| M. musculus | iRNA-PseU | 0.691 | 0.38 | 0.733 | 0.648 | 0.75 |
| | PseUI | 0.704 | 0.41 | 0.799 | 0.703 | 0.71 |
| | iRNA-CNN | 0.718 | 0.44 | 0.748 | 0.691 | / |
| | XG-PseU | 0.72 | 0.45 | 0.765 | 0.676 | 0.74 |
| | RF-PseU | 0.748 | 0.5 | 0.731 | 0.765 | 0.796 |
| | iPseU-TWSVM | 0.728 | 0.462 | 0.795 | 0.661 | 0.775 |

| Species | Classifier (independent testing) | ACC | MCC | SN | SP | AUC |
|---|---|---|---|---|---|---|
| H. sapiens | iRNA-PseU | 0.65 | 0.3 | 0.6 | 0.7 | / |
| | PseUI | 0.655 | 0.31 | 0.63 | 0.7 | / |
| | iRNA-CNN | 0.69 | 0.4 | 0.777 | 0.68 | / |
| | XG-PseU | 0.675 | / | / | 0.608 | / |
| | RF-PseU | 0.75 | 0.5 | 0.78 | 0.72 | 0.8 |
| | iPseU-TWSVM | 0.763 | 0.529 | 0.825 | 0.7 | 0.786 |
| S. cerevisiae | iRNA-PseU | 0.6 | 0.2 | 0.63 | 0.57 | / |
| | PseUI | 0.685 | 0.37 | 0.65 | 0.72 | / |
| | iRNA-CNN | 0.735 | 0.47 | 0.688 | 0.778 | / |
| | XG-PseU | 0.71 | / | / | / | / |
| | RF-PseU | 0.77 | 0.54 | 0.75 | 0.79 | 0.838 |
| | iPseU-TWSVM | 0.825 | 0.65 | 0.85 | 0.8 | 0.905 |

| Score type | iPseU-TWSVM | RF-PseU | XG-PseU | iRNA-CNN | PseUI | iRNA-PseU |
|---|---|---|---|---|---|---|
| Cross-validation | 0.7 | 0.713 | 0.687 | 0.689 | 0.662 | 0.647 |
| Independent testing | 0.794 | 0.76 | 0.693 | 0.713 | 0.7 | 0.625 |
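
The two rows of the score table appear to be each predictor's accuracy averaged over species. This is an inference from the numbers, not something the text states. The sketch below recomputes these averages from the ACC columns of the comparison tables. It also computes the gap between iPseU-TWSVM and the best other predictor, which matches the 3.4 and 1.3 percentage-point differences quoted in the conclusion. The recomputed averages agree with the score table to within 0.001, with one exception: PseUI on independent testing, where the mean of 0.655 and 0.685 is 0.67 rather than the listed 0.7. The helper names are hypothetical.

```python
from typing import Dict, List

# Treating the score table as mean ACC over species is an inference, not the paper's stated definition.
CV_ACC: Dict[str, List[float]] = {  # H. sapiens, S. cerevisiae, M. musculus
    "iPseU-TWSVM": [0.65, 0.722, 0.728],
    "RF-PseU": [0.643, 0.748, 0.748],
    "XG-PseU": [0.661, 0.682, 0.72],
    "iRNA-CNN": [0.667, 0.682, 0.718],
    "PseUI": [0.642, 0.641, 0.704],
    "iRNA-PseU": [0.604, 0.645, 0.691],
}
IND_ACC: Dict[str, List[float]] = {  # H. sapiens, S. cerevisiae
    "iPseU-TWSVM": [0.763, 0.825],
    "RF-PseU": [0.75, 0.77],
    "XG-PseU": [0.675, 0.71],
    "iRNA-CNN": [0.69, 0.735],
    "PseUI": [0.655, 0.685],
    "iRNA-PseU": [0.65, 0.6],
}


def mean_scores(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each predictor's accuracy over species, rounded to 3 decimals."""
    return {name: round(sum(v) / len(v), 3) for name, v in table.items()}


def gap_to_best_rival(scores: Dict[str, float], model: str = "iPseU-TWSVM") -> float:
    """Difference between `model` and the best-scoring other predictor."""
    best_rival = max(s for name, s in scores.items() if name != model)
    return round(scores[model] - best_rival, 3)


cv, ind = mean_scores(CV_ACC), mean_scores(IND_ACC)
print(gap_to_best_rival(ind), gap_to_best_rival(cv))  # 0.034 -0.013
```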