Research article

Characteristics and subtypes of depressive symptoms in Chinese female breast cancer patients of different ages: a cross-sectional study

  • Purpose 

    To identify the characteristics and subtypes of depressive symptoms and explore the relationship between depressive subtypes and age among Chinese female breast cancer patients.

    Method 

    In this cross-sectional study, 566 breast cancer patients were recruited from three tertiary comprehensive hospitals in Shandong Province, China, through convenience sampling from April 2013 to June 2019. Depressive symptoms were measured using the Patient Health Questionnaire-9 (PHQ-9). Data analyses included descriptive analyses and latent class analysis.

    Results 

    There were significant differences in specific depressive symptoms by age group, but no significant difference in total PHQ-9 scores. The depressive subtypes were severe (Class 4), relatively severe (Class 3; with lower psychomotor agitation/retardation and suicidal ideation), moderate (Class 2; with higher psychomotor agitation/retardation and suicidal ideation), and mild depressive symptoms (Class 1). The distribution of depressive subtypes differed across age groups. In the 45–59 age group, the severe-symptoms subtype accounted for the highest proportion (50.3%).

    Conclusion 

    This is the first study to analyse depressive symptom characteristics and identify depressive subtypes in Chinese women with breast cancer across age groups in order to explore symptom heterogeneity. Our findings can contribute to identifying the mechanisms behind these relationships and to developing targeted interventions for patients with specific depressive subtypes.

    Citation: Yanyan Li, Hong Liu, Yaoyao Sun, Jie Li, Yanhong Chen, Xuan Zhang, Juan Wang, Liuliu Wu, Di Shao, Fenglin Cao. Characteristics and subtypes of depressive symptoms in Chinese female breast cancer patients of different ages: a cross-sectional study[J]. AIMS Public Health, 2021, 8(4): 691-703. doi: 10.3934/publichealth.2021055




    Hormone-binding protein (HBP) is a kind of protein that selectively and non-covalently binds to hormones. HBP is the soluble outer region of the growth hormone receptor (HR) and is an important component of the growth hormone (GH)-insulin-like growth factor axis [1]. The abnormal expression of HBP can cause a variety of diseases [2]. Because of the complex in vivo effects of HBP, its biological function is still not fully understood [1]. Therefore, the accurate identification of HBP will be helpful for understanding its molecular mechanisms and regulatory pathways.

    Traditional methods to identify HBP are wet biochemical experiments, such as immunoprecipitation, chromatography and crosslinking assays [3,4,5,6]. However, the disadvantages of these methods, such as being time-consuming and expensive, make them unable to keep up with the rapid growth of protein sequences in the post-genomic era. Therefore, it is necessary to develop automatic machine learning methods to identify HBP. As pioneering work, Tang et al. developed a support vector machine-based method to identify HBP, in which proteins were encoded using the optimal features obtained from an optimized dipeptide composition [7]. Subsequently, Basith et al. developed a computational predictor named iGHBP, in which an optimal feature set was obtained by combining dipeptide composition and amino acid index values through a two-step feature selection protocol [8]. However, the overall accuracy was still far from satisfactory. In order to improve the performance of HBP identification, it is necessary to apply new feature extraction and selection methods to select optimal features to represent HBP.

    In this paper, by examining 5 feature encoding methods and 2 feature selection methods, we investigated the advantages and disadvantages of various models for identifying HBP and then established a predictor called HBPred2.0 based on the optimal model. Finally, a user-friendly webserver was established for HBPred2.0. The paper is organized around the following aspects (Figure 1): (1) construction of the benchmark dataset, (2) feature extraction and selection, (3) machine learning method, and (4) performance evaluation.

    Figure 1.  The framework of this work.

    This paper adopted the benchmark dataset built by Tang et al. [7]. The dataset contains 123 hormone-binding proteins (HBPs) and 123 non-hormone-binding proteins (non-HBPs). To verify the portability and validity of the model, we built a high-quality independent dataset by applying the following rules. Firstly, we selected 357 manually annotated and reviewed HBP proteins from the Universal Protein Resource (UniProt) [9] using 'hormone-binding' as the keyword in the molecular function category of Gene Ontology. Secondly, we excluded proteins with sequence identity > 60% by using CD-HIT [10]. Thirdly, sequences that appear in the training dataset were excluded. As a result, 46 HBPs were obtained as independent positive samples. Negative samples were randomly selected from UniProt using 'hormone' and 'DNA damage binding' as keywords in the molecular function category of Gene Ontology. The sequence identities of the negative samples are also ≤ 60%. Finally, 46 non-HBPs (37 hormone proteins and 9 DNA damage binding proteins) were randomly obtained. It should be noted that there are no similar sequences between the training and testing data. All data can be downloaded from http://lin-group.cn/server/HBPred2.0/download.html.

    Suppose a sample protein P has L residues; it can be expressed as:

    $P = R_1 R_2 \cdots R_i \cdots R_L$ (1)

    where $R_i$ represents the i-th amino acid residue of the sample protein P (i = 1, 2, ..., L). The Natural Vector (NV) method is briefly described as follows [11]:

    For each of the 20 amino acids k, define:

    $w_k(\cdot): (A, C, D, E, \ldots, W, Y) \rightarrow (0, 1)$ (2)

    where $w_k(R_i) = 1$ if $R_i = k$; otherwise, $w_k(R_i) = 0$.

    Let $n_k$ be the number of amino acid k in the protein sequence P, which can be calculated as:

    $n_k = \sum_{i=1}^{L} w_k(R_i)$ (3)

    Let $s^{(k)}(i)$ be the distance from the first amino acid (regarded as the origin) to the i-th amino acid k in the protein sequence, let $T_k$ be the total distance of each set of the 20 amino acids, and let $\mu_k$ be the mean position of the amino acid k. They can be calculated as:

    $\begin{cases} s^{(k)}(i) = i \times w_k(R_i) \\ T_k = \sum_{i=1}^{n_k} s^{(k)}(i) \\ \mu_k = T_k / n_k \end{cases}$ (4)

    Let $D_2^k$ be the second-order normalized central moment, which can be calculated as:

    $D_2^k = \sum_{i=1}^{n_k} \frac{(s^{(k)}(i) - \mu_k)^2}{n_k \times L}$ (5)

    Thus, a sample protein P can be formulated as:

    $P = [n_A, \mu_A, D_2^A, \ldots, n_R, \mu_R, D_2^R, \ldots, n_Y, \mu_Y, D_2^Y]^T$ (6)

    where the symbol T is the transposition of the vector.
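    For concreteness, the following is a minimal sketch of the 60-D NV encoding defined by Eqs. (1)–(6); the function name and the toy sequence are illustrative and not taken from the authors' implementation.

    ```python
    # Minimal sketch of the Natural Vector (NV) encoding (Eqs. (2)-(6)).
    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

    def natural_vector(seq: str) -> list[float]:
        """Return the 60-D natural vector [n_k, mu_k, D_2^k] for each amino acid k."""
        L = len(seq)
        features = []
        for k in AMINO_ACIDS:
            positions = [i + 1 for i, r in enumerate(seq) if r == k]      # s^(k)(i), 1-based
            n_k = len(positions)                                          # Eq. (3)
            if n_k == 0:
                features.extend([0.0, 0.0, 0.0])
                continue
            mu_k = sum(positions) / n_k                                   # Eq. (4)
            d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * L)    # Eq. (5)
            features.extend([float(n_k), mu_k, d2_k])
        return features

    # Example with a toy sequence; a real HBP sequence would be read from FASTA.
    print(len(natural_vector("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")))  # 60
    ```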

    The CTD (composition, transition and distribution) encoding was first proposed for protein folding class prediction by Dubchak et al. in 1995 [12]. It is a global composition feature extraction method that covers attributes such as hydrophobicity, polarity, normalized van der Waals volume, polarizability, predicted secondary structure and solvent accessibility. In this method, the 20 amino acids are divided into 3 groups: polar, neutral, and hydrophobic. For each amino acid attribute, three descriptors (C, T, D) are calculated. 'C' stands for 'Composition', which represents the composition percentage of each group in the peptide sequence and thus yields 3 features. 'T' stands for 'Transition', which represents the transition probability between two neighboring amino acids belonging to two different groups and thus yields 3 features. 'D' stands for 'Distribution', which represents the positions (the first residue and the residues at 25%, 50%, 75%, and 100% of the occurrences) of the amino acids of each group in the protein sequence and thus yields 5 features per group (15 features in total).

    In this paper, the sequence description of a sample protein P in terms of hydrophobicity consists of 3 + 3 + 15 = 21 features.
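    A minimal sketch of these 21 hydrophobicity-based C/T/D features is shown below; the polar/neutral/hydrophobic split used here is the commonly adopted hydrophobicity grouping and is an assumption rather than a quotation of the authors' code.

    ```python
    # Minimal sketch of the 21 hydrophobicity-based CTD features.
    from math import ceil

    GROUPS = {"1": set("RKEDQN"),   # polar      (assumed hydrophobicity grouping)
              "2": set("GASTPHY"),  # neutral
              "3": set("CLVIMFW")}  # hydrophobic

    def ctd_hydrophobicity(seq: str) -> list[float]:
        L = len(seq)
        enc = [g for r in seq for g, members in GROUPS.items() if r in members]
        # Composition: fraction of residues in each group (3 features)
        comp = [enc.count(g) / L for g in "123"]
        # Transition: frequency of neighbouring residues from two different groups (3 features)
        pairs = list(zip(enc, enc[1:]))
        trans = [sum(1 for a, b in pairs if {a, b} == {x, y}) / (L - 1)
                 for x, y in (("1", "2"), ("1", "3"), ("2", "3"))]
        # Distribution: relative position of the 1st, 25%, 50%, 75% and 100% occurrence
        # of each group (5 x 3 = 15 features)
        dist = []
        for g in "123":
            idx = [i + 1 for i, e in enumerate(enc) if e == g]
            if not idx:
                dist.extend([0.0] * 5)
                continue
            for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
                k = 1 if frac == 0.0 else ceil(frac * len(idx))
                dist.append(idx[k - 1] / L)
        return comp + trans + dist  # 21 features in total
    ```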

    Adjacent dipeptide composition can only express the correlation between two adjacent amino acid residues. In fact, amino acids separated by g residues in the sequence may be adjacent in three-dimensional space [13]. To capture such correlations in protein sequences, we used the g-gap dipeptide composition, which extends the adjacent dipeptide composition. A protein P can be formulated as below by using this method.

    $P = [v_1^g, v_2^g, \ldots, v_i^g, \ldots, v_{400}^g]^T$ (7)

    where the symbol T is the transposition of the vector; $v_i^g$ is the frequency of the i-th (i = 1, 2, ..., 400) g-gap dipeptide and can be formulated as:

    $v_i^g = \frac{n_i^g}{L - g - 1}$ (8)

    where $n_i^g$ is the number of occurrences of the i-th g-gap dipeptide; L is the length of the protein P; and g is the number of residues separating the two residues of the dipeptide.

    In this paper, we studied the cases of g ranging from 1 to 9 because the case of g = 0 has been studied in reference [7].
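    The g-gap encoding of Eqs. (7)–(8) can be sketched as follows; the function name and the example sequence are illustrative, and a full pipeline would loop g over 1 to 9 as described above.

    ```python
    # Minimal sketch of the g-gap dipeptide composition (Eqs. (7)-(8)).
    from itertools import product

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

    def g_gap_dipeptide_composition(seq: str, g: int) -> list[float]:
        """Frequencies of the 400 dipeptides whose residues are separated by g positions."""
        counts = dict.fromkeys(DIPEPTIDES, 0)
        for i in range(len(seq) - g - 1):
            pair = seq[i] + seq[i + g + 1]
            if pair in counts:
                counts[pair] += 1
        total = len(seq) - g - 1          # the L - g - 1 denominator of Eq. (8)
        return [counts[dp] / total for dp in DIPEPTIDES]

    features = g_gap_dipeptide_composition("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", g=1)
    print(len(features))  # 400
    ```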

    The PseAAC method can include not only the amino acid composition but also the correlation of physicochemical properties between two residues [14,15]. In this paper, we adopted the type II PseAAC, in which a sample protein P can be formulated as below.

    $P = [x_1, x_2, \ldots, x_{400}, x_{401}, \ldots, x_{400+9\lambda}]^T$ (9)

    where '9' is the number of amino acid physicochemical properties considered, namely hydrophobicity, hydrophilicity, mass, pK1, pK2, pI, rigidity, flexibility and irreplaceability; 'λ' is the rank of correlation; and $x_u$ is the frequency of each element, formulated as:

    $x_u = \begin{cases} \dfrac{f_u}{\sum_{i=1}^{400} f_i + \omega \sum_{j=1}^{9\lambda} \tau_j}, & 1 \le u \le 400 \\ \dfrac{\omega \tau_{u-400}}{\sum_{i=1}^{400} f_i + \omega \sum_{j=1}^{9\lambda} \tau_j}, & 401 \le u \le 400 + 9\lambda \end{cases}$ (10)

    where ω is the weight factor for the sequence order effect; $f_u$ is the frequency of the u-th of the 400 dipeptides; and $\tau_j$ is the correlation factor of the physicochemical properties between residues. More detailed information about the formula derivation can be found in reference [16].

    In this paper, the parameter λ ranges from 1 to 95 in steps of 1 and the parameter ω ranges from 0.1 to 1 in steps of 0.1. Therefore, 95 × 10 = 950 feature subsets based on PseAAC were obtained.

    A tripeptide is composed of three adjacent amino acids in a protein sequence and is a minimal functional biosignaling unit. By adopting tripeptide composition (TPC), a sample protein P can be formulated as:

    $P = [t_1, t_2, \ldots, t_i, \ldots, t_{8000}]^T$ (11)

    where the symbol T is the transposition of the vector; $t_i$ is the frequency of the i-th (i = 1, 2, ..., 8000) tripeptide and can be formulated as:

    $t_i = \frac{n_i}{L - 2}$ (12)

    where $n_i$ is the number of occurrences of the i-th tripeptide and L is the length of the protein P.

    Feature selection is important for improving classification performance, as it can filter out noisy features [17,18,19,20]. We adopted the ANOVA method to select optimal features from the g-gap dipeptide compositions and PseAAC. The ANOVA method calculates, for each feature, the ratio of the variance among groups to the variance within groups [21,22]. The formula can be described as follows:

    $F(i) = \frac{S_b^2(i)}{S_w^2(i)}$ (13)

    where F(i) is the score of the i-th feature; a high F(i) value means a high ability to discriminate the samples; $S_w^2(i)$ is the variance within groups; and $S_b^2(i)$ is the variance among groups. They can be calculated as follows:

    $\begin{cases} S_b^2(i) = \dfrac{SS_b(i)}{K - 1} \\ S_w^2(i) = \dfrac{SS_w(i)}{N - K} \end{cases}$ (14)

    where SSb(i) is the sum of the squares between the groups; SSw(i) is the sum of squares within the groups; K is the total number of classes; N is the total number of samples.
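    A minimal sketch of the F-score of Eqs. (13)–(14) for a single feature is given below, assuming the two-class (HBP vs. non-HBP) setting of this paper; the function and variable names are illustrative.

    ```python
    # Minimal sketch of the ANOVA F-score (Eqs. (13)-(14)) for one feature column.
    import numpy as np

    def anova_f_score(x_pos: np.ndarray, x_neg: np.ndarray) -> float:
        """F(i) = S_b^2(i) / S_w^2(i) for one feature split by class."""
        groups = [x_pos, x_neg]
        N = sum(len(g) for g in groups)
        K = len(groups)
        grand_mean = np.concatenate(groups).mean()
        ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # between groups
        ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)            # within groups
        return (ss_b / (K - 1)) / (ss_w / (N - K))

    # Features are then ranked by decreasing F-score before the IFS procedure.
    ```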

    We adopted the BD method to select optimal features from tripeptide composition [21]. In this algorithm, the confidence level (CL) of each feature can be calculated by:

    $CL_{ij} = 1 - \sum_{k=n_{ij}}^{N_i} \frac{N_i!}{k!(N_i - k)!} q_j^k (1 - q_j)^{N_i - k}$ (15)

    where $CL_{ij}$ is the confidence level of the i-th tripeptide in the j-th type; j denotes the type of sample (positive or negative); $N_i$ is the total number of occurrences of the i-th tripeptide in the dataset; $n_{ij}$ is the number of occurrences of the i-th tripeptide in type j; and the probability $q_j$ is the relative frequency of type j in the dataset.

    According to Eq. (15), a high CL value means a high ability to identify the sample. The BD method can extract over-represented motifs and is a statistical method widely used in bioinformatics [23,24].
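    A minimal sketch of Eq. (15) for one tripeptide in one class is shown below; the example numbers are made up for illustration.

    ```python
    # Minimal sketch of the binomial-distribution (BD) confidence level (Eq. (15)).
    from math import comb

    def bd_confidence_level(n_ij: int, N_i: int, q_j: float) -> float:
        """CL_ij = 1 - P(X >= n_ij) for X ~ Binomial(N_i, q_j)."""
        tail = sum(comb(N_i, k) * q_j**k * (1 - q_j)**(N_i - k)
                   for k in range(n_ij, N_i + 1))
        return 1.0 - tail

    # e.g. a tripeptide observed 30 times in total, 25 of them in the positive class,
    # with the positive class accounting for 50% of all tripeptide occurrences:
    print(round(bd_confidence_level(25, 30, 0.5), 4))
    ```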

    In general, a model built on a low-dimensional feature subset may not capture enough information, whereas a model built on a high-dimensional feature subset can suffer from information redundancy and overfitting. Therefore, the ANOVA and BD methods were combined with the IFS process and 5-fold cross-validation to investigate the optimal feature set with the maximum accuracy [7,25,26,27] (Figure 2). We ranked all features according to their F(i) values or CL values and obtained a new feature vector, shown below.

    $P' = [g_1', g_2', \ldots, g_n']^T$ (16)
    Figure 2.  The framework of the IFS process.

    The first feature subset contains the feature with the highest F(i) or CL value, P' = [g1']T. By adding the feature with the second highest F(i) or CL value, the second feature subset P' = [g1', g2']T is formed. This procedure was repeated until all features were considered.
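    The IFS loop can be sketched as follows, using scikit-learn's SVC as a stand-in for LibSVM; the data loading, variable names and step size are assumptions for illustration.

    ```python
    # Minimal sketch of the incremental feature selection (IFS) loop described above.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def incremental_feature_selection(X, y, ranked_idx, step=1):
        """Add features in ranked order and record 5-fold CV accuracy at each subset size."""
        accs = []
        for n in range(step, len(ranked_idx) + 1, step):
            subset = X[:, ranked_idx[:n]]      # the top-n ranked features
            clf = SVC(kernel="rbf")            # RBF-kernel SVM (default C and gamma here)
            accs.append(cross_val_score(clf, subset, y, cv=5).mean())
        best_n = (int(np.argmax(accs)) + 1) * step
        return best_n, accs

    # ranked_idx would come from sorting F-scores (ANOVA) or CL-values (BD) in descending order.
    ```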

    The support vector machine (SVM) is a supervised machine learning method that has been widely used in bioinformatics [28,29,30,31,32,33]. Its main idea is to map the input features from a low-dimensional space to a high-dimensional space through a nonlinear transformation and then find the optimal linear classification surface. For convenience, the SVM software package LibSVM can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvm/. In the current study, the LibSVM-3.22 package was adopted to investigate the performance for identifying HBP, and the radial basis function kernel was selected to perform predictions. The grid search spaces were $[2^{-5}, 2^{15}]$ with a step of 2 for the penalty parameter C and $[2^{3}, 2^{-15}]$ with a step of $2^{-1}$ for the kernel parameter g.

    Three cross-validation methods, namely the independent dataset test, the sub-sampling test, and the jackknife test, are widely used to investigate the performance of a predictor in practical applications [30,34,35,36,37,38,39,40,41]. In order to save computing time, the 5-fold cross-validation test was adopted in this paper to determine the optimal SVM parameters C and g.
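    A minimal sketch of this C/g grid search with 5-fold cross-validation is shown below, again using scikit-learn in place of LibSVM; X_train and y_train are placeholders for the benchmark feature matrix and labels.

    ```python
    # Minimal sketch of the C/gamma grid search with 5-fold cross-validation.
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    param_grid = {
        "C":     [2.0 ** p for p in range(-5, 16)],      # 2^-5 ... 2^15, step 2
        "gamma": [2.0 ** p for p in range(3, -16, -1)],  # 2^3  ... 2^-15, step 2^-1
    }

    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
    # search.fit(X_train, y_train)   # X_train, y_train: placeholders for the benchmark set
    # print(search.best_params_, search.best_score_)
    ```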

    Five evaluation indexes were adopted to evaluate the models [42,43,44,45,46,47,48,49]. Sensitivity (Sn) evaluates the model's ability to correctly predict positive samples. Specificity (Sp) evaluates the model's ability to correctly predict negative samples. Overall accuracy (Acc) reflects the proportion of the entire benchmark dataset that is correctly predicted. The Matthews correlation coefficient (Mcc) evaluates the reliability of the algorithm. The area under the ROC curve (AUC) reflects the model's classification ability across decision thresholds. They can be calculated as follows:

    $\begin{cases} Sn = \dfrac{TP}{TP + FN} \\ Sp = \dfrac{TN}{TN + FP} \\ Acc = \dfrac{TP + TN}{TP + TN + FP + FN} \\ Mcc = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \end{cases}$ (17)

    where TP, TN, FP, and FN represent the number of the correctly recognized positive samples, the number of the correctly recognized negative samples, the number of negative samples recognized as positive samples, and the number of positive samples recognized as negative samples, respectively.
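    These threshold-based indexes (all except the AUC, which needs the decision values) can be computed directly from the confusion matrix; the sketch below uses illustrative names.

    ```python
    # Minimal sketch of the evaluation indexes computed from TP, TN, FP and FN.
    from math import sqrt

    def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict:
        sn  = tp / (tp + fn)                          # sensitivity
        sp  = tn / (tn + fp)                          # specificity
        acc = (tp + tn) / (tp + tn + fp + fn)         # overall accuracy
        mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return {"Sn": sn, "Sp": sp, "Acc": acc, "Mcc": mcc}
    ```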

    In this study, we examined the performance of 5 feature extraction methods and their combinations. Based on the CTD, NV and CTD+NV methods, protein samples can be expressed as 21-D (dimensional), 60-D and 81-D vectors, respectively, which yielded Accs of 60.16%, 70.33% and 67.07% using SVM with 5-fold cross-validation (Table 1). These prediction performances were far from satisfactory.

    Table 1.  The results and the corresponding number of features based on different methods.
    Feature extraction   C      g      Sn (%)   Sp (%)   Acc (%)   Mcc     AUC
    CTD (21-D)           2^2    2^3    36.59    83.74    60.16     0.230   0.654
    NV (60-D)            2^-5   2^-13  70.73    69.92    70.33     0.407   0.762
    CTD+NV (81-D)        2^9    2^-7   70.73    63.41    67.07     0.342   0.709

    Based on the g-gap method, a protein sample can be expressed as a 400-D vector. By varying g from 1 to 9, we obtained 9 feature subsets. Firstly, we investigated the performance of these 400-D feature subsets with SVM; the results are reported in Figure 3A. Subsequently, the ANOVA method with the IFS process was applied to investigate the optimal feature set, and the results are recorded in Figure 3B. One may notice that when g = 1, a maximum Acc of 80.89% was obtained using the top 144 features. Accs were clearly increased by adopting the ANOVA method; however, the prediction performance still needed improvement.

    Figure 3.  Accs for g-gap dipeptide composition. (A) Different g values corresponding to different Accs; (B) A plot showing the IFS curves based on g-gap methods.

    Based on the PseAAC method, we obtained 95 × 10 = 950 (95 values of λ and 10 values of ω) feature subsets. Firstly, we investigated the performance of these 950 models using SVM in the 5-fold cross-validation test and report the results in Figure 4A. The maximum Acc of 76.83% was achieved when λ = 18 and ω = 0.1. In order to improve the Acc, the ANOVA method was adopted to rank the corresponding 400 + 9 × 18 = 562 features. By adopting SVM with IFS, a maximum Acc of 84.15% was obtained using the top 194 features (Figure 4B). Although the result was encouraging, the Acc still had room for improvement.

    Figure 4.  Accs for PseAAC. (A) A heat map of the Accs of the 950 PseAAC models; (B) A plot showing the IFS curve based on the PseAAC method.

    Based on the TPC method, 8000 features were extracted for each protein sequence. Considering that such a high dimensionality could lead to overfitting, the BD method was adopted for feature selection. By adopting SVM with the IFS process in the 5-fold cross-validation test, a maximum Acc of 97.15% was obtained using the top 1169 features (Figure 5). In this case, the Sn, Sp and Mcc were 96.75%, 97.56% and 0.943, respectively, and the AUC reached 0.994, indicating that the model based on the optimal TPC features is accurate and reliable for identifying HBP.

    Figure 5.  A plot showing the IFS curve based on TPC method.

    In order to demonstrate the superiority of SVM for identifying HBP, we compared its performance with those of other machine learning algorithms based on the same feature subset (i.e. the 1169 optimal features). From Table 2, it can be seen that the SVM classifier produced the best performance among these algorithms. Thus, the final model was constructed based on SVM.

    Table 2.  Comparing SVM with other classifiers.
    Classifier       Sn (%)   Sp (%)   Acc (%)   Mcc     AUC
    J48              63.41    56.91    60.16     0.204   0.601
    Bagging          80.49    57.72    69.11     0.392   0.770
    Random Forest    88.62    84.55    86.59     0.732   0.945
    Naive Bayes      95.93    92.68    94.31     0.887   0.965
    SVM              96.75    97.56    97.15     0.943   0.994

    It is also necessary to compare the method proposed in this paper with existing methods. Table 3 shows the detailed results of different methods for identifying HBP. Based on the same benchmark dataset, Tang et al. achieved an Acc of 84.9% using an SVM-based method in which protein sequences were encoded with the optimal 0-gap dipeptide composition features obtained by the ANOVA feature selection technique [7]. Basith et al. obtained an Acc of 84.96% in the cross-validation test by training an extremely randomized tree on optimal features obtained from dipeptide composition and amino acid index values through two-step feature selection [8]. Our proposed method produced an Acc of 97.15%, which is superior to the two published results, demonstrating that our method is more powerful for identifying HBP.

    Table 3.  Comparing our method with other published methods.
    Reference    Method      Sn (%)   Sp (%)   Acc (%)   Mcc     AUC
    [7]          HBPred      88.6     81.3     84.9      -       -
    [8]          iGHBP       88.62    81.30    84.96     -       0.701
    This work    HBPred2.0   96.75    97.56    97.15     0.943   0.994

    To further compare the performance of these methods, the independent dataset was used; the results are recorded in Table 4. One may observe that the HBPred2.0 predictor achieved the best performance among the three predictors, suggesting that HBPred2.0 has better generalization ability.

    Table 4.  Performance evaluation based on the independent dataset.
    Reference    Method      Sn (%)   Sp (%)   Acc (%)   Mcc     AUC
    [7]          HBPred      80.43    56.52    68.48     0.381   0.714
    [8]          iGHBP       86.96    47.83    67.39     0.380   -
    This work    HBPred2.0   89.13    80.43    84.78     0.698   0.814

    Specificity reflects the model's ability to discriminate negative samples. From Table 4, the higher specificity of HBPred2.0 indicates that the model produces fewer false positives.

    In this paper, we systematically investigated the performance of various features and classifiers for HBP prediction. Through a large number of experiments, we obtained the best model by combining SVM with the optimal tripeptide composition features. This model produced an overall accuracy of 84.78% on the independent data. Finally, because published databases [50,51,52,53] and webservers [54,55,56,57,58,59,60,61,62,63] provide great convenience for the scientific community, we established a free webserver for the proposed method, called HBPred2.0, which can be freely accessed at http://lin-group.cn/server/HBPred2.0/. We expect that the tool will help scholars study the mechanism of HBP function and promote the development of related drug research.

    This work was supported by the National Natural Science Foundation of China (61772119, 31771471, 61702430), the Natural Science Foundation for Distinguished Young Scholars of Hebei Province (No. C2017209244), and the Central Public Interest Scientific Institution Basal Research Fund (No. 2018GJM06).

    The authors declare that there is no conflict of interest.


    Acknowledgments



    This work was supported by the Natural Science Foundation of Shandong Province [Grant Number ZR2017MC070]. We are thankful for the generous contributions of the research participants and the staff who assisted with data collection during the study.

    Conflict of interest



    All authors declare no conflicts of interest in this paper.

    [1] Yap YS, Lu YS, Tamura K, et al. (2019) Insights Into Breast Cancer in the East vs the West: A Review. JAMA Oncol 5: 1489-1496. doi: 10.1001/jamaoncol.2019.0620
    [2] Chen W, Zheng R, Baade PD, et al. (2016) Cancer statistics in China, 2015. CA Cancer J Clin 66: 115-132. doi: 10.3322/caac.21338
    [3] Zeng H, Chen W, Zheng R, et al. (2018) Changing cancer survival in China during 2003–15: a pooled analysis of 17 population-based cancer registries. Lancet Glob Health 6: e555-e567. doi: 10.1016/S2214-109X(18)30127-X
    [4] Gold SM, Kohler-Forsberg O, Moss-Morris R, et al. (2020) Comorbid depression in medical diseases. Nat Rev Dis Primers 6: 69. doi: 10.1038/s41572-020-0200-2
    [5] Krebber AM, Buffart LM, Kleijn G, et al. (2014) Prevalence of depression in cancer patients: a meta-analysis of diagnostic interviews and self-report instruments. Psychooncology 23: 121-130. doi: 10.1002/pon.3409
    [6] Pratt LA, Brody DJ (2014) Depression in the U.S. household population, 2009–2012. NCHS Data Brief 1-8.
    [7] Pilevarzadeh M, Amirshahi M, Afsargharehbagh R, et al. (2019) Global prevalence of depression among breast cancer patients: a systematic review and meta-analysis. Breast Cancer Res Treat 176: 519-533. doi: 10.1007/s10549-019-05271-3
    [8] Walker J, Hansen CH, Martin P, et al. (2014) Prevalence, associations, and adequacy of treatment of major depression in patients with cancer: a cross-sectional analysis of routinely collected clinical data. Lancet Psychiatry 1: 343-350. doi: 10.1016/S2215-0366(14)70313-X
    [9] Mausbach BT, Schwab RB, Irwin SA (2015) Depression as a predictor of adherence to adjuvant endocrine therapy (AET) in women with breast cancer: a systematic review and meta-analysis. Breast Cancer Res Treat 152: 239-246. doi: 10.1007/s10549-015-3471-7
    [10] Markovitz LC, Drysdale NJ, Bettencourt BA (2017) The relationship between risk factors and medication adherence among breast cancer survivors: What explanatory role might depression play? Psychooncology 26: 2294-2299. doi: 10.1002/pon.4362
    [11] Chan CM, Wan Ahmad WA, Yusof MM, et al. (2015) Effects of depression and anxiety on mortality in a mixed cancer group: a longitudinal approach using standardised diagnostic interviews. Psychooncology 24: 718-725. doi: 10.1002/pon.3714
    [12] Grotmol KS, Lie HC, Hjermstad MJ, et al. (2017) Depression-A Major Contributor to Poor Quality of Life in Patients With Advanced Cancer. J Pain Symptom Manage 54: 889-897. doi: 10.1016/j.jpainsymman.2017.04.010
    [13] Hasson-Ohayon I, Goldzweig G, Dorfman C, et al. (2014) Hope and social support utilisation among different age groups of women with breast cancer and their spouses. Psychol Health 29: 1303-1319. doi: 10.1080/08870446.2014.929686
    [14] Naik H, Leung B, Laskin J, et al. (2020) Emotional distress and psychosocial needs in patients with breast cancer in British Columbia: younger versus older adults. Breast Cancer Res Treat 179: 471-477. doi: 10.1007/s10549-019-05468-6
    [15] Avis NE, Levine B, Naughton MJ, et al. (2012) Explaining age-related differences in depression following breast cancer diagnosis and treatment. Breast Cancer Res Treat 136: 581-591. doi: 10.1007/s10549-012-2277-0
    [16] Vazquez D, Rosenberg S, Gelber S, et al. (2020) Posttraumatic stress in breast cancer survivors diagnosed at a young age. Psychooncology 29: 1312-1320. doi: 10.1002/pon.5438
    [17] Ruddy KJ, Partridge AH (2012) The unique reproductive concerns of young women with breast cancer. Adv Exp Med Biol 732: 77-87. doi: 10.1007/978-94-007-2492-1_6
    [18] Gomez-Campelo P, Bragado-Alvarez C, Hernandez-Lloreda MJ (2014) Psychological distress in women with breast and gynecological cancer treated with radical surgery. Psychooncology 23: 459-466. doi: 10.1002/pon.3439
    [19] Rosenberg SM, Tamimi RM, Gelber S, et al. (2014) Treatment-related amenorrhea and sexual functioning in young breast cancer survivors. Cancer 120: 2264-2271. doi: 10.1002/cncr.28738
    [20] Fontein DB, de Glas NA, Duijm M, et al. (2013) Age and the effect of physical activity on breast cancer survival: A systematic review. Cancer Treat Rev 39: 958-965. doi: 10.1016/j.ctrv.2013.03.008
    [21] Lan B, Jiang S, Li T, et al. (2020) Depression, anxiety, and their associated factors among Chinese early breast cancer in women under 35 years of age: A cross sectional study. Curr Probl Cancer 4: 100558. doi: 10.1016/j.currproblcancer.2020.100558
    [22] Mandelblatt JS, Zhai W, Ahn J, et al. (2020) Symptom burden among older breast cancer survivors: The Thinking and Living With Cancer (TLC) study. Cancer 126: 1183-1192. doi: 10.1002/cncr.32663
    [23] Champion VL, Wagner LI, Monahan PO, et al. (2014) Comparison of younger and older breast cancer survivors and age-matched controls on specific and overall quality of life domains. Cancer 120: 2237-2246. doi: 10.1002/cncr.28737
    [24] Sheppard VB, Harper FW, Davis K, et al. (2014) The importance of contextual factors and age in association with anxiety and depression in Black breast cancer patients. Psychooncology 23: 143-150. doi: 10.1002/pon.3382
    [25] Fried EI, Nesse RM, Zivin K, et al. (2014) Depression is more than the sum score of its parts: individual DSM symptoms have different risk factors. Psychol Med 44: 2067-2076. doi: 10.1017/S0033291713002900
    [26] Ten Have M, Lamers F, Wardenaar K, et al. (2016) The identification of symptom-based subtypes of depression: A nationally representative cohort study. J Affect Disord 190: 395-406. doi: 10.1016/j.jad.2015.10.040
    [27] Kroenke K, Spitzer RL, Williams JB (2001) The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 16: 606-613. doi: 10.1046/j.1525-1497.2001.016009606.x
    [28] Andersen BL, DeRubeis RJ, Berman BS, et al. (2014) Screening, assessment, and care of anxiety and depressive symptoms in adults with cancer: an American Society of Clinical Oncology guideline adaptation. J Clin Oncol 32: 1605-1619. doi: 10.1200/JCO.2013.52.4611
    [29] Nylund KL, Asparouhov T, Muthén BO (2007) Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study. Struct Equ Modeling 14: 535-569. doi: 10.1080/10705510701575396
    [30] Zhu L, Ranchor AV, van der Lee M, et al. (2016) Subtypes of depression in cancer patients: an empirically driven approach. Support Care Cancer 24: 1387-1396. doi: 10.1007/s00520-015-2919-y
    [31] Li J, Zhang H, Shao D, et al. (2019) Depressive Symptom Clusters and Their Relationships With Anxiety and Posttraumatic Stress Disorder Symptoms in Patients With Cancer: The Use of Latent Class Analysis. Cancer Nurs 42: 388-395. doi: 10.1097/NCC.0000000000000624
    [32] Hinz A, Mehnert A, Kocalevent RD, et al. (2016) Assessment of depression severity with the PHQ-9 in cancer patients and in the general population. BMC Psychiatry 16: 22. doi: 10.1186/s12888-016-0728-6
    [33] Kim JM, Jang JE, Stewart R, et al. (2013) Determinants of suicidal ideation in patients with breast cancer. Psychooncology 22: 2848-2856. doi: 10.1002/pon.3367
    [34] Dooley LN, Ganz PA, Cole SW, et al. (2016) Val66Met BDNF polymorphism as a vulnerability factor for inflammation-associated depressive symptoms in women with breast cancer. J Affect Disord 197: 43-50. doi: 10.1016/j.jad.2016.02.059
    [35] Carpenter JS, Igega CM, Otte JL, et al. (2014) Somatosensory amplification and menopausal symptoms in breast cancer survivors and midlife women. Maturitas 78: 51-55. doi: 10.1016/j.maturitas.2014.02.006
    [36] Howard-Anderson J, Ganz PA, Bower JE, et al. (2012) Quality of life, fertility concerns, and behavioral health outcomes in younger breast cancer survivors: a systematic review. J Natl Cancer Inst 104: 386-405. doi: 10.1093/jnci/djr541
    [37] Posner K, Brown GK, Stanley B, et al. (2011) The Columbia-Suicide Severity Rating Scale: initial validity and internal consistency findings from three multisite studies with adolescents and adults. Am J Psychiatry 168: 1266-1277. doi: 10.1176/appi.ajp.2011.10111704
  • This article has been cited by:

    1. Zhihua Chen, Xinke Wang, Peng Gao, Hongju Liu, Bosheng Song, Predicting Disease Related microRNA Based on Similarity and Topology, 2019, 8, 2073-4409, 1405, 10.3390/cells8111405
    2. Lei Zheng, Dongyang Liu, Wuritu Yang, Lei Yang, Yongchun Zuo, RaacLogo: a new sequence logo generator by using reduced amino acid clusters, 2020, 1467-5463, 10.1093/bib/bbaa096
    3. Leyi Wei, Wenjia He, Adeel Malik, Ran Su, Lizhen Cui, Balachandran Manavalan, Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework, 2020, 1467-5463, 10.1093/bib/bbaa275
    4. Chaolu Meng, Shunshan Jin, Lei Wang, Fei Guo, Quan Zou, AOPs-SVM: A Sequence-Based Classifier of Antioxidant Proteins Using a Support Vector Machine, 2019, 7, 2296-4185, 10.3389/fbioe.2019.00224
    5. Xing Gao, Guilin Li, A Cytokine Protein Identification Model Based on the Compressed PseKRAAC Features, 2020, 8, 2169-3536, 141422, 10.1109/ACCESS.2020.3013409
    6. Meijie Zhang, Luyang Cheng, Yina Zhang, Characterization of Dysregulated lncRNA-Associated ceRNA Network Reveals Novel lncRNAs With ceRNA Activity as Epigenetic Diagnostic Biomarkers for Osteoporosis Risk, 2020, 8, 2296-634X, 10.3389/fcell.2020.00184
    7. Yu-Miao Chen, Xin-Ping Zu, Dan Li, Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction, 2020, 11, 1664-8021, 10.3389/fgene.2020.569100
    8. Qingwen Li, Wenyang Zhou, Donghua Wang, Sui Wang, Qingyuan Li, Prediction of Anticancer Peptides Using a Low-Dimensional Feature Model, 2020, 8, 2296-4185, 10.3389/fbioe.2020.00892
    9. Xudong Zhao, Hanxu Wang, Hangyu Li, Yiming Wu, Guohua Wang, Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method, 2021, 12, 1664-462X, 10.3389/fpls.2021.506681
    10. Hasan Zulfiqar, Muhammad Shareef Masoud, Hui Yang, Shu-Guang Han, Cheng-Yan Wu, Hao Lin, Balachandran Manavalan, Screening of Prospective Plant Compounds as H1R and CL1R Inhibitors and Its Antiallergic Efficacy through Molecular Docking Approach, 2021, 2021, 1748-6718, 1, 10.1155/2021/6683407
    11. Wei Chen, Kewei Liu, Analysis and Comparison of RNA Pseudouridine Site Prediction Tools, 2020, 15, 15748936, 279, 10.2174/1574893614666191018171521
    12. Lei Xu, Guangmin Liang, Baowen Chen, Xu Tan, Huaikun Xiang, Changrui Liao, A Computational Method for the Identification of Endolysins and Autolysins, 2020, 27, 09298665, 329, 10.2174/0929866526666191002104735
    13. Fang Wang, Zheng-Xing Guan, Fu-Ying Dao, Hui Ding, A Brief Review of the Computational Identification of Antifreeze Protein, 2019, 23, 13852728, 1671, 10.2174/1385272823666190718145613
    14. Zhiyu Tao, Benzhi Dong, Zhixia Teng, Yuming Zhao, The Classification of Enzymes by Deep Learning, 2020, 8, 2169-3536, 89802, 10.1109/ACCESS.2020.2992468
    15. Meng-Lu Liu, Wei Su, Jia-Shu Wang, Yu-He Yang, Hui Yang, Hao Lin, Predicting Preference of Transcription Factors for Methylated DNA Using Sequence Information, 2020, 22, 21622531, 1043, 10.1016/j.omtn.2020.07.035
    16. Zi-Mei Zhang, Zheng-Xing Guan, Fang Wang, Dan Zhang, Hui Ding, Application of Machine Learning Methods in Predicting Nuclear Receptors and their Families, 2020, 16, 15734064, 594, 10.2174/1573406415666191004125551
    17. Kuan Li, Yue Zhong, Xuan Lin, Zhe Quan, Predicting the Disease Risk of Protein Mutation Sequences With Pre-training Model, 2020, 11, 1664-8021, 10.3389/fgene.2020.605620
    18. Chen-Chen Li, Bin Liu, MotifCNN-fold: protein fold recognition based on fold-specific features extracted by motif-based convolutional neural networks, 2020, 21, 1467-5463, 2133, 10.1093/bib/bbz133
    19. Lifu Zhang, Benzhi Dong, Zhixia Teng, Ying Zhang, Liran Juan, Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs, 2020, 2020, 2314-6133, 1, 10.1155/2020/9235920
    20. Chunyu Wang, Kai Sun, Juexin Wang, Maozu Guo, Data fusion-based algorithm for predicting miRNA–Disease associations, 2020, 88, 14769271, 107357, 10.1016/j.compbiolchem.2020.107357
    21. Ruirui Liang, Jiayang Xie, Chi Zhang, Mengying Zhang, Hai Huang, Haizhong Huo, Xin Cao, Bing Niu, Identifying Cancer Targets Based on Machine Learning Methods via Chou’s 5-steps Rule and General Pseudo Components, 2019, 19, 15680266, 2301, 10.2174/1568026619666191016155543
    22. Ting Liu, Hua Tang, A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite, 2020, 26, 13816128, 3049, 10.2174/1381612826666200310122324
    23. Zijie Sun, Shenghui Huang, Lei Zheng, Pengfei Liang, Wuritu Yang, Yongchun Zuo, ICTC-RAAC: An improved web predictor for identifying the types of ion channel-targeted conotoxins by using reduced amino acid cluster descriptors, 2020, 89, 14769271, 107371, 10.1016/j.compbiolchem.2020.107371
    24. Ying Wang, Juanjuan Kang, Ning Li, Yuwei Zhou, Zhongjie Tang, Bifang He, Jian Huang, NeuroCS: A Tool to Predict Cleavage Sites of Neuropeptide Precursors, 2020, 27, 09298665, 337, 10.2174/0929866526666191112150636
    25. Shanwen Sun, Hui Ding, Donghua Wang, Shuguang Han, Identifying Antifreeze Proteins Based on Key Evolutionary Information, 2020, 8, 2296-4185, 10.3389/fbioe.2020.00244
    26. Nguyen Quoc Khanh Le, Fertility-GRU: Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles, 2019, 18, 1535-3893, 3503, 10.1021/acs.jproteome.9b00411
    27. Zhourun Wu, Qing Liao, Bin Liu, idenPC-MIIP: identify protein complexes from weighted PPI networks using mutual important interacting partner relation, 2021, 22, 1477-4054, 1972, 10.1093/bib/bbaa016
    28. Zhe Liu, Yingli Gong, Yihang Bao, Yuanzhao Guo, Han Wang, Guan Ning Lin, TMPSS: A Deep Learning-Based Predictor for Secondary Structure and Topology Structure Prediction of Alpha-Helical Transmembrane Proteins, 2021, 8, 2296-4185, 10.3389/fbioe.2020.629937
    29. Hao Lv, Fu-Ying Dao, Zheng-Xing Guan, Dan Zhang, Jiu-Xin Tan, Yong Zhang, Wei Chen, Hao Lin, iDNA6mA-Rice: A Computational Tool for Detecting N6-Methyladenine Sites in Rice, 2019, 10, 1664-8021, 10.3389/fgene.2019.00793
    30. Ruiyan Hou, Jin Wu, Lei Xu, Quan Zou, Yi-Jun Wu, Computational Prediction of Protein Arginine Methylation Based on Composition–Transition–Distribution Features, 2020, 5, 2470-1343, 27470, 10.1021/acsomega.0c03972
    31. Xiao-Yang Jing, Feng-Min Li, Predicting Cell Wall Lytic Enzymes Using Combined Features, 2021, 8, 2296-4185, 10.3389/fbioe.2020.627335
    32. Hong-Fei Li, Xian-Fang Wang, Hua Tang, Predicting Bacteriophage Enzymes and Hydrolases by Using Combined Features, 2020, 8, 2296-4185, 10.3389/fbioe.2020.00183
    33. Wei Chen, Pengmian Feng, Xiaoming Song, Hao Lv, Hao Lin, iRNA-m7G: Identifying N7-methylguanosine Sites by Fusing Multiple Features, 2019, 18, 21622531, 269, 10.1016/j.omtn.2019.08.022
    34. Chao Wang, Pingping Wang, Shuguang Han, Lida Wang, Yuming Zhao, Liran Juan, FunEffector-Pred: Identification of Fungi Effector by Activate Learning and Genetic Algorithm Sampling of Imbalanced Data, 2020, 8, 2169-3536, 57674, 10.1109/ACCESS.2020.2982410
    35. Yanjuan Li, Zitong Zhang, Zhixia Teng, Xiaoyan Liu, Hui Ding, PredAmyl-MLP: Prediction of Amyloid Proteins Using Multilayer Perceptron, 2020, 2020, 1748-6718, 1, 10.1155/2020/8845133
    36. Wanben Zhong, Bineng Zhong, Hongbo Zhang, Ziyi Chen, Yan Chen, Identification of Anti-cancer Peptides Based on Multi-classifier System, 2020, 22, 13862073, 694, 10.2174/1386207322666191203141102
    37. Xiaoqing Ru, Peigang Cao, Lihong Li, Quan Zou, Selecting Essential MicroRNAs Using a Novel Voting Method, 2019, 18, 21622531, 16, 10.1016/j.omtn.2019.07.019
    38. Feifei Cui, Zilong Zhang, Quan Zou, Sequence representation approaches for sequence-based protein prediction tasks that use deep learning, 2021, 20, 2041-2649, 61, 10.1093/bfgp/elaa030
    39. Muhammad Arif, Farman Ali, Saeed Ahmad, Muhammad Kabir, Zakir Ali, Maqsood Hayat, Pred-BVP-Unb: Fast prediction of bacteriophage Virion proteins using un-biased multi-perspective properties with recursive feature elimination, 2020, 112, 08887543, 1565, 10.1016/j.ygeno.2019.09.006
    40. Yongxian Fan, Meijun Chen, Qingqi Zhu, lncLocPred: Predicting LncRNA Subcellular Localization Using Multiple Sequence Feature Information, 2020, 8, 2169-3536, 124702, 10.1109/ACCESS.2020.3007317
    41. Changgeng Tan, Tong Wang, Wenyi Yang, Lei Deng, PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction, 2019, 25, 1420-3049, 98, 10.3390/molecules25010098
    42. Chunyu Wang, Jialin Li, Ying Zhang, Maozu Guo, Identification of Type VI Effector Proteins Using a Novel Ensemble Classifier, 2020, 8, 2169-3536, 75085, 10.1109/ACCESS.2020.2985111
    43. Zhibin Lv, Hui Ding, Lei Wang, Quan Zou, A Convolutional Neural Network Using Dinucleotide One-hot Encoder for identifying DNA N6-Methyladenine Sites in the Rice Genome, 2021, 422, 09252312, 214, 10.1016/j.neucom.2020.09.056
    44. Xiao-Yang Jing, Feng-Min Li, Identifying Heat Shock Protein Families from Imbalanced Data by Using Combined Features, 2020, 2020, 1748-670X, 1, 10.1155/2020/8894478
    45. Hui Yang, Wuritu Yang, Fu-Ying Dao, Hao Lv, Hui Ding, Wei Chen, Hao Lin, A comparison and assessment of computational method for identifying recombination hotspots in Saccharomyces cerevisiae, 2020, 21, 1467-5463, 1568, 10.1093/bib/bbz123
    46. Balachandran Manavalan, Md. Mehedi Hasan, Shaherin Basith, Vijayakumar Gosu, Tae-Hwan Shin, Gwang Lee, Empirical Comparison and Analysis of Web-Based DNA N4-Methylcytosine Site Prediction Tools, 2020, 22, 21622531, 406, 10.1016/j.omtn.2020.09.010
    47. Zheng-Xing Guan, Shi-Hao Li, Zi-Mei Zhang, Dan Zhang, Hui Yang, Hui Ding, A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods, 2020, 21, 13892029, 11, 10.2174/1389202921666200214125102
    48. Balachandran Manavalan, Shaherin Basith, Tae Hwan Shin, Gwang Lee, Computational prediction of species-specific yeast DNA replication origin via iterative feature representation, 2020, 1467-5463, 10.1093/bib/bbaa304
    49. Xian-Fang Wang, Peng Gao, Yi-Feng Liu, Hong-Fei Li, Fan Lu, Predicting Thermophilic Proteins by Machine Learning, 2020, 15, 15748936, 493, 10.2174/1574893615666200207094357
    50. Chang Lu, Yingli Gong, Zhe Liu, Yuanzhao Guo, Zhiqiang Ma, Han Wang, TM-ZC: A Deep Learning-Based Predictor for the Z-Coordinate of Residues in α-Helical Transmembrane Proteins, 2020, 8, 2169-3536, 40129, 10.1109/ACCESS.2020.2976797
    51. Tianyi Zhao, Donghua Wang, Yang Hu, Ningyi Zhang, Tianyi Zang, Yadong Wang, Identifying Alzheimer’s Disease-related miRNA Based on Semi-clustering, 2019, 19, 15665232, 216, 10.2174/1566523219666190924113737
    52. Dan Zhang, Zhao-Chun Xu, Wei Su, Yu-He Yang, Hao Lv, Hui Yang, Hao Lin, Jinbo Xu, iCarPS: a computational tool for identifying protein carbonylation sites by novel encoded features, 2020, 1367-4803, 10.1093/bioinformatics/btaa702
    53. Dan Zhang, Zheng-Xing Guan, Zi-Mei Zhang, Shi-Hao Li, Fu-Ying Dao, Hua Tang, Hao Lin, Recent Development of Computational Predicting Bioluminescent Proteins, 2020, 25, 13816128, 4264, 10.2174/1381612825666191107100758
    54. Yanwen Li, Feng Pu, Yu Feng, Jinchao Ji, Hongguang Sun, Han Wang, MRMD-palm: A novel method for the identification of palmitoylated protein, 2021, 210, 01697439, 104245, 10.1016/j.chemolab.2021.104245
    55. Feng-Min Li, Xiao-Wei Gao, Predicting Gram-Positive Bacterial Protein Subcellular Location by Using Combined Features, 2020, 2020, 2314-6133, 1, 10.1155/2020/9701734
    56. Zhiyu Tao, Yanjuan Li, Zhixia Teng, Yuming Zhao, Hui Ding, A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD, 2020, 2020, 1748-6718, 1, 10.1155/2020/8926750
    57. Duyen Thi Do, Nguyen Quoc Khanh Le, A sequence-based approach for identifying recombination spots in Saccharomyces cerevisiae by using hyper-parameter optimization in FastText and support vector machine, 2019, 194, 01697439, 103855, 10.1016/j.chemolab.2019.103855
    58. Chunyu Wang, Jialin Li, Xiaoyan Liu, Maozu Guo, Predicting Sub-Golgi Apparatus Resident Protein With Primary Sequence Hybrid Features, 2020, 8, 2169-3536, 4442, 10.1109/ACCESS.2019.2962821
    59. Qingwen Li, Benzhi Dong, Donghua Wang, Sui Wang, Identification of Secreted Proteins From Malaria Protozoa With Few Features, 2020, 8, 2169-3536, 89793, 10.1109/ACCESS.2020.2994206
    60. Pengmian Feng, Weiwei Liu, Cong Huang, Zhaohui Tang, Classifying the superfamily of small heat shock proteins by using g-gap dipeptide compositions, 2021, 167, 01418130, 1575, 10.1016/j.ijbiomac.2020.11.111
    61. Hongfei Li, Haoze Du, Xianfang Wang, Peng Gao, Yifeng Liu, Weizhong Lin, Remarks on Computational Method for Identifying Acid and Alkaline Enzymes, 2020, 26, 13816128, 3105, 10.2174/1381612826666200617170826
    62. Shahid Akbar, Salman Khan, Farman Ali, Maqsood Hayat, Muhammad Qasim, Sarah Gul, iHBP-DeepPSSM: Identifying hormone binding proteins using PsePSSM based evolutionary features and deep learning approach, 2020, 204, 01697439, 104103, 10.1016/j.chemolab.2020.104103
    63. He Zhuang, Ying Zhang, Shuo Yang, Liang Cheng, Shu-Lin Liu, A Mendelian Randomization Study on Infant Length and Type 2 Diabetes Mellitus Risk, 2019, 19, 15665232, 224, 10.2174/1566523219666190925115535
    64. Xin Gao, Donghua Wang, Jun Zhang, Qing Liao, Bin Liu, iRBP-Motif-PSSM: Identification of RNA-Binding Proteins Based on Collaborative Learning, 2019, 7, 2169-3536, 168956, 10.1109/ACCESS.2019.2952621
    65. Zihao Yan, Dong Chen, Zhixia Teng, Donghua Wang, Yanjuan Li, SMOPredT4SE: An Effective Prediction of Bacterial Type IV Secreted Effectors Using SVM Training With SMO, 2020, 8, 2169-3536, 25570, 10.1109/ACCESS.2020.2971091
    66. Dan Zhang, Hua-Dong Chen, Hasan Zulfiqar, Shi-Shi Yuan, Qin-Lai Huang, Zhao-Yue Zhang, Ke-Jun Deng, Watshara Shoombuatong, iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins, 2021, 2021, 1748-6718, 1, 10.1155/2021/6664362
    67. Zi-Mei Zhang, Jiu-Xin Tan, Fang Wang, Fu-Ying Dao, Zhao-Yue Zhang, Hao Lin, Early Diagnosis of Hepatocellular Carcinoma Using Machine Learning Method, 2020, 8, 2296-4185, 10.3389/fbioe.2020.00254
    68. Zifan Guo, Pingping Wang, Zhendong Liu, Yuming Zhao, Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction, 2020, 8, 2296-4185, 10.3389/fbioe.2020.584807
    69. Rajiv G. Govindaraj, Sathiyamoorthy Subramaniyam, Balachandran Manavalan, Extremely-randomized-tree-based Prediction of N6-methyladenosine Sites in Saccharomyces cerevisiae, 2020, 21, 13892029, 26, 10.2174/1389202921666200219125625
    70. Sola Gbenro, Kyle Hippe, Renzhi Cao, 2020, HMMeta, 9781450379649, 1, 10.1145/3388440.3414702
    71. Yixiao Zhai, Yu Chen, Zhixia Teng, Yuming Zhao, Identifying Antioxidant Proteins by Using Amino Acid Composition and Protein-Protein Interactions, 2020, 8, 2296-634X, 10.3389/fcell.2020.591487
    72. Xianhai Li, Qiang Tang, Hua Tang, Wei Chen, Identifying Antioxidant Proteins by Combining Multiple Methods, 2020, 8, 2296-4185, 10.3389/fbioe.2020.00858
    73. Bing Rao, Lichao Zhang, Guoying Zhang, ACP-GCN: The Identification of Anticancer Peptides Based on Graph Convolution Networks, 2020, 8, 2169-3536, 176005, 10.1109/ACCESS.2020.3023800
    74. Ni Kou, Wenyang Zhou, Yuzhu He, Xiaoxia Ying, Songling Chai, Tao Fei, Wenqi Fu, Jiaqian Huang, Huiying Liu, A Mendelian Randomization Analysis to Expose the Causal Effect of IL-18 on Osteoporosis Based on Genome-Wide Association Study Data, 2020, 8, 2296-4185, 10.3389/fbioe.2020.00201
    75. Peng Xu, Geng Zhao, Zheng Kou, Gang Fang, Wenbin Liu, Classification of Cancers Based on a Comprehensive Pathway Activity Inferred by Genes and Their Interactions, 2020, 8, 2169-3536, 30515, 10.1109/ACCESS.2020.2973220
    76. Shi-Hao Li, Zheng-Xing Guan, Dan Zhang, Zi-Mei Zhang, Jian Huang, Wuritu Yang, Hao Lin, Recent Advancement in Predicting Subcellular Localization of Mycobacterial Protein with Machine Learning Methods, 2020, 16, 15734064, 605, 10.2174/1573406415666191004101913
    77. Han Luo, Donghua Wang, Juan Liu, Ying Ju, Zhe Jin, A Framework Integrating Heterogeneous Databases for the Completion of Gene Networks, 2019, 7, 2169-3536, 168859, 10.1109/ACCESS.2019.2954994
    78. Zhibin Lv, Feifei Cui, Quan Zou, Lichao Zhang, Lei Xu, Anticancer peptides prediction with deep representation learning features, 2021, 1467-5463, 10.1093/bib/bbab008
    79. Chang Lu, Zhe Liu, Bowen Kan, Yingli Gong, Zhiqiang Ma, Han Wang, TMP-SSurface: A Deep Learning-Based Predictor for Surface Accessibility of Transmembrane Protein Residues, 2019, 9, 2073-4352, 640, 10.3390/cryst9120640
    80. Yideng Cai, Jiacheng Wang, Lei Deng, SDN2GO: An Integrated Deep Learning Model for Protein Function Prediction, 2020, 8, 2296-4185, 10.3389/fbioe.2020.00391
    81. Hao Wang, Qilemuge Xi, Pengfei Liang, Lei Zheng, Yan Hong, Yongchun Zuo, IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy, 2021, 53, 0939-4451, 239, 10.1007/s00726-021-02941-9
    82. Shanwen Sun, Chunyu Wang, Hui Ding, Quan Zou, Machine learning and its applications in plant molecular studies, 2020, 19, 2041-2657, 40, 10.1093/bfgp/elz036
    83. Yichen Guo, Ke Yan, Hao Wu, Bin Liu, ReFold-MAP: Protein remote homology detection and fold recognition based on features extracted from profiles, 2020, 611, 00032697, 114013, 10.1016/j.ab.2020.114013
    84. Guilin Li, Xing Gao, The Feature Compression Algorithms for Identifying Cytokines Based on CNT Features, 2020, 8, 2169-3536, 83645, 10.1109/ACCESS.2020.2989749
    85. Qilemuge Xi, Hao Wang, Liuxi Yi, Jian Zhou, Yuchao Liang, Xiaoqing Zhao, Yongchun Zuo, Lei Chen, ANPrAod: Identify Antioxidant Proteins by Fusing Amino Acid Clustering Strategy and N -Peptide Combination, 2021, 2021, 1748-6718, 1, 10.1155/2021/5518209
    86. Lesong Wei, Xiucai Ye, Yuyang Xue, Tetsuya Sakurai, Leyi Wei, ATSE: a peptide toxicity predictor by exploiting structural and evolutionary information based on graph neural network and attention mechanism, 2021, 1467-5463, 10.1093/bib/bbab041
    87. Hasan Zulfiqar, Rida Sarwar Khan, Farwa Hassan, Kyle Hippe, Cassandra Hunt, Hui Ding, Xiao-Ming Song, Renzhi Cao, Computational identification of N4-methylcytosine sites in the mouse genome with machine-learning method, 2021, 18, 1551-0018, 3348, 10.3934/mbe.2021167
    88. Yanwen Li, Feng Pu, Jingru Wang, Zhiguo Zhou, Chunhua Zhang, Fei He, Zhiqiang Ma, Jingbo Zhang, Machine Learning Methods in Prediction of Protein Palmitoylation Sites: A Brief Review, 2021, 27, 13816128, 2189, 10.2174/1381612826666201112142826
    89. Shulin Zhao, Ying Ju, Xiucai Ye, Jun Zhang, Shuguang Han, Bioluminescent Proteins Prediction with Voting Strategy, 2021, 16, 15748936, 240, 10.2174/1574893615999200601122328
    90. Yinuo Lyu, Zhen Zhang, Jiawei Li, Wenying He, Yijie Ding, Fei Guo, iEnhancer-KL: A Novel Two-Layer Predictor for Identifying Enhancers by Position Specific of Nucleotide Composition, 2021, 18, 1545-5963, 2809, 10.1109/TCBB.2021.3053608
    91. Hongdi Pei, Jiayu Li, Shuhan Ma, Jici Jiang, Mingxin Li, Quan Zou, Zhibin Lv, Identification of Thermophilic Proteins Based on Sequence-Based Bidirectional Representations from Transformer-Embedding Features, 2023, 13, 2076-3417, 2858, 10.3390/app13052858
    92. Hasan Zulfiqar, Qin-Lai Huang, Hao Lv, Zi-Jie Sun, Fu-Ying Dao, Hao Lin, Deep-4mCGP: A Deep Learning Approach to Predict 4mC Sites in Geobacter pickeringii by Using Correlation-Based Feature Selection Technique, 2022, 23, 1422-0067, 1251, 10.3390/ijms23031251
    93. Ting Liu, Jiamao Chen, Qian Zhang, Kyle Hippe, Cassandra Hunt, Thu Le, Renzhi Cao, Hua Tang, The Development of Machine Learning Methods in Discriminating Secretory Proteins of Malaria Parasite, 2022, 29, 09298673, 807, 10.2174/0929867328666211005140625
    94. Jing Guo, TCN-HBP: A Deep Learning Method for Identifying Hormone-Binding Proteins from Amino Acid Sequences Based on a Temporal Convolution Neural Network, 2021, 2025, 1742-6588, 012002, 10.1088/1742-6596/2025/1/012002
    95. Chaolu Meng, Jin Wu, Fei Guo, Benzhi Dong, Lei Xu, CWLy-pred: A novel cell wall lytic enzyme identifier based on an improved MRMD feature selection method, 2020, 112, 08887543, 4715, 10.1016/j.ygeno.2020.08.015
    96. Zhixia Teng, Zitong Zhang, Zhen Tian, Yanjuan Li, Guohua Wang, ReRF-Pred: predicting amyloidogenic regions of proteins based on their pseudo amino acid composition and tripeptide composition, 2021, 22, 1471-2105, 10.1186/s12859-021-04446-4
    97. Peiran Jiang, Wanshan Ning, Yunshu Shi, Chuan Liu, Saijun Mo, Haoran Zhou, Kangdong Liu, Yaping Guo, FSL-Kla: A few-shot learning-based multi-feature hybrid system for lactylation site prediction, 2021, 19, 20010370, 4497, 10.1016/j.csbj.2021.08.013
    98. Mujiexin Liu, Hui Chen, Dong Gao, Cai-Yi Ma, Zhao-Yue Zhang, Balachandran Manavalan, Identification of Helicobacter pylori Membrane Proteins Using Sequence-Based Features, 2022, 2022, 1748-6718, 1, 10.1155/2022/7493834
    99. Yi-Wei Zhao, Shihua Zhang, Hui Ding, Recent Development of Machine Learning Methods in Sumoylation Sites Prediction, 2022, 29, 09298673, 894, 10.2174/0929867328666210915112030
    100. Juexin Wang, Yan Wang, Towards Machine Learning in Molecular Biology, 2020, 17, 1551-0018, 2822, 10.3934/mbe.2020156
    101. Yuning Yang, Jiawen Yu, Zhe Liu, Xi Wang, Han Wang, Zhiqiang Ma, Dong Xu, An Improved Topology Prediction of Alpha-Helical Transmembrane Protein Based on Deep Multi-Scale Convolutional Neural Network, 2022, 19, 1545-5963, 295, 10.1109/TCBB.2020.3005813
    102. Shihu Jiao, Zheng Chen, Lichao Zhang, Xun Zhou, Lei Shi, ATGPred-FL: sequence-based prediction of autophagy proteins with feature representation learning, 2022, 54, 0939-4451, 799, 10.1007/s00726-022-03145-5
    103. Changli Feng, Haiyan Wei, Deyun Yang, Bin Feng, Zhaogui Ma, Shuguang Han, Quan Zou, Hua Shi, ORS‐Pred: An optimized reduced scheme‐based identifier for antioxidant proteins, 2021, 21, 1615-9853, 2100017, 10.1002/pmic.202100017
    104. Hongliang Zou, Zhijian Yin, m7G-DPP: Identifying N7-methylguanosine sites based on dinucleotide physicochemical properties of RNA, 2021, 279, 03014622, 106697, 10.1016/j.bpc.2021.106697
    105. Yuxin Gong, Bo Liao, Dejun Peng, Quan Zou, Accurate Prediction and Key Feature Recognition of Immunoglobulin, 2021, 11, 2076-3417, 6894, 10.3390/app11156894
    106. Yuxin Guo, Liping Hou, Wen Zhu, Peng Wang, Prediction of Hormone-Binding Proteins Based on K-mer Feature Representation and Naive Bayes, 2021, 12, 1664-8021, 10.3389/fgene.2021.797641
    107. Lesong Wei, Xiucai Ye, Tetsuya Sakurai, Zengchao Mu, Leyi Wei, Pier Luigi Martelli, ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning, 2022, 38, 1367-4803, 1514, 10.1093/bioinformatics/btac006
    108. Yihang Bao, Weixi Wang, Minglong Dong, Fei He, Han Wang, 2021, Discover the Binding Domain of Transmembrane Proteins Based on Structural Universality, 978-1-6654-0126-5, 5, 10.1109/BIBM52615.2021.9669493
    109. Wen Zhu, Yuxin Guo, Quan Zou, Prediction of presynaptic and postsynaptic neurotoxins based on feature extraction, 2021, 18, 1551-0018, 5943, 10.3934/mbe.2021297
    110. Hasan Zulfiqar, Shi-Shi Yuan, Qin-Lai Huang, Zi-Jie Sun, Fu-Ying Dao, Xiao-Long Yu, Hao Lin, Identification of cyclin protein using gradient boost decision tree algorithm, 2021, 19, 20010370, 4123, 10.1016/j.csbj.2021.07.013
    111. Jun Zhang, Qingcai Chen, Bin Liu, DeepDRBP-2L: A New Genome Annotation Predictor for Identifying DNA-Binding Proteins and RNA-Binding Proteins Using Convolutional Neural Network and Long Short-Term Memory, 2021, 18, 1545-5963, 1451, 10.1109/TCBB.2019.2952338
    112. Pengmian Feng, Lijing Feng, Recent Advances on Antioxidant Identification Based on Machine Learning Methods, 2020, 21, 13892002, 804, 10.2174/1389200221666200719001449
    113. Xiaoping Min, Fengqing Lu, Chunyan Li, Sequence-Based Deep Learning Frameworks on Enhancer-Promoter Interactions Prediction, 2021, 27, 13816128, 1847, 10.2174/1381612826666201124112710
    114. Yu-Hao Wang, Yu-Fei Zhang, Ying Zhang, Zhi-Feng Gu, Zhao-Yue Zhang, Hao Lin, Ke-Jun Deng, Identification of adaptor proteins using the ANOVA feature selection technique, 2022, 208, 10462023, 42, 10.1016/j.ymeth.2022.10.008
    115. Shihu Jiao, Quan Zou, Huannan Guo, Lei Shi, iTTCA-RF: a random forest predictor for tumor T cell antigens, 2021, 19, 1479-5876, 10.1186/s12967-021-03084-x
    116. Bosheng Song, Zimeng Li, Xuan Lin, Jianmin Wang, Tian Wang, Xiangzheng Fu, Pretraining model for biological sequence data, 2021, 20, 2041-2649, 181, 10.1093/bfgp/elab025
    117. Peijie Zheng, Yue Qi, Xueyong Li, Yuewu Liu, Yuhua Yao, Guohua Huang, A capsule network-based method for identifying transcription factors, 2022, 13, 1664-302X, 10.3389/fmicb.2022.1048478
    118. Hongliang Zou, iAHTP-LH: Integrating Low-Order and High-Order Correlation Information for Identifying Antihypertensive Peptides, 2022, 28, 1573-3904, 10.1007/s10989-022-10414-0
    119. Hui Zhang, Qin Chen, Bing Niu, Risk Assessment of Veterinary Drug Residues in Meat Products, 2020, 21, 13892002, 779, 10.2174/1389200221999200820164650
    120. Ke Yan, Jie Wen, Jin-Xing Liu, Yong Xu, Bin Liu, Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores, 2021, 18, 1545-5963, 2008, 10.1109/TCBB.2020.2966450
    121. Hongliang Zou, Fan Yang, Zhijian Yin, Identification of tumor homing peptides by utilizing hybrid feature representation, 2022, 0739-1102, 1, 10.1080/07391102.2022.2049368
    122. Hasan Zulfiqar, Zhiling Guo, Bakanina Kissanga Grace-Mercure, Zhao-Yue Zhang, Hui Gao, Hao Lin, Yun Wu, Empirical Comparison and Recent Advances of Computational Prediction of Hormone Binding Proteins Using Machine Learning Methods, 2023, 20010370, 10.1016/j.csbj.2023.03.024
    123. Hongliang Zou, iHBPs-VWDC: variable-length window-based dynamic connectivity approach for identifying hormone-binding proteins, 2023, 0739-1102, 1, 10.1080/07391102.2023.2283150
    124. Ali Ghulam, Zar Nawab Khan Swati, Farman Ali, Saima Tunio, Nida Jabeen, Natasha Iqbal, DeepImmuno-PSSM: Identification of Immunoglobulin based on Deep learning and PSSM-Profiles, 2023, 11, 2308-8168, 54, 10.21015/vtcs.v11i1.1396
    125. A. Sherly Alphonse, N. Ani Brown Mary, Classification of anti-oxidant proteins using novel physiochemical and conjoint-quad (PCQ) feature composition, 2023, 83, 1573-7721, 48831, 10.1007/s11042-023-17498-w
    126. Zhi-Feng Gu, Yu-Duo Hao, Tian-Yu Wang, Pei-Ling Cai, Yang Zhang, Ke-Jun Deng, Hao Lin, Hao Lv, Prediction of blood–brain barrier penetrating peptides based on data augmentation with Augur, 2024, 22, 1741-7007, 10.1186/s12915-024-01883-4
    127. Zhibin Lv, Mingxuan Wei, Hongdi Pei, Shiyu Peng, Mingxin Li, Liangzhen Jiang, PTSP-BERT: Predict the thermal stability of proteins using sequence-based bidirectional representations from transformer-embedded features, 2025, 185, 00104825, 109598, 10.1016/j.compbiomed.2024.109598
    128. Ali Ghulam, Muhammad Arif, Ahsanullah Unar, Maha A. Thafar, Somayah Albaradei, Apilak Worachartcheewan, StackAHTPs: An explainable antihypertensive peptides identifier based on heterogeneous features and stacked learning approach, 2025, 19, 1751-8849, 10.1049/syb2.70002
    129. Rui Li, Junwen Yu, Dongxin Ye, Shanghua Liu, Hongqi Zhang, Hao Lin, Juan Feng, Kejun Deng, Conotoxins: Classification, Prediction, and Future Directions in Bioinformatics, 2025, 17, 2072-6651, 78, 10.3390/toxins17020078
    130. Firuz Kamalov, Hana Sulieman, Ayman Alzaatreh, Maher Emarly, Hasna Chamlal, Murodbek Safaraliev, Mathematical Methods in Feature Selection: A Review, 2025, 13, 2227-7390, 996, 10.3390/math13060996
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)