Nexus between crude oil prices, clean energy investments, technology companies and energy democracy

Caner Özdurak

doi:10.3934/GF.2021017

Green Finance

2021, Volume 3, Issue 3: 337-350. doi: 10.3934/GF.2021017


Research article


Financial Economics Department, Yeditepe University, İstanbul, Turkey

In this study, we examine the nexus between crude oil prices, clean energy investments, technology companies, and energy democracy. Our dataset comprises daily prices of four variables between 2009 and 2021: the S&P Global Clean Energy Index (SPClean), Brent crude oil futures (Brent), the CBOE Volatility Index (VIX), and the NASDAQ 100 Technology Sector index (DXNT). The novelty of our study is that we include technology development and market fear as important factors and assess their impact on clean energy investments. DCC-GARCH models are used to analyze the spillover impact of market fear, oil prices, and technology company stock returns on clean energy investments. According to our findings, when oil prices decrease, the volatility index usually responds by increasing, which means that the market is afraid of oil price surges. Renewable investments also tend to decrease in that period, following the oil price trend. Moreover, a positive relationship exists between technology stocks and renewable energy stock returns.


Citation: Caner Özdurak. Nexus between crude oil prices, clean energy investments, technology companies and energy democracy[J]. Green Finance, 2021, 3(3): 337-350. doi: 10.3934/GF.2021017

Related Papers:

[1]	Lei Chen, Ruyun Qu, Xintong Liu . Improved multi-label classifiers for predicting protein subcellular localization. Mathematical Biosciences and Engineering, 2024, 21(1): 214-236. doi: 10.3934/mbe.2024010
[2]	Hangle Hu, Chunlei Cheng, Qing Ye, Lin Peng, Youzhi Shen . Enhancing traditional Chinese medicine diagnostics: Integrating ontological knowledge for multi-label symptom entity classification. Mathematical Biosciences and Engineering, 2024, 21(1): 369-391. doi: 10.3934/mbe.2024017
[3]	Jiyun Shen, Yiyi Xia, Yiming Lu, Weizhong Lu, Meiling Qian, Hongjie Wu, Qiming Fu, Jing Chen . Identification of membrane protein types via deep residual hypergraph neural network. Mathematical Biosciences and Engineering, 2023, 20(11): 20188-20212. doi: 10.3934/mbe.2023894
[4]	Yanping Xie, Zhaohui Dong, Junhua Du, Xiaoliang Zang, Huihui Guo, Min Liu, Shengwen Shao . The relationship between mouse lung adenocarcinoma at different stages and the expression level of exosomes in serum. Mathematical Biosciences and Engineering, 2020, 17(2): 1548-1557. doi: 10.3934/mbe.2020080
[5]	Cicely K. Macnamara, Mark A. J. Chaplain . Spatio-temporal models of synthetic genetic oscillators. Mathematical Biosciences and Engineering, 2017, 14(1): 249-262. doi: 10.3934/mbe.2017016
[6]	Hacı İsmail Aslan, Hoon Ko, Chang Choi . Classification of vertices on social networks by multiple approaches. Mathematical Biosciences and Engineering, 2022, 19(12): 12146-12159. doi: 10.3934/mbe.2022565
[7]	Yongyin Han, Maolin Liu, Zhixiao Wang . Key protein identification by integrating protein complex information and multi-biological features. Mathematical Biosciences and Engineering, 2023, 20(10): 18191-18206. doi: 10.3934/mbe.2023808
[8]	Wenjun Xu, Zihao Zhao, Hongwei Zhang, Minglei Hu, Ning Yang, Hui Wang, Chao Wang, Jun Jiao, Lichuan Gu . Deep neural learning based protein function prediction. Mathematical Biosciences and Engineering, 2022, 19(3): 2471-2488. doi: 10.3934/mbe.2022114
[9]	Feng Wang, Xiaochen Feng, Ren Kong, Shan Chang . Generating new protein sequences by using dense network and attention mechanism. Mathematical Biosciences and Engineering, 2023, 20(2): 4178-4197. doi: 10.3934/mbe.2023195
[10]	Kunli Zhang, Shuai Zhang, Yu Song, Linkun Cai, Bin Hu . Double decoupled network for imbalanced obstetric intelligent diagnosis. Mathematical Biosciences and Engineering, 2022, 19(10): 10006-10021. doi: 10.3934/mbe.2022467


1. Introduction

Proteins are a major component of almost all living organisms and are closely tied to the maintenance of normal physical functions in cells ^[1]. Many complex and essential biological processes require proteins, such as cell proliferation ^[2], DNA replication ^[3], and enzyme-mediated metabolic processes ^[4]. Furthermore, proteins contribute to constructing basic cellular structures, maintaining the cellular microenvironment, and forming complex macrostructures. Protein-related problems have therefore attracted considerable research attention in recent years, and determining protein function is one of the most essential of them. Experimental determination is reliable, but it has evident shortcomings, such as high cost and low efficiency. Thus, there is an urgent need for novel methods with low cost and high efficiency.

In recent years, several computational methods have been designed to identify protein functions. Most of them are data-driven: models are built by applying existing or newly designed algorithms to large numbers of proteins with annotated functions, which can be obtained from public databases. The most basic computational approach relies on protein sequence similarity measured by BLAST ^[5]. Other approaches, such as sequence motif-based methods (PROSITE) ^[6], profile-based methods (PFAM) ^[7], and structure-based methods (FATCAT and ProCAT) ^[8], have also been proposed. More recently, network-based methods have become increasingly popular for tackling protein-related problems. Two previous studies employed protein network information to design hybrid approaches for the identification of protein functions ^[9,10]. In those approaches, the network-based method was one important step, and the other steps used methods based on protein sequence similarity or on biochemical and physicochemical descriptions of proteins. Most established methods focus on the proteins themselves, analyzing their sequences, properties, and so on, and few studies consider the function labels. Studies on drug-related problems ^[11,12] used label information to improve the performance of classifiers, which suggests that the associations among function labels may also carry important information for protein function identification.

In this study, we constructed a multi-label classifier with a label space partition to identify protein functions. We selected proteins of mouse, one of the most extensively studied organisms, as the research object. Proteins and their function annotations were retrieved from MfunGD ^[13], which reports 24 functional types. A label space partition method, incorporating the Louvain method ^[14], was applied to analyze the associations among the 24 functional types, yielding several subsets of types. To test whether such a partition can improve the performance of classifiers, we built several classifiers with RAndom k-labELsets (RAKEL) ^[15], using support vector machine (SVM) ^[16] or random forest (RF) ^[17] as the base classifier. A multi-label classifier was built on each type subset, and these classifiers were integrated into the proposed classifier. The results indicated that classifiers with a label space partition were always superior to those that did not partition the functional types. Furthermore, they also outperformed classifiers that used a random partition of functional types.

2. Materials and methods

2.1. Datasets

We sourced the mouse proteins and their functional types from a previous study ^[9]. This information was retrieved from MfunGD (http://mips.gsf.de/genre/proj/mfungd/) ^[13], a public database collecting annotated mouse proteins and their occurrence in protein networks. In this database, mouse proteins are classified into 24 types, which are illustrated in Figure 1. The types of each mouse protein were determined by manually checking its annotation in the literature and its GO annotation ^[18,19]. Because we encoded mouse proteins according to their functional domain and interaction information, proteins lacking either type of information were excluded. The final dataset consisted of 9655 mouse proteins, classified into the 24 functional types mentioned above. The number of proteins in each functional type is also shown in Figure 1. The protein counts over all 24 types sum to 29850, which is much larger than the number of distinct mouse proteins (9655). This implies that many proteins belong to two or more functional types. Determining the functional types of mouse proteins is therefore a multi-label classification problem in which the functional types serve as labels.

Figure 1. Pie chart to show the number of mouse proteins in each functional type.


2.2. Label space partition

As mentioned above, mouse proteins in MfunGD were classified into 24 functional types, and assigning these types to given proteins was a multi-label classification problem in which the types serve as labels. With this many labels, it was difficult to build powerful multi-label classifiers directly. Partitioning the label set may help optimize classifiers, as suggested by studies on drug-related problems ^[11,12]. Thus, this section describes a label space partition method that divides the labels into several label subsets.

To implement this method, a label network was constructed first. Given a training dataset $D$ with $h$ labels ($h = 24$ in this study), denoted by $l_1, l_2, \dots, l_h$, the label set of a sample $s$ was defined as $L(s)$. For each label $l_i$ ($1 \le i \le h$), the samples having this label constituted a sample subset, denoted by $SL(l_i)$, that is

$SL(l_i) = \left\{s : s \in D \text{ and } l_i \in L(s)\right\}$

(1)

The label network defined labels as nodes, and two nodes were connected by an edge if and only if their corresponding labels, say $l_i$ and $l_j$, had common samples, that is, $SL(l_i) \cap SL(l_j) \ne \varnothing$. Furthermore, a weight was assigned to each edge to indicate the association strength of the two labels. For an edge $e$, its weight was defined by

$w(e) = \left|SL(l_i) \cap SL(l_j)\right|$

(2)

where $l_i$ and $l_j$ were the endpoints of edge $e$. For ease of description, let us denote this label network by $N_L$.
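The construction in Eqs. (1) and (2) can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: the function name `build_label_network` and the toy label sets are hypothetical, and labels are represented as plain integers.

```python
from collections import Counter
from itertools import combinations


def build_label_network(label_sets):
    """Return {(l_i, l_j): weight}, where weight = |SL(l_i) ∩ SL(l_j)| (Eq. 2).

    `label_sets` holds one set of labels L(s) per sample s. A pair of labels
    appears in the result only if at least one sample carries both labels,
    which matches the edge definition of the label network.
    """
    weights = Counter()
    for labels in label_sets:
        # A sample carrying both labels adds 1 to the weight of that edge.
        for l_i, l_j in combinations(sorted(set(labels)), 2):
            weights[(l_i, l_j)] += 1
    return dict(weights)


# Toy data (hypothetical): four samples with labels drawn from {1, 2, 3}.
toy_samples = [{1, 2}, {1, 2, 3}, {2, 3}, {3}]
network = build_label_network(toy_samples)
```

Counting co-occurrences sample by sample avoids materializing every $SL(l_i)$ explicitly, yet gives the same weights as intersecting the sample subsets.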

The Louvain method ^[14], a community detection algorithm, was performed on the label network $N_L$ to divide the labels into subsets. The method adopts a greedy aggregation scheme to detect communities such that the nodes in each detected community are strongly associated. Initially, each node in the network constitutes its own community. A loop procedure is then executed: in each round, the merge that contributes most to modularity is selected and carried out. For a node $n$ and a community $C$, the gain in modularity obtained by merging $n$ into $C$, denoted by $\Delta Q$, is defined as

$\Delta Q = \left[\frac{\Sigma_{in} + k_{n,in}}{2m} - \left(\frac{\Sigma_{tot} + k_n}{2m}\right)^2\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 - \left(\frac{k_n}{2m}\right)^2\right]$

(3)

where $\Sigma_{in}$ is the total weight of edges inside $C$, $\Sigma_{tot}$ is the total weight of edges adjacent to nodes in $C$, $k_{n,in}$ is the total weight of edges connecting $n$ to nodes in $C$, $k_n$ is the total weight of edges adjacent to $n$, and $m$ is the total weight of all edges in the network. For each node $n$, the gain in modularity from merging it with each of its neighbors is computed. The merge producing the highest gain is selected and a new network is constructed. In detail, if the merge involves node $n$ and community $C$, the new network combines them into a new node $n'$. The weight of an edge connecting $n'$ and another node $n''$ is set to the total weight of the edges connecting $n$ or $C$ to $n''$. In the next round, the procedure is executed on the new network. The loop stops when no merge yields a positive gain in modularity. The communities remaining in the network define a label partition.
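As a numerical check on Eq. (3), the gain can be evaluated directly from the five quantities defined above. The sketch below is illustrative only: the function name and the example values are hypothetical and are not taken from the label network of this study.

```python
def modularity_gain(sigma_in, sigma_tot, k_n_in, k_n, m):
    """Evaluate Eq. (3): gain in modularity when node n joins community C.

    sigma_in  : total weight of edges inside C
    sigma_tot : total weight of edges adjacent to nodes in C
    k_n_in    : total weight of edges between n and nodes in C
    k_n       : total weight of edges adjacent to n
    m         : total weight of all edges in the network
    """
    two_m = 2.0 * m
    # First bracket of Eq. (3): n and C merged.
    merged = (sigma_in + k_n_in) / two_m - ((sigma_tot + k_n) / two_m) ** 2
    # Second bracket of Eq. (3): n and C kept separate.
    separate = sigma_in / two_m - (sigma_tot / two_m) ** 2 - (k_n / two_m) ** 2
    return merged - separate


# Hypothetical values chosen only to exercise the formula.
gain = modularity_gain(sigma_in=2, sigma_tot=5, k_n_in=2, k_n=3, m=10)
```

Expanding the squares in Eq. (3) shows that the expression reduces to $k_{n,in}/(2m) - \Sigma_{tot} k_n/(2m^2)$, so the gain is positive only when the links between $n$ and $C$ outweigh what would be expected from their total edge weights.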

In this study, the Louvain method was performed on the label network $N_L$. By refining its outcome, we obtained a label partition, denoted by $L_1, L_2, \dots, L_t$.

2.3. Feature engineering

Efficient classifiers rely on informative features that capture as many essential properties of the samples as possible. This study employed two schemes to encode each mouse protein. The first extracted features from the functional domain information of proteins through a natural language processing approach, whereas the second generated features from several protein-protein interaction (PPI) networks. They are described below.

2.3.1. Domain embedding features

Functional domain information is deemed to be useful to investigate various protein-related problems ^[20,21,22,23,24]. Here, we also adopted such information to encode each mouse protein.

We retrieved the functional domain information of all mouse proteins from the InterPro database (http://www.ebi.ac.uk/interpro/, accessed in October 2020) ^[25]. This information covered 48739 mouse proteins and 16797 domains. Each domain was treated as a word, and each mouse protein, annotated by its domains, as a sentence. This information was then fed into the well-known natural language processing approach word2vec ^[26,27] to learn embedding features of domains. As a result, each domain was encoded by a 256-D feature vector. The word2vec program retrieved from https://github.com/RaRe-Technologies/gensim was adopted and executed with its default parameters.

The feature vectors of domains were then used to represent each mouse protein. Each mouse protein was encoded by the average of the vectors of the domains annotated on it. Thus, each protein was also represented by 256 features. For convenience, these features are called domain embedding features.
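The averaging step can be illustrated with a short sketch. It assumes the domain vectors have already been learned and are available as a plain dictionary. The function name, the domain identifiers and the 3-D toy vectors are hypothetical stand-ins for the 256-D word2vec output.

```python
def protein_vector(domains, domain_vectors):
    """Average the embedding vectors of the domains annotated on one protein."""
    vectors = [domain_vectors[d] for d in domains if d in domain_vectors]
    if not vectors:
        raise ValueError("protein has no domain with a known embedding")
    dim = len(vectors[0])
    # Component-wise mean over all annotated domains.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 3-D embeddings for two made-up domain identifiers.
toy_domain_vectors = {
    "dom_A": [1.0, 2.0, 3.0],
    "dom_B": [3.0, 4.0, 5.0],
}
toy_protein = protein_vector(["dom_A", "dom_B"], toy_domain_vectors)
```

Because the mean does not depend on the number of domains, proteins with one domain and proteins with many end up in the same 256-D space.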

2.3.2. Network embedding features

Networks are a popular form of representation because they can organize objects at a system level. However, a gap exists between networks and traditional machine learning algorithms. This gap has motivated the development of network embedding algorithms, which abstract the linkage in one or more networks and learn features for each node. In recent years, several network embedding algorithms, such as DeepWalk ^[28], Node2vec ^[29], and Mashup ^[30], have been proposed. Some of them have been applied to tackle different protein-related problems ^[30,31,32,33,34]. Features obtained by network embedding algorithms are quite different from those extracted from inherent properties of samples and can reflect different aspects of samples. Here, we adopted Mashup to extract features of mouse proteins from several PPI networks.

We used the mouse PPI information collected in STRING (https://www.string-db.org/, Version 10.0) ^[35], a public database containing interactions of 9,643,763 proteins from 2031 organisms. Interactions in this database are derived from five main sources: genomic context predictions, high-throughput lab experiments, (conserved) co-expression, automated text mining, and previous knowledge in databases. Accordingly, they evaluate the associations of proteins from a wide range of perspectives. The mouse PPI information involves 20648 mouse proteins and 5,109,107 interactions. Each interaction is assigned eight scores: the first seven each measure the association of proteins from one aspect, and the last integrates them. For each of the first seven scores, a PPI network was constructed in which proteins were defined as nodes and two nodes were connected by an edge when the corresponding proteins formed a PPI with that score larger than zero. The score was assigned to the edge as its weight. Accordingly, seven PPI networks were built, from which informative features of mouse proteins can be extracted.

The network embedding algorithm Mashup ^[30] was executed on the seven PPI networks constructed above. To our knowledge, it is the only network embedding algorithm that can process multiple networks. The method extracts features for each node in two stages. In the first stage, each node in each network is assigned a raw feature vector by the random walk with restart algorithm ^[36,37]. In this way, several raw feature vectors are produced for the same node, and they must be combined into one vector. At the same time, dimensionality reduction is needed because the dimension of the raw feature vectors equals the number of nodes in the network. Both tasks are handled in the second stage. Mashup posits one uniform vector for each node and one context vector for each node in each network, from which it produces an approximate vector for any node in any network. The components of these two types of vectors are determined by solving an optimization problem such that the approximate vectors are as close as possible to the raw feature vectors. For details, please refer to reference ^[30].
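The first stage can be made concrete with a small random walk with restart (RWR) routine. This is a generic power-iteration sketch in pure Python, not the Mashup code. The function name, the restart probability of 0.5, the convergence tolerance and the three-node toy network are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, start, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the stationary visiting probabilities of an RWR from `start`.

    `adj` is a symmetric weighted adjacency matrix (list of lists). At each
    step the walker jumps back to `start` with probability `restart`, and
    otherwise moves to a neighbor with probability proportional to edge weight.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    seed_vec = [1.0 if i == start else 0.0 for i in range(n)]
    p = seed_vec[:]
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            # Probability mass flowing into node i from each node j.
            walk = sum(adj[i][j] / col_sums[j] * p[j]
                       for j in range(n) if col_sums[j] > 0)
            new_p.append((1.0 - restart) * walk + restart * seed_vec[i])
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Toy path network 0 - 1 - 2 with unit weights (hypothetical).
toy_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
profile = random_walk_with_restart(toy_adj, start=0)
```

The resulting vector has one entry per node, which is why the raw feature vectors have as many dimensions as the network has nodes and need to be compressed in the second stage.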

This study adopted the Mashup program downloaded from http://cb.csail.mit.edu/cb/mashup/. Likewise, it was executed with its default parameters. For the dimension of the feature vectors, we tried various values between 100 and 300. For convenience, the features produced by Mashup are called network embedding features.

Accordingly, each mouse protein can be represented in three forms: (1) domain embedding features; (2) network embedding features; (3) domain and network embedding features combined.

2.4. Multi-label classifier

As mentioned in Section 2.1, many mouse proteins belonged to two or more functional types. A natural way to assign types to given proteins is to design a multi-label classifier. Generally, there are two schemes for constructing multi-label classifiers: problem transformation and algorithm adaptation ^[38]. The former transforms the original multi-label classification problem into several single-label classification problems. The latter generalizes a single-label classification algorithm so that it can process samples with more than one label. Here, we adopted a widely used problem transformation method, called RAKEL ^[15], to construct the multi-label classifier.

RAKEL generalizes the label powerset (LP) algorithm. Given a dataset with $h$ labels, say $l_1, l_2, \dots, l_h$, it randomly constructs $m$ label subsets, each consisting of $k$ labels. For each label subset, new labels are defined as the members of its power set, and each sample is assigned the one new label that corresponds to its original labels within that subset. Samples with their new labels constitute a new dataset, on which a classifier is built by training a single-label classification algorithm. Accordingly, $m$ classifiers are built and integrated in RAKEL. For a query sample $x$, each classifier gives a binary prediction (0 or 1) for each label $l_i$ in its label subset, and RAKEL calculates the average vote rate for each label $l_i$. When the average vote rate is greater than a given threshold (generally set to 0.5), $l_i$ is assigned to $x$. For ease of description, classifiers built by RAKEL are termed RAKEL classifiers in this study. To implement RAKEL quickly, the tool "RAKEL" in Meka (http://waikato.github.io/meka/) ^[39] was employed directly. The main parameters of RAKEL, $m$ and $k$, were tuned in this study.
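The two mechanical steps of RAKEL, the label powerset encoding and the voting, can be sketched as follows. This is not the Meka implementation. The function names and toy values are hypothetical, and the sketch assumes, as in the usual description of RAKEL, that each classifier votes only on the labels in its own subset.

```python
def to_powerset_class(sample_labels, labelset):
    """LP encoding: the single new label is the part of `labelset` the sample carries."""
    return frozenset(sample_labels) & frozenset(labelset)


def rakel_vote(predictions, labelsets, threshold=0.5):
    """Combine the per-subset predictions of m LP classifiers by average voting.

    predictions[j] is the set of labels predicted by classifier j, and
    labelsets[j] is the k-label subset that classifier j was trained on.
    """
    votes, counts = {}, {}
    for predicted, labelset in zip(predictions, labelsets):
        for label in labelset:
            counts[label] = counts.get(label, 0) + 1
            votes[label] = votes.get(label, 0) + (1 if label in predicted else 0)
    # A label is assigned when its average vote rate exceeds the threshold.
    return {label for label in counts if votes[label] / counts[label] > threshold}


# Toy setting (hypothetical): m = 3 classifiers, k = 2 labels each.
toy_labelsets = [{1, 2}, {2, 3}, {1, 3}]
toy_predictions = [{1, 2}, {2}, {3}]
assigned = rakel_vote(toy_predictions, toy_labelsets)
```

In the toy setting, label 2 receives two votes out of two and is assigned, whereas labels 1 and 3 each receive one vote out of two, a rate of exactly 0.5 that does not exceed the threshold.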

As mentioned in Section 2.2, all labels can be divided into $t$ partitions, say $L_1, L_2, \dots, L_t$. For each partition, a new dataset is constructed by restricting the labels of each sample to this partition. For instance, if a sample is assigned three labels, say $l_1, l_2, l_3$, and $l_1, l_3$ belong to one partition, this sample is assigned $l_1, l_3$ as its labels in the new dataset for that partition. A RAKEL classifier is then built on each newly constructed dataset. The final classifier integrates these RAKEL classifiers by collecting their results. In detail, for a query sample, each RAKEL classifier yields its prediction (i.e., a label subset), and the final prediction is the union of the label subsets yielded by all RAKEL classifiers.
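The restriction and integration steps amount to a set intersection and a set union, as the following sketch shows. The function names and toy values are hypothetical, with labels written as integers.

```python
def restrict_labels(sample_labels, partition_block):
    """Keep only the labels of a sample that fall inside one partition block."""
    return set(sample_labels) & set(partition_block)


def combine_predictions(block_predictions):
    """Final prediction: union of the label subsets predicted for each block."""
    final = set()
    for predicted in block_predictions:
        final |= set(predicted)
    return final


# The example from the text: labels l1, l2, l3 and a block containing l1 and l3.
restricted = restrict_labels({1, 2, 3}, {1, 3})

# Hypothetical outputs of three per-block RAKEL classifiers for one query sample.
final_prediction = combine_predictions([{1, 3}, set(), {7}])
```

Because the blocks of a partition are disjoint, no label can be predicted by two different RAKEL classifiers, so the union never has to resolve conflicting votes.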

2.5. Base classifier

When building the RAKEL classifiers, a single-label classification algorithm is needed. In this study, two powerful classification algorithms were employed: SVM ^[16] and RF ^[17].

SVM is a popular classification algorithm based on statistical learning theory ^[31,34,40,41,42,43,44,45,46]. Its principle is to use a kernel function to map samples from the original space to a higher-dimensional feature space in which they are more readily linearly separable. Several types of SVM have been designed for different problems. Here, the type of SVM trained by the sequential minimal optimization (SMO) algorithm ^[47] was adopted, with either a polynomial kernel or an RBF kernel.

RF is another powerful classification algorithm, which has been widely applied to tackle various biological problems ^[48,49,50,51,52,53,54]. It is an ensemble algorithm that integrates several decision trees. To build each decision tree, it randomly selects samples from the given dataset, with replacement, and randomly selects features to extend the tree at each node. For a query sample, all decision trees provide their predictions, which RF integrates by majority voting. It is widely accepted that a single decision tree is a relatively weak classifier, whereas RF is much more powerful ^[55].

The above SVM and RF algorithms are all implemented by corresponding tools in Meka ^[39]. These tools were directly employed in this study.

2.6. Performance assessment

All multi-label classifiers constructed in this study were assessed by ten-fold cross-validation ^[56]. This method first divides the original dataset, denoted by $D$, into 10 mutually exclusive subsets of similar size, i.e., $D = D_1 \cup D_2 \cup \dots \cup D_{10}$, $D_i \cap D_j = \varnothing$ ($i \ne j$, $1 \le i, j \le 10$). Each subset, say $D_i$, is used in turn as the test dataset, and the remaining nine subsets constitute the training dataset. The classifier built on the training dataset is applied to the test dataset. Thus, each sample is tested exactly once.
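The splitting scheme can be sketched with the standard library alone. The function name, the fixed seed and the toy dataset size below are assumptions for illustration. The evaluation in this study relied on existing tools rather than on this code.

```python
import random


def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Dealing indices out round-robin gives mutually exclusive folds whose
    # sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy dataset of 25 samples (hypothetical size).
splits = list(ten_fold_splits(25))
```

Repeating the procedure with different seeds gives different divisions of the samples, which is how repeated ten-fold cross-validation probes the stability of a classifier.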

From the results of ten-fold cross-validation, we can compute several measurements to assess the quality of the predictions. In this study, we employed three widely used measurements in multi-label classification: accuracy, exact match and Hamming loss. To list their formulas, some notation is necessary. Given a dataset with $n$ samples and $m$ labels, suppose that $L_i$ and $L_i'$ are the sets of true labels and predicted labels, respectively, of the $i$-th sample. The three measurements can be computed by

$\left\{\begin{array}{l}\text{Accuracy} = \frac{1}{n}\sum\limits_{i=1}^{n}\frac{\left|L_i \cap L_i'\right|}{\left|L_i \cup L_i'\right|}\\ \text{Exact match} = \frac{1}{n}\sum\limits_{i=1}^{n}\nabla\left(L_i, L_i'\right)\\ \text{Hamming loss} = \frac{1}{n}\sum\limits_{i=1}^{n}\frac{\left|\left(L_i \cup L_i'\right) - \left(L_i \cap L_i'\right)\right|}{m}\end{array}\right.$

(4)

where $\nabla$ is defined as below:

$\nabla\left(L_i, L_i'\right) = \begin{cases}1, & \text{if } L_i \text{ is identical to } L_i'\\ 0, & \text{otherwise}\end{cases}$

(5)

Evidently, high accuracy and exact match indicate good performance of the classifier, whereas for Hamming loss a lower value is better.
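A direct transcription of Eqs. (4) and (5) may help to make the three measurements concrete. The function name and toy data are hypothetical. Scoring a sample whose true and predicted label sets are both empty as accuracy 1 is an assumption, because Eq. (4) leaves that case undefined.

```python
def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Return (accuracy, exact match, Hamming loss) as defined in Eqs. (4)-(5)."""
    n = len(true_sets)
    accuracy = exact = hamming = 0.0
    for true, pred in zip(true_sets, pred_sets):
        true, pred = set(true), set(pred)
        union = true | pred
        # Assumption: an empty union counts as a perfect match.
        accuracy += len(true & pred) / len(union) if union else 1.0
        exact += 1.0 if true == pred else 0.0
        # Symmetric difference = (union) - (intersection).
        hamming += len(true ^ pred) / n_labels
    return accuracy / n, exact / n, hamming / n


# Toy data (hypothetical): two samples, four possible labels.
toy_true = [{1, 2}, {3}]
toy_pred = [{1}, {3}]
acc, em, hl = multilabel_metrics(toy_true, toy_pred, n_labels=4)
```

In the toy data the first sample is half right and the second is fully right, so accuracy is 0.75, exact match is 0.5, and Hamming loss is 0.125 (one wrong label out of four, averaged over two samples).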

When comparing the performance of different classifiers, different measurements may lead to different conclusions. Accuracy, exact match and Hamming loss all range between 0 and 1. Accuracy and exact match follow the same trend, that is, a higher value represents higher performance, whereas Hamming loss follows the opposite trend, that is, a lower value represents higher performance. Thus, we converted Hamming loss to 1 - Hamming loss so that it follows the same trend as accuracy and exact match. Accuracy, exact match and 1 - Hamming loss can then be multiplied together to define a new measurement, called the integrated score in this study, formulated by

$\text{Integrated score} = \text{Accuracy} \times \text{Exact match} \times \left(1 - \text{Hamming loss}\right)$

(6)

The higher the integrated score, the higher the performance of the classifier. This measurement has also been used in some previous studies ^[45,57].
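Eq. (6) is simple enough to check by hand, and the short sketch below does so for one row of Table 1. The function name is hypothetical, and the three input values are the random forest figures reported in that table.

```python
def integrated_score(accuracy, exact_match, hamming_loss):
    """Eq. (6): product of accuracy, exact match and (1 - Hamming loss)."""
    return accuracy * exact_match * (1.0 - hamming_loss)


# Random forest row of Table 1: accuracy, exact match, Hamming loss.
rf_score = integrated_score(0.6025, 0.2806, 0.0687)
rf_score_rounded = round(rf_score, 4)
```

Rounded to four decimals, the result agrees with the integrated score of 0.1574 reported for that row. Because the score is a product, a weak value in any one measurement pulls the whole score down.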

3. Results and Discussion

In this study, we proposed a multi-label classifier to identify mouse protein functions, incorporating a procedure that analyzes the associations of functional types. Two types of features (domain and network embedding features) were adopted to encode proteins, and RAKEL was employed to construct classifiers. The entire procedure is illustrated in Figure 2. This section gives the detailed evaluation results and presents several comparisons.

Figure 2. Entire procedures for constructing and evaluating the multi-label classifier. Mouse proteins and their functional annotations (types) are retrieved from MfunGD. The types are analyzed by Louvain method, generating some label partitions. Proteins are represented by two feature types, where one is derived from functional domains via Word2vec and the other is derived from protein-protein interaction networks via Mashup. For each label partition, a classifier is built by RAKEL with support vector machine (SVM) or random forest (RF) as base classifier based on each type of features or both of them. The final multi-label classifier integrates above classifiers and it is assessed by ten-fold cross-validation.


3.1. Performance of classifiers with domain embedding features

For the protein features derived from functional domain information, we adopted RAKEL with a given base classifier to construct multi-label classifiers. Three base classifiers were tried: (1) SVM with polynomial kernel, (2) SVM with RBF kernel, and (3) RF. For both types of SVM, the regularization parameter C was set to 0.5, 1 and 2; the exponent of the polynomial kernel was kept at its default value (one), and the parameter γ of the RBF kernel was also kept at its default value (0.01). For RF, the main parameter, the number of decision trees, was tuned over various values between 10 and 300. The parameter m of RAKEL was set to its default value of 10, and the other parameter k was set to 2, 3, 4 and 5. A grid search was used to build all RAKEL classifiers, which were assessed by ten-fold cross-validation, and to extract the optimum parameters for each base classifier. The best performance of each base classifier, measured by integrated score, is provided in Table 1 together with the corresponding parameters. The integrated scores for the three base classifiers were 0.1026, 0.0611 and 0.1574, respectively. Evidently, the RAKEL classifier with RF provided the best performance. Its accuracy, exact match and Hamming loss were 0.6025, 0.2806 and 0.0687, respectively, each the best among the three RAKEL classifiers. At first glance, these three RAKEL classifiers may not seem good enough. However, they were better than classifiers without label partition, as elaborated in Section 3.4.

Table 1. Performance of RAKEL classifiers with different base classifiers on domain embedding features.

Base classifier	Parameter	Accuracy	Exact match	Hamming loss	Integrated score
Support vector machine (Polynomial kernel)	m = 10, k = 5, C = 2, exponent = 1	0.5329	0.2087	0.0777	0.1026
Support vector machine (RBF kernel)	m = 10, k = 5, C = 2, γ = 0.01	0.4643	0.1441	0.0872	0.0611
Random forest	m = 10, k = 3, number of decision trees = 250	0.6025	0.2806	0.0687	0.1574


In addition, to evaluate the best RAKEL classifier for each base classifier more fully, it was further assessed by ten runs of ten-fold cross-validation. The performance over these ten runs is shown in Figure 3, from which we can see that all four measurements yielded by each RAKEL classifier varied within a small range, indicating that the classifiers with label partition were quite stable no matter how the samples were divided.

Figure 3. Box plot to show the performance of three RAKEL classifiers using domain embedding features. (A) Accuracy; (B) Exact match; (C) Hamming loss; (D) Integrated score.


3.2. Performance of classifiers with network embedding features

For the network embedding features derived from seven protein networks, a similar procedure was conducted. The same parameters were tried for the three base classifiers and RAKEL. Furthermore, the dimension of the features was also tuned over 100, 150, 200, 250 and 300. A grid search was again used to build all RAKEL classifiers, which were assessed by ten-fold cross-validation. The best RAKEL classifier for each base classifier was found, and its performance is listed in Table 2 together with the optimum parameters. The integrated scores for the three base classifiers were 0.1308, 0.0714 and 0.1269, respectively. Clearly, the RAKEL classifier with SVM (polynomial kernel) generated the best performance, with accuracy, exact match and Hamming loss of 0.5853, 0.2407 and 0.0713, respectively. Each of these was the best among the three RAKEL classifiers listed in Table 2. Compared with the RAKEL classifiers based on domain embedding features, whether network embedding features were superior depended on the base classifier. With either SVM base classifier, network embedding features gave better performance, whereas with the RF base classifier they gave lower performance.

Table 2. Performance of RAKEL classifiers with different base classifiers on network embedding features.

Base classifier	Parameter	Accuracy	Exact match	Hamming loss	Integrated score
Support vector machine (Polynomial kernel)	m = 10, k = 5, C = 2, exponent = 1, feature dimension = 300	0.5853	0.2407	0.0713	0.1308
Support vector machine (RBF kernel)	m = 10, k = 3, C = 2, γ = 0.01, feature dimension = 300	0.5020	0.1551	0.0824	0.0714
Random forest	m = 10, k = 5, number of decision trees = 250, feature dimension = 150	0.5727	0.2385	0.0714	0.1269


Likewise, the best RAKEL classifiers with different base classifiers were further evaluated by ten runs of ten-fold cross-validation. A box plot for each measurement is shown in Figure 4. Each measurement of each classifier varied within a small range, suggesting that the three RAKEL classifiers were stable. This result was almost the same as that based on the domain embedding features.

Figure 4. Box plot to show the performance of three RAKEL classifiers using network embedding features. (A) Accuracy; (B) Exact match; (C) Hamming loss; (D) Integrated score.

3.3. Performance of classifiers with domain and network embedding features

Two types of features were adopted in this study to represent mouse proteins. They capture essential properties of proteins from different aspects, so combining them may help to construct more effective classifiers. We therefore constructed RAKEL classifiers using both domain and network embedding features. To save time, we only tried the parameters listed in Tables 1 and 2. The best performance of the RAKEL classifiers with each base classifier is provided in Table 3. The integrated scores for the three base classifiers were 0.1619, 0.1096 and 0.1731, respectively. Each was higher than that of the RAKEL classifier with the same base classifier and only domain or only network embedding features. Furthermore, Tables 1-3 show that, for the same base classifier, the classifier with both feature types always yielded higher accuracy, higher exact match and lower Hamming loss than the classifier with a single feature type. The domain and network embedding features therefore complement each other, and their combination improves the performance of the classifiers.
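For readers who want to reproduce the three measurements compared here, the sketch below computes them for sets of predicted and true labels. It assumes the commonly used multi-label definitions (accuracy as the mean ratio of intersection to union, exact match as the fraction of perfectly predicted samples, Hamming loss as the mean fraction of mismatched labels). The paper's own definitions are given in Section 2.6 and may differ in detail; the function name is illustrative.

```python
def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Return (accuracy, exact match, Hamming loss) under the assumed definitions.

    true_sets, pred_sets: sequences of label collections, one per sample.
    n_labels: total number of labels (24 functional types in this study).
    """
    n_samples = len(true_sets)
    accuracy = exact_match = hamming_loss = 0.0
    for true_labels, pred_labels in zip(true_sets, pred_sets):
        y, z = set(true_labels), set(pred_labels)
        union = y | z
        # Two empty sets are treated as a perfect prediction.
        accuracy += len(y & z) / len(union) if union else 1.0
        exact_match += 1.0 if y == z else 0.0
        # The symmetric difference counts labels that are wrongly added or missed.
        hamming_loss += len(y ^ z) / n_labels
    return (
        accuracy / n_samples,
        exact_match / n_samples,
        hamming_loss / n_samples,
    )
```

Higher accuracy and exact match are better, whereas lower Hamming loss is better, which is why the comparison in the text treats the third measurement in the opposite direction.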

Table 3. Performance of RAKEL classifiers with different base classifiers on domain and network embedding features.

Base classifier	Parameter	Accuracy	Exact match	Hamming loss	Integrated score
Support vector machine (Polynomial kernel)	m = 10, k = 5, C = 2, exponent = 1, network embedding feature dimension = 300	0.6242	0.2777	0.0660	0.1619
Support vector machine (RBF kernel)	m = 10, k = 5, C = 2, γ = 0.01, network embedding feature dimension = 300	0.5439	0.2177	0.0743	0.1096
Random forest	m = 10, k = 5, number of decision trees = 250, network embedding feature dimension = 150	0.6235	0.2963	0.0633	0.1731

3.4. Comparison of classifiers without label partition

In this study, a label partition was employed to construct multi-label classifiers for identifying the functions of mouse proteins. To demonstrate the merits of the label partition, we also built RAKEL classifiers that did not adopt it. All parameters for the three base classifiers and RAKEL were tried for each feature type, and all such classifiers were assessed by ten-fold cross-validation.
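The difference between the two settings can be illustrated with a small sketch. It assumes that, with a label partition, one multi-label model is trained per partition on the labels restricted to that partition, and that the final prediction is the union of the per-partition outputs (Section 3.7 states that the classifiers on all partitions are integrated, but the exact integration rule is an assumption here). The models are represented by plain callables, which are hypothetical stand-ins for trained RAKEL classifiers.

```python
def split_labels_by_partition(label_sets, partitions):
    """For each partition, restrict every sample's label set to that partition."""
    return [[set(labels) & set(part) for labels in label_sets] for part in partitions]


def predict_with_partitions(sample, partition_models):
    """Union the label sets predicted by the per-partition models.

    partition_models: one callable per partition; each maps a sample to the
    set of labels it predicts within its own partition (hypothetical models).
    """
    predicted = set()
    for model in partition_models:
        predicted |= set(model(sample))
    return predicted
```

Without a label partition, a single model is trained on all labels at once, which corresponds to calling `split_labels_by_partition` with one partition that contains every label.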

For the classifiers with each base classifier and domain embedding features, we drew violin plots to show their performance on each measurement under different parameters (Figure 5). For ease of comparison, the results of the classifiers that employed the label partition are also shown in this figure. The accuracy, exact match and integrated score of the classifiers with label partition were all higher than those of the classifiers without label partition, whereas the Hamming loss was lower. These results indicate that the label partition improves the performance of the classifiers. The same tests were conducted for the other feature type, the network embedding features, and the violin plots of the four measurements are shown in Figure 6. The same conclusion can be drawn: the classifiers with label partition were generally superior to those without it.

Figure 5. Violin plot to compare RAKEL classifiers using domain embedding features with or without label partition. Red violins are for RAKEL classifiers with label partition and green violins are for RAKEL classifiers without label partition. (A) Accuracy; (B) Exact match; (C) Hamming loss; (D) Integrated score.

Figure 6. Violin plot to compare RAKEL classifiers using network embedding features with or without label partition. Red violins are for RAKEL classifiers with label partition and green violins are for RAKEL classifiers without label partition. (A) Accuracy; (B) Exact match; (C) Hamming loss; (D) Integrated score.

For the classifiers using both domain and network embedding features, we tested them with the parameters listed in Table 3 but without the label space partition procedure. The results of ten-fold cross-validation are listed in Table 4. The classifiers without label partition were clearly inferior to those with label partition (Table 3), which confirms the effectiveness of the label partition.

Table 4. Performance of RAKEL classifiers using domain and network embedding features but without label partition.

Base classifier	Parameter	Accuracy	Exact match	Hamming loss	Integrated score
Support vector machine (Polynomial kernel)	m = 10, k = 5, C = 2, exponent = 1, network embedding feature dimension = 300	0.5059	0.1507	0.0781	0.0703
Support vector machine (RBF kernel)	m = 10, k = 5, C = 2, γ = 0.01, network embedding feature dimension=300	0.4485	0.1112	0.0848	0.0456
Random forest	m = 10, k = 5, number of decision trees = 250, network embedding feature dimension = 150	0.5069	0.1608	0.0762	0.0753

3.5. Comparison of classifiers with random label partition

The classifiers proposed in this study adopted the label partition yielded by the Louvain method. To confirm that this partition really helps to improve the performance of the classifiers, we also employed random label partitions, which randomly divide the class labels into several partitions. For a fair comparison, the distribution of partition sizes in each random partition was the same as that in the partition yielded by the Louvain method. On each random partition, the best RAKEL classifier with each base classifier and each feature type was built and assessed by ten-fold cross-validation. This procedure was repeated ten times with different random partitions. The performance (integrated score) of each RAKEL classifier on the two feature types is shown in Figures 7 and 8, respectively. For ease of comparison, the performance of the RAKEL classifiers with the Louvain partition under ten runs of ten-fold cross-validation is also shown in these two figures. When the base classifier was SVM (polynomial kernel) or RF, the RAKEL classifiers with the Louvain partition always performed better. For SVM (RBF kernel), the advantage was less obvious: the Louvain partition gave relatively better performance with domain embedding features, but with network embedding features it was not always better than the random partitions. On the whole, classifiers with the Louvain partition were superior to those with random partitions, indicating that a reasonable partition of class labels can further improve the performance of the classifiers.
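A size-matched random partition of this kind can be sketched as below. This is an illustrative implementation rather than the authors' code: labels are shuffled and then cut into consecutive groups whose sizes are given in advance. The sizes 14, 5 and 5 used in the usage note are those reported for the Louvain partition in Section 3.7; the function name and the use of integer label indices are assumptions.

```python
import random


def random_partition(labels, sizes, seed=None):
    """Randomly split labels into disjoint groups with the given sizes."""
    labels = list(labels)
    if sum(sizes) != len(labels):
        raise ValueError("partition sizes must sum to the number of labels")
    rng = random.Random(seed)
    rng.shuffle(labels)
    parts, start = [], 0
    for size in sizes:
        # Consecutive slices of the shuffled list form the random groups.
        parts.append(set(labels[start:start + size]))
        start += size
    return parts
```

For example, `random_partition(range(1, 25), [14, 5, 5], seed=0)` produces three groups over the 24 functional types; calling it with ten different seeds gives ten random partitions with the same size distribution.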

Figure 7. Violin plot to compare RAKEL classifiers using domain embedding features with the partition yielded by the Louvain method and with random partitions. Red violins indicate integrated scores of classifiers with the Louvain partition; black violins indicate integrated scores of classifiers with random partitions. (A) SVM (polynomial kernel) is the base classifier; (B) SVM (RBF kernel) is the base classifier; (C) RF is the base classifier.

Figure 8. Violin plot to compare RAKEL classifiers using network embedding features with the partition yielded by the Louvain method and with random partitions. Red violins indicate integrated scores of classifiers with the Louvain partition; black violins indicate integrated scores of classifiers with random partitions. (A) SVM (polynomial kernel) is the base classifier; (B) SVM (RBF kernel) is the base classifier; (C) RF is the base classifier.

The classifiers using both domain and network embedding features were also compared with their counterparts using a random partition. The performance of the classifier with each base classifier and a random partition is listed in Table 5. Compared with the results in Table 3, the classifiers with the Louvain partition always produced higher accuracy, exact match and integrated score. For Hamming loss, the classifiers with a random partition yielded slightly lower values when SVM was the base classifier (0.0654 versus 0.0660 for the polynomial kernel and 0.0737 versus 0.0743 for the RBF kernel). These small differences do not change the overall picture: classifiers with the Louvain partition were superior to those with a random partition.

Table 5. Performance of RAKEL classifiers using domain and network embedding features but with random label partition.

Base classifier	Parameter	Accuracy	Exact match	Hamming loss	Integrated score
Support vector machine (Polynomial kernel)	m = 10, k = 5, C = 2, exponent = 1, network embedding feature dimension = 300	0.6177	0.2705	0.0654	0.1562
Support vector machine (RBF kernel)	m = 10, k = 5, C = 2, γ = 0.01, network embedding feature dimension=300	0.5427	0.2138	0.0737	0.1075
Random forest	m = 10, k = 5, number of decision trees = 250, network embedding feature dimension = 150	0.6195	0.2952	0.0635	0.1713

3.6. Comparison with the previous classifier

In references ^[9,10], two hybrid classifiers were proposed to identify the functions of mouse proteins. Both contained a network-based classifier constructed from the PPI information reported in STRING. For a query protein, this classifier assigned a score to each of the 24 functional types and sorted the types in decreasing order of score. On its own, it therefore produces a ranking and cannot determine which types are the predicted ones. To compare it with our classifiers, we applied a threshold to the scores so that it could output a set of predicted types. Various thresholds were tried, and the classifier was assessed by ten runs of ten-fold cross-validation. The highest integrated score was only 0.0160, much lower than those listed in Tables 1-3, with an accuracy of 0.2532, an exact match of 0.0706 and a Hamming loss of 0.1059. This performance was much lower than that of any classifier described above, indicating that the classifiers proposed in this study are superior to this previous classifier.
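The thresholding step can be sketched as follows. The sketch assumes that a functional type is predicted when its score is at least the threshold (whether the original comparison used a strict or non-strict inequality is not stated), and that the threshold is chosen by maximizing some evaluation function such as the integrated score. The `evaluate` callback and both function names are hypothetical.

```python
def predict_by_threshold(scores, threshold):
    """Turn per-type scores into a predicted label set.

    scores: dict mapping each functional type to its score.
    A type is predicted when its score is >= threshold (assumed convention).
    """
    return {label for label, score in scores.items() if score >= threshold}


def best_threshold(candidates, evaluate):
    """Pick the candidate threshold with the highest evaluation value.

    evaluate(threshold) is a user-supplied (hypothetical) function, for
    example the integrated score obtained under cross-validation.
    """
    return max(candidates, key=evaluate)
```

A low threshold predicts many types per protein and a high threshold predicts few, so sweeping the threshold trades missed labels against spurious ones.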

3.7. Functional type analysis

As shown above, the use of a label partition improved the performance of the multi-label classifiers. The final classifier should use the label partition obtained on the whole dataset. This section analyzes the 24 functional types (labels).

First, we constructed a protein subset for each label, consisting of all proteins having this label. The association between any two labels was then evaluated by the Tanimoto coefficient of their corresponding protein subsets. A heat map of the Tanimoto coefficients for all pairs of functional types is shown in Figure 9. Class 14 (TRANSPOSABLE ELEMENTS, VIRAL AND PLASMID PROTEINS) has weak associations with almost all other classes. In contrast, class 7 (PROTEIN WITH BINDING FUNCTION OR COFACTOR REQUIREMENT (structural or catalytic)) and class 21 (SUBCELLULAR LOCALIZATION) were highly related to other classes. Using the Louvain method, the 24 functional types were divided into three partitions, which are listed in Table 6. Partition 1 contained 14 functional types, whereas each of the other two partitions contained five. Not surprisingly, class 7 and class 21 were assigned to the same partition. Given a protein representation, a multi-label classifier can be built on each partition, and the classifiers on all three partitions are integrated into the final multi-label classifier.
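The label association step can be sketched as below, using the standard set form of the Tanimoto coefficient, |A ∩ B| / |A ∪ B|. The input format (a dictionary from protein identifier to its set of labels) and the function names are assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_associations(protein_labels):
    """Compute the Tanimoto coefficient for every pair of labels.

    protein_labels: dict mapping a protein identifier to its set of labels.
    Returns a dict keyed by (label_u, label_v) with label_u < label_v.
    """
    # Build, for each label, the subset of proteins annotated with it.
    subsets = {}
    for protein, labels in protein_labels.items():
        for label in labels:
            subsets.setdefault(label, set()).add(protein)
    return {
        (u, v): tanimoto(subsets[u], subsets[v])
        for u, v in combinations(sorted(subsets), 2)
    }
```

The resulting pairwise values are what a heat map such as Figure 9 displays. Treated as edge weights of a graph over labels, they could be passed to a community detection routine (for example, a Louvain implementation such as `louvain_communities` in NetworkX) to obtain a label partition; which implementation the authors used is not stated here.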

Figure 9. Heat map to show the associations of functional types. The corresponding functional types of class index 1-24 can be found in Figure 1.

Table 6. Three communities obtained by using Louvain method.

Index	Functional type
Partition 1	PROTEIN WITH BINDING FUNCTION OR COFACTOR REQUIREMENT (structural or catalytic); REGULATION OF METABOLISM AND PROTEIN FUNCTION; CELLULAR COMMUNICATION/SIGNAL TRANSDUCTION MECHANISM; SUBCELLULAR LOCALIZATION; CELLULAR TRANSPORT, TRANSPORT FACILITIES AND TRANSPORT ROUTES; TRANSCRIPTION; ENERGY; METABOLISM; CELL CYCLE AND DNA PROCESSING; PROTEIN FATE (folding, modification, destination); BIOGENESIS OF CELLULAR COMPONENTS; SYSTEMIC INTERACTION WITH THE ENVIRONMENT; PROTEIN SYNTHESIS; CELL RESCUE, DEFENSE AND VIRULENCE
Partition 2	INTERACTION WITH THE ENVIRONMENT; CELL TYPE LOCALIZATION; TISSUE LOCALIZATION; ORGAN LOCALIZATION; TRANSPOSABLE ELEMENTS, VIRAL AND PLASMID PROTEINS
Partition 3	CELL FATE; DEVELOPMENT (Systemic); TISSUE DIFFERENTIATION; ORGAN DIFFERENTIATION; CELL TYPE DIFFERENTIATION

3.8. Further study

By employing the association information of functional types, the performance of the multi-label classifiers for identifying mouse protein functions was improved. However, there is still room for improvement. First, protein features are key factors that influence the performance of classifiers. Novel and efficient protein features, such as motif embedding features ^[58], could be adopted to further improve the classifiers. Second, only one community detection algorithm, the Louvain method, was employed to cluster functional types in this study, and it is not clear whether this algorithm is the best choice for this problem. Other community detection algorithms may capture the associations between functional types more thoroughly and thereby produce a better label partition. Finally, we adopted traditional machine learning algorithms (RAKEL, SVM, RF) to construct the classifiers. These could be replaced with more powerful algorithms, such as deep learning algorithms, to build more effective classifiers. We will continue our study along these lines in the future.

4. Conclusions

This study proposed a novel multi-label classifier for identifying the functions of mouse proteins. The classifier considers the associations between functional types (labels) and divides the labels into partitions. Employing this label partition improved the performance of the classifiers. The classifier can be readily extended to other organisms, and we hope that it will help to identify novel functions of mouse proteins. All code and data are available at https://github.com/LiXuuuu/Mouse-Protein.

Conflict of interest

The authors declare no conflict of interest.

References

[1]	Adams S, Acheampong A (2019) Reducing Carbon Emissions: The Role of Renewable Energy and Democracy. J Clean Prod 240: 118245.
[2]	Alhassan S, Alade SA (2017) Income and Democracy in Sub-Sahara Africa. J Econ Sust Dev 8: 67-73.
[3]	Alola AA, Bekun VF, Sarkodie AS (2019) Dynamic impact of trade policy, economic growth, fertility rate, renewable and non-renewable energy consumption on ecological footprint in Europe. Sci Total Environ 685: 702-709. doi: 10.1016/j.scitotenv.2019.05.139
[4]	Barrett S, Graddy K (2000) Freedom, growth, and the environment. Environ Dev Econ 5: 433-456. doi: 10.1017/S1355770X00000267
[5]	Becker S, Kunze C (2014) Transcending community energy: collective and politically motivated projects in renewable energy (CPE) across Europe. People Place Policy 8: 180-191.
[6]	Bollerslev T (1986) Generalized Autoregressive Conditional Heteroscedasticity. J Econometrics 31: 307-327. doi: 10.1016/0304-4076(86)90063-1
[7]	Bondia R, Ghosh S, Kanjilal K (2016) International Crude Oil Prices and the Stock Prices of Clean Energy and Technology Companies: Evidence from Non-linear Cointegration Tests with Unknown Structural Breaks. Energy 101: 558-565. doi: 10.1016/j.energy.2016.02.031
[8]	Burke M, Stephens CJ (2018) Political power and renewable energy futures: A critical review. Energy Res Social Sci 35: 78-93. doi: 10.1016/j.erss.2017.10.018
[9]	Corbet S, Goodell WJ, Günay S (2020) Co-movements and spillovers of oil and renewable firms under extreme conditions: new evidence from negative WTI prices during COVID-19. Energy Econ 92: 104978.
[10]	Diebold FX, Yilmaz K (2012) Better to Give than to Receive: Predictive Directional Measurement of Volatility Spillovers. Int J Forecasting 28: 57-66. doi: 10.1016/j.ijforecast.2011.02.006
[11]	Dutta A (2017) Oil Price Uncertainty and Clean Energy Stock Returns: New Evidence from Crude Oil Volatility Index. J Clean Prod 164: 1157-1166. doi: 10.1016/j.jclepro.2017.07.050
[12]	Engle R (1982) Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica 50: 987-1007. doi: 10.2307/1912773
[13]	Engle R, Ng KV (1993) Measuring and Testing the Impact of News on Volatility. J Financ 48: 1749-1778. doi: 10.1111/j.1540-6261.1993.tb05127.x
[14]	Farzin YH, Bond CA (2006) Democracy and Environmental Quality. J Dev Econ 81: 213-235. doi: 10.1016/j.jdeveco.2005.04.003
[15]	Ferrer R, Shahzad SJH, López R, et al. (2018) Time and Frequency Dynamics of Connectedness Between Renewable Energy Stocks and Crude Oil Prices. Energy Econ 76: 1-20. doi: 10.1016/j.eneco.2018.09.022
[16]	Inchauspe J, Ripple RD, Trück S (2015) The Dynamics of Returns on Renewable Energy Companies: A State-Space Approach. Energy Econ 48: 325-335. doi: 10.1016/j.eneco.2014.11.013
[17]	Kumar S, Managi S, Matsuda A (2012) Stock Prices of Clean Energy Firms, Oil and Carbon Markets: A Vector Autoregressive Analysis. Energy Econ 34: 215-226. doi: 10.1016/j.eneco.2011.03.002
[18]	Lv Z (2017) The effect of democracy on CO₂ emissions in emerging countries: Does the level of income matter? Renew Sust Energy Rev 72: 900-906.
[19]	Maji IK (2015) Does Clean Energy Contribute to Economic Growth? Evidence from Nigeria. Energy Rep 1: 145-150. doi: 10.1016/j.egyr.2015.06.001
[20]	Nguyen C, Schinckus C, Thanh SD (2019) Economic Integration and CO₂ Emissions: Evidence from Emerging Economies. Climate Dev 12: 369-384. doi: 10.1080/17565529.2019.1630350
[21]	Nguyen C, Schinckus C, Thanh SD (2018) The ambivalent role of institutions in the CO₂ Emissions: The case of Emerging Countries. Int J Energy Econ Policy 8: 7-17.
[22]	Nguyen C, Schinckus C, Dinh TS, Bensemann J, et al. (2019) Global Emissions: A New Contribution from the Shadow Economy. Int J Energy Econ Policy 9: 320-337.
[23]	Nguyen C, Ha-Le T, Schinckus C, et al. (2020) Determinants of agricultural emissions: panel data evidence from a global sample. Environ Dev Econ 26: 109-130. doi: 10.1017/S1355770X20000315
[24]	Omri A, Mabrouk NB, Sassi-Tmar A (2015) Modeling the Causal Linkages Between Nuclear Energy, Renewable Energy and Economic Growth in Developed and Developing Countries. Working Papers 2015-623, Department of Research, Ipag Business School.
[25]	Reboredo JC (2015) Is There Dependence and Systemic Risk Between Oil and Renewable Energy Stock Prices? Energy Econ 48: 32-45.
[26]	Roberts JT, Parks BC (2007) Fueling Injustice: Globalization, Ecologically Unequal Exchange and Climate Change. Globalizations 4: 193-210.
[27]	Sadorsky P (2012) Correlations and Volatility Spillovers Between Oil Prices and the Stock Prices of Clean Energy and Technology Companies. Energy Econ 34: 248-255. doi: 10.1016/j.eneco.2011.03.006
[28]	Symitsi E, Chalvatzis KJ (2019) The economic value of Bitcoin: A portfolio analysis of currencies, gold, oil and stocks. Res Int Bus Financ 48: 97-110. doi: 10.1016/j.ribaf.2018.12.001
[29]	Szulecki K, Overland I (2018) Energy democracy as a process, an outcome a goal: A conceptual review. Energy Res Social Sci 69: 101768.
[30]	Torras M, Boyce KJ (1998) Income, inequality, and pollution: a reassessment of the environmental Kuznets Curve. Ecol Econ 25: 147-160. doi: 10.1016/S0921-8009(97)00177-8
[31]	Ulusoy V, Özdurak C (2018) The Impact of Oil Price Volatility to Oil and Gas Company Stock Returns and Emerging Economies. Int J Energy Econ Policy Econ J 8: 144-158.
[32]	Usman O, Iortile IB, Ike NG (2020) Enhancing sustainable electricity consumption in a large ecological reserve-based country: the role of democracy, ecological footprint, economic growth, and globalisation in Brazil. Environ Sci Pollut Res 27: 13370-13383. doi: 10.1007/s11356-020-07815-3
[33]	Usman O, Olanipekun OI, Iorember TP, et al. (2020) Modelling environmental degradation in South Africa: the effects of energy consumption, democracy, and globalization using innovation accounting tests. Environ Sci Pollut Res 27: 8334-8349. doi: 10.1007/s11356-019-06687-6

© 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
