1. Introduction
Electronic health records (EHRs) are generated by hospitals to maintain real-time digital records of patients, including clinical events, diagnoses, and treatment information from outpatient and inpatient periods. EHRs combine structured and unstructured data [1], including demographic information, diagnosis information, laboratory results, medication prescriptions, medical imaging, and free-text descriptions such as clinical notes. Clinical assistant diagnosis is a fundamental task in smart healthcare: it aims to automatically extract valuable information from EHRs and support correct diagnostic decisions, which improves the diagnostic efficiency of clinicians and reduces misdiagnosis. It is also important for personalized medicine, disease prediction, and related tasks.
Although EHRs have been widely studied and have greatly improved clinical decision support [2], clinical assistant diagnosis based on EHRs still faces challenges. On the one hand, EHRs contain multiple types of data, such as demographic information and laboratory tests, which makes them high-dimensional and heterogeneous and makes it difficult to build large-scale computational models from them. On the other hand, EHRs are often incomplete and noisy, because some hospitals are still standardizing how EHRs and patient follow-up records are written [3].
Recently, graph neural networks (GNNs) have been used extensively for clinical assistant diagnosis, with significant research findings [1,2,3,4]. However, three obstacles remain. First, most GNNs are based on homogeneous graph models and focus only on the co-occurrence of medical concepts. Second, although a few heterogeneous graph models based on EHRs exist, each relation in these models represents only local information of the heterogeneous graph, whereas combining different relations yields a large amount of higher-order semantic information. These heterogeneous relations cannot be treated uniformly, which limits the learning ability of GNNs. Third, GNNs are highly sensitive to the original topology through which information is propagated in smart healthcare applications. A manually designed heterogeneous graph is usually extracted from structured and unstructured medical text data, which inevitably contain redundant, erroneous, and missing information. Consequently, structure learning and optimization for heterogeneous graph models based on EHRs remain difficult.
We propose a new model, PM-GSL, which introduces graph structure learning into diabetes clinical assistant diagnosis and abstractly describes the semantic relations between entities in EHRs through a patient multi-relational graph. PM-GSL learns the structure of the heterogeneous graph jointly with the parameters of the GNN to overcome the weak structure problem of the patient multi-relational graph. The contributions of this study are summarized as follows:
1) We apply graph structure learning to diabetes clinical assistant diagnosis and use EHRs to construct a patient multi-relational graph, which abstractly describes the relation between entities in EHRs from two perspectives: node attributes and meta-paths.
2) We propose a novel diabetes clinical assistant diagnosis model, PM-GSL. In this model, we design three candidate graphs (the original subgraph, the overall feature graph, and the higher-order semantic graph) and fuse them to generate a higher-quality heterogeneous graph that captures the complex, interactive higher-order semantic information of the patient multi-relational graph.
3) We conduct multiple experiments to compare PM-GSL with existing state-of-the-art approaches on two EHRs datasets: the large international public dataset MIMIC-Ⅳ and the Chinese hospital dataset P-EHRs. PM-GSL achieves improvements of 9.62% in Macro-F1, 9.17% in Micro-F1, and 10.33% in AUC over the state-of-the-art models on the diabetes clinical assistant diagnosis task.
The remainder of this paper is organized as follows. Section 2 describes related work on GNNs, graph structure learning, and disease assistant diagnosis, which motivates this work. Section 3 describes the problem setting of diabetes clinical assistant diagnosis and presents the related definitions. Section 4 details the PM-GSL model. Section 5 presents the results of our empirical evaluation of PM-GSL on the MIMIC-Ⅳ and P-EHRs datasets. Finally, Section 6 provides concluding remarks and future research directions.
2. Related work
2.1. Graph neural networks
GNNs were first proposed by Scarselli et al. [5] to extend deep learning to graph-structured data. They fall mainly into two categories: spectral-domain methods [6,7] and spatial-domain methods [8,9]. Spectral-domain methods adopt the spectral representation of the graph [10]. Spatial-domain methods, such as GraphSAGE [11] and GAT [12], define convolution directly on the graph and aggregate feature information for each node from its spatial neighborhood. However, these graph neural networks can only process homogeneous graphs. Some recent studies have attempted to extend GNNs to heterogeneous graphs. For example, metapath2vec [13] uses random walks guided by meta-paths to reduce a heterogeneous graph to a homogeneous-style problem and learn graph representations.
The HAN [14] model applies graph attention networks to heterogeneous graphs, aggregating attribute information from meta-path-based neighbor nodes. Building on HAN, the MAGNN [15] model combines aggregation within each meta-path with aggregation across meta-paths. GNNs are powerful at learning node embeddings and have achieved strong performance in numerous specific tasks [16]. However, almost all GNNs treat noisy observed data, or hypothetical data chosen for modeling convenience, as ground truth. Their performance therefore depends heavily on the quality of the original graph structure, which greatly limits their ability to handle uncertainty in the graph structure.
2.2. Graph structure learning
Graph structure learning (GSL) can be traced back to research on learning graph structures in network science [17,18]. GSL addresses noise and incompleteness in the original graph data. Some existing methods learn the graph structure from measurements of graph dynamical systems, such as coupled oscillators [19], and other studies [10] design attribute completion strategies for missing attributes. Nevertheless, the goal of these methods differs from that of graph representation learning. Recent studies have attempted to combine GSL with GNNs to improve the performance of downstream tasks. Franceschi et al. proposed LDS [20], which jointly learns the graph structure and the GNN parameters by approximately solving the resulting optimization problem.
Pro-GNN [21] exploits the graph properties of sparsity, low rank, and feature smoothness to design more robust GNNs that learn clean and efficient graph structures. The GTN [22] model generates a new graph structure by identifying valuable edges between nodes that are unconnected in the original graph, and it learns effective node embeddings on the new graph end to end. However, most GSL models are designed for homogeneous and relatively small graphs. They modify the graph structure by removing noise based on the graph topology and attribute similarity. Such models tend to optimize the entire graph [23], which makes it difficult to analyze large-scale medical graph data with complex structures and rich semantics.
2.3. Disease assistant diagnosis
Graph-based methods have been widely used in auxiliary disease diagnosis because graphs are highly expressive for relation modeling. In [3], a comprehensive disease self-diagnosis system based on the HealGCN model was proposed. It applies inductive graph convolution over predefined meta-paths to generate embedded representations of new patients, and it addresses the cold-start problem by mining the complex interactions in EHRs data. The MVS-GCN [24] model, based on a graph neural network, learns an effective representation of the brain network end to end and combines graph structure learning with multitask graph embedding learning, thus improving classification performance in brain disease diagnosis.
MM-STGNN [25] is a multimodal spatiotemporal graph neural network that uses graphs to represent the topological relationship between inpatients and predicts 30-day readmission from patients' longitudinal imaging and EHRs data. In [26], the authors proposed an end-to-end multi-modal graph learning framework (MMGL) for disease prediction, which aggregates the features of each modality by leveraging the correlation and complementarity between modalities. Although GNN models have shown excellent predictive performance in numerous healthcare tasks, they are highly sensitive to the quality of the graph structure. If a manually constructed graph is used directly in the GNN and kept separate from the prediction module, the result is cumbersome tuning and poor generalization [27]. Moreover, these models do not explore the complementary information between multiple relations in the graph or the message propagation mechanism within the multi-relational subgraphs.
3. Preliminaries
3.1. Clinical assistant diagnosis problem definition
Diabetes, glaucoma, and central nervous system diseases (such as stroke) are common in middle-aged and elderly people. Their early symptoms are very similar and difficult to distinguish clinically [28]. For example, in a diabetes patient with widely fluctuating blood glucose, the osmotic pressure inside and outside the lens changes, which alters the curvature of the lens, impairs the eye's ability to focus, and produces blurred vision. However, blurred vision is not specific to diabetes: glaucoma, which is associated with abnormal intraocular pressure, can also change the refraction of the eye and cause blurred vision. In addition, stroke patients often show symptoms such as blurred vision, sudden pins-and-needles sensations, and fatigue, which are similar to those of diabetic patients. Unlike many previous studies that modeled disease diagnosis as a link-prediction task, we use patient demographics, laboratory tests, physiological records, and medications in EHRs as nodes in the multi-relational graph and transform disease prediction into a classification task with fine-grained analysis.
3.2. Patient multi-relational graph
The patient multi-relational graph is defined as $G=(V,E,X,\{\xi_r\}_{r=1}^{R})$, where $V=\{v_1,v_2,\ldots,v_N\}$ is the node set, $N$ is the number of nodes, and $M$ is the set of node types. $E$ is the set of edges; together, the edges form the original adjacency matrix $A\in\mathbb{R}^{N\times N}$, where $A_{ij}$ indicates whether there is an edge between nodes $v_i$ and $v_j$. $\xi_r$ with $r\in\{1,2,\ldots,R\}$ denotes the edge relation type, and the multi-relational graph satisfies $|M|+|R|>2$. The feature vector of each node is $f_i\in\mathbb{R}^{d}$, and the node feature matrix is $X=\{f_1,f_2,\ldots,f_N\}\in\mathbb{R}^{N\times d}$.
3.3. Meta-path
In the patient multi-relational graph, two nodes can be connected by different semantic patterns. The meta-path $\phi$ is formalized as $v_1 \xrightarrow{r_1} v_2 \xrightarrow{r_2} \cdots \xrightarrow{r_{l-1}} v_l$, which describes the composite relation $r_1\circ r_2\circ\cdots\circ r_{l-1}$ between nodes $v_1$ and $v_l$, where $\circ$ is the relation composition operator.
3.4. Patient multi-relational subgraph
The patient multi-relational subgraph $G_r$ contains all node-pair triples with relation $r$, each formalized as $(v_i,r,v_j)$. The adjacency matrix of a multi-relational subgraph is denoted $A_r$: if $(v_i,r,v_j)\in G_r$, then $(A_r)_{ij}=1$; otherwise, $(A_r)_{ij}=0$. $A=\{A_r, r\in R\}$ denotes the set of all patient multi-relational subgraphs. We also define the type mapping functions $\varpi_h(r)$ and $\varpi_e(r)$, which return the node types of the head and tail nodes of a triple in the patient medical record network. For example, if $r=$ "pt", then $\varpi_h(r)=$ "patient" and $\varpi_e(r)=$ "test". The adjacency matrix of the patient multi-relational subgraph therefore satisfies $A_r\in\mathbb{R}^{|\varpi_h(r)|\times|\varpi_e(r)|}$.
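As an illustration of this definition, the following minimal Python sketch builds the 0/1 adjacency matrix of one relation from a list of triples. The function name `relation_adjacency`, the relation label "pm", and the toy node names are hypothetical and serve only to make the definition concrete.

```python
def relation_adjacency(triples, relation, heads, tails):
    """Build the |heads| x |tails| 0/1 adjacency matrix A_r for one relation."""
    head_index = {node: i for i, node in enumerate(heads)}
    tail_index = {node: j for j, node in enumerate(tails)}
    adjacency = [[0] * len(tails) for _ in heads]
    for head, rel, tail in triples:
        if rel == relation:
            adjacency[head_index[head]][tail_index[tail]] = 1
    return adjacency


# Hypothetical toy record network: "pt" links patients to laboratory-test nodes.
triples = [
    ("patient1", "pt", "glucose_abnormal"),
    ("patient2", "pt", "glucose_normal"),
    ("patient1", "pm", "metformin"),  # a different relation, ignored for "pt"
]
A_pt = relation_adjacency(
    triples,
    "pt",
    heads=["patient1", "patient2"],
    tails=["glucose_normal", "glucose_abnormal"],
)
```

Rows correspond to head nodes (patients) and columns to tail nodes (tests), so the matrix has the shape $|\varpi_h(r)|\times|\varpi_e(r)|$ stated above.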
4. Patient multi-relational graph structure learning model
4.1. Constructing the patient multi-relational graph
In this study, the EHRs are modeled as a patient multi-relational graph in which each patient is a node and the patient's EHRs data are associated with multiple medical concepts. For example, a record in the EHRs may state that a patient's fasting blood glucose exceeded 7.0 mmol/L, or that postprandial blood glucose exceeded 11.1 mmol/L in the glucose tolerance test, so the doctor recorded abnormal blood glucose in the diagnosis conclusion; here "fasting blood glucose" and "postprandial blood glucose" are medical concepts. We reduce the values of similar or identical medical concepts to a small number of categories and set the nodes "normal" and "abnormal" for the patient's laboratory tests. Following [28], patients' ages are grouped by the three thresholds 15, 30, and 64, which gives four age nodes. Over 100 medications are extracted from the EHRs and divided into 20 types according to their abbreviations, formulations, and effects, with each type treated as a node. The comorbidity diagnoses of the patients are matched to the appropriate ICD-10 codes.
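The discretization of ages and glucose readings into graph nodes can be sketched as follows. The thresholds come from the text; treating each age threshold as an inclusive upper bound, and the function names `age_group` and `glucose_node`, are assumptions of this sketch.

```python
from bisect import bisect_left

# Age thresholds from the text. Treating each threshold as an inclusive
# upper bound (age 15 falls in the first group) is an assumption.
AGE_THRESHOLDS = [15, 30, 64]


def age_group(age):
    """Return the index (0 to 3) of the age-group node for a patient."""
    return bisect_left(AGE_THRESHOLDS, age)


def glucose_node(fasting=None, postprandial=None):
    """Map blood glucose readings in mmol/L to a 'normal' or 'abnormal' node."""
    abnormal = (fasting is not None and fasting > 7.0) or (
        postprandial is not None and postprandial > 11.1
    )
    return "abnormal" if abnormal else "normal"
```

For example, `age_group(45)` selects the third age node and `glucose_node(fasting=7.5)` selects the "abnormal" node.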
In the diabetes diagnosis task, gender is not an informative patient attribute; it does not help the prediction task and may cause over-smoothing. Following the conclusion of [4], the PM-GSL model disregards the gender attribute. We construct the multi-relational graph from the relations among five types of nodes ("Patient," "Age," "Medication," "Diagnoses," and "Laboratory Tests") and determine semantically meaningful meta-paths. For example, for two patients with blurred vision, the meta-path "patient1 → metformin ← patient2" indicates that the two patients took the same medication and are therefore likely to have the same diagnosis. The meta-paths in Table 1 guide the random walks on the patient multi-relational graph. Figure 1 shows PM-GSL, our diabetes clinical assistant diagnosis model based on patient multi-relational graph structure learning. First, PM-GSL analyzes the graph connectivity to jointly train and optimize the patient multi-relational graph structure and the GNN parameters. Second, a semantic embedding matrix is constructed with the meta-path-based node embedding method. Taking relation $r_n$ as an example, the model obtains a new overall feature graph $\Gamma^{F}_{r_n}$ and a higher-order semantic graph $\Gamma^{S}_{r_n}$ and fuses these two graphs with the original subgraph $A_{r_n}$ to obtain a new subgraph $A^{*}_{r_n}$. Finally, the learned subgraphs are fed into the GNN and the regularizer to output the diabetes diagnosis prediction.
From the perspective of graph representation learning, GNNs learn node embeddings by aggregating information from neighboring nodes [29]. This iterative mechanism relies on the information inherent in the graph structure and attributes. However, manually designed graphs inevitably contain redundant noise and misinformation, which limits the predictive performance and interpretability of most GNNs, especially in smart healthcare, where graph quality requirements are higher. To optimize the patient multi-relational graph, PM-GSL uses node features to enhance the original graph structure in two ways. First, we generate a feature similarity graph from the similarity between node features. Second, node features are propagated through the topology of the multi-relational graph to generate a message propagation graph, as shown in Figure 2.
4.2. Feature similarity graph
The pairwise connection of nodes is one of the most direct ways to represent information in the patient multi-relational graph. To preserve the pairwise proximity between nodes in the patient multi-relational graph, the PM-GSL model uses node features to determine how likely an edge of relation type $r_n\in R$ is between two nodes.
Specifically, for a node $v_i$ with node type $\varphi(v_i)$ and feature vector $f_i\in\mathbb{R}^{1\times d}$, the node features are projected into a $d^{*}$-dimensional common space using a type-specific mapping matrix $W_{\varphi(v_i)}$, giving $f_i^{*}\in\mathbb{R}^{1\times d^{*}}$:
Here, $\sigma(\cdot)$ denotes the non-linear activation function, $W\in\mathbb{R}^{d\times d^{*}}$ denotes the mapping matrix, and $b\in\mathbb{R}^{1\times d^{*}}$ denotes the bias vector. In the patient multi-relational graph, for relation $r_n$, we use weighted cosine similarity as the measurement function and obtain a feature similarity graph $\Gamma^{FS}_{r_n}\in\mathbb{R}^{|\varpi_h(r_n)|\times|\varpi_e(r_n)|}$. The similarity between nodes $v_i$ and $v_j$ is expressed as follows:
where $f^{*}_{v_i}$ and $f^{*}_{v_j}$ are the projected node features and $Q^{FS}_{r_n}$ is the learnable parameter matrix. The similarity measure is symmetric in its two arguments, and the entries of the resulting node feature similarity matrix $\Gamma^{FS}_{r_n}$ lie in $[-1, 1]$. We therefore extract a non-negative, sparse matrix from $\Gamma^{FS}_{r_n}$. Specifically, we define a non-negative threshold $\lambda^{FS}$ for automatic learning and set the elements smaller than $\lambda^{FS}$ to 0; the larger $\lambda^{FS}$ is, the sparser $\Gamma^{FS}_{r_n}$ becomes. The thresholded matrix is defined as follows:
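As a rough illustration of the projection, weighted cosine similarity, and thresholding steps described in this subsection, the sketch below uses plain Python. It assumes tanh as the activation $\sigma$, treats each row of `Q` as one head of the weighted cosine measure, and averages over the heads; these choices and all function names are assumptions of the sketch, not details confirmed by the text.

```python
import math


def project(f, W, b):
    """Type-specific projection f* = sigma(f W + b), with tanh assumed as sigma."""
    return [
        math.tanh(sum(f[k] * W[k][j] for k in range(len(f))) + b[j])
        for j in range(len(b))
    ]


def cosine(u, v):
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * c for a, c in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def weighted_cosine(fi, fj, Q):
    """Average over heads q in Q of cos(q * fi, q * fj), element-wise weights."""
    sims = [
        cosine([w * a for w, a in zip(q, fi)], [w * c for w, c in zip(q, fj)])
        for q in Q
    ]
    return sum(sims) / len(sims)


def similarity_graph(head_feats, tail_feats, Q, threshold):
    """Similarity matrix with every entry below `threshold` set to 0."""
    graph = []
    for fi in head_feats:
        row = []
        for fj in tail_feats:
            s = weighted_cosine(fi, fj, Q)
            row.append(s if s >= threshold else 0.0)
        graph.append(row)
    return graph


# Toy projected features (hypothetical values) and K = 2 heads.
head_feats = [[1.0, 0.0], [0.0, 1.0]]
tail_feats = [[1.0, 0.0], [1.0, 1.0]]
Q = [[1.0, 1.0], [1.0, 1.0]]
graph = similarity_graph(head_feats, tail_feats, Q, threshold=0.8)
```

With the threshold at 0.8, only the pair of identical feature vectors keeps its edge; raising the threshold removes more entries, which is the sparsity effect described above.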
4.3. Message propagation graph
The message propagation graph captures the interaction between node features and topology, and it can find new effective edges more accurately. Its key idea has two aspects. First, many nodes of the same class have similar features yet may be far apart in the structure of the patient multi-relational graph. Second, two nodes with similar features are more likely to have similar neighbor nodes, so the feature similarity graph is propagated through the topology to generate new edges.
The node types of the two nodes connected by relation $r_n$ in the patient multi-relational graph are $\varpi_h(r_n)$ and $\varpi_e(r_n)$, and the topology between them is $A_{r_n}\in\mathbb{R}^{|\varpi_h(r_n)|\times|\varpi_e(r_n)|}$. For two nodes $v_t,v_s\in\varpi_h(r_n)$, we calculate the feature similarity:
Similar to Eq (4.3), a threshold $\lambda^{FH}$ is set to control the sparsity of the feature similarity graph $\Gamma^{FH}_{r_n}$, as shown in Eq (4.5). After obtaining the similarity of $v_t,v_s\in\varpi_h(r_n)$, a potential message propagation graph $\Gamma^{CH}_{r_n}\in\mathbb{R}^{|\varpi_h(r_n)|\times|\varpi_e(r_n)|}$ is generated from the feature similarity graph $\Gamma^{FH}_{r_n}$ and the original subgraph $A_{r_n}$, as shown in Eq (4.6):
Similarly, for nodes $v_p,v_q\in\varpi_e(r_n)$ of the same type, we apply Eqs (4.4) and (4.5) to obtain the corresponding feature similarity graph $\Gamma^{FT}_{r_n}$ and the message propagation graph $\Gamma^{CT}_{r_n}\in\mathbb{R}^{|\varpi_h(r_n)|\times|\varpi_e(r_n)|}$:
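One way to realize this propagation that is consistent with the stated matrix shapes is a matrix product of the same-type similarity graph with the original subgraph. The sketch below assumes exactly that (head-side: similarity times adjacency; tail-side: adjacency times similarity); the product form, the helper `matmul`, and the toy values are assumptions of this sketch.

```python
def matmul(left, right):
    """Multiply two dense matrices given as lists of rows."""
    return [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


# Original subgraph for one relation: 2 patients (heads) x 2 tests (tails).
adjacency = [[1, 0],
             [0, 0]]

# Hypothetical thresholded similarities among heads and among tails.
sim_head = [[1.0, 0.9],
            [0.9, 1.0]]
sim_tail = [[1.0, 0.8],
            [0.8, 1.0]]

# Head-side propagation: patient 2 resembles patient 1, so it inherits a
# candidate edge to the test that patient 1 is connected to.
gamma_ch = matmul(sim_head, adjacency)

# Tail-side propagation: test 2 resembles test 1, so patient 1 gains a
# candidate edge to test 2.
gamma_ct = matmul(adjacency, sim_tail)
```

Both results keep the $|\varpi_h(r_n)|\times|\varpi_e(r_n)|$ shape of the original subgraph, so they can be stacked with it as candidate graphs.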
In summary, we fuse the generated feature similarity graph $\Gamma^{FS}_{r_n}$ and the message propagation graphs $\Gamma^{CH}_{r_n}$ and $\Gamma^{CT}_{r_n}$ through the channel attention layer [21] to obtain the overall feature graph $\Gamma^{F}_{r_n}\in\mathbb{R}^{|\varpi_h(r_n)|\times|\varpi_e(r_n)|}$ of relation $r_n$:
where $[\Gamma^{FS}_{r_n},\Gamma^{CH}_{r_n},\Gamma^{CT}_{r_n}]\in\mathbb{R}^{|\varpi_h(r_n)|\times|\varpi_e(r_n)|\times 3}$ denotes the stacked candidate graphs from which the overall feature graph is generated. $\Psi^{F}_{r_n}$ denotes the channel attention layer, and its parameter is $Q^{F}_{r_n}\in\mathbb{R}^{1\times1\times3}$. The channel attention layer applies a $1\times1$ convolution to its input with weights $\mathrm{softmax}(Q^{F}_{r_n})$. In this way, the PM-GSL model learns a separate weight for each of the three candidate graphs, which measures the importance of each candidate graph to relation $r_n$.
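A $1\times1$ convolution over stacked channels with softmax-normalized weights amounts to a weighted sum of the candidate graphs. The following sketch illustrates that reading; the function names and the toy logits are hypothetical, and in the model the logits would be learned rather than fixed.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(candidates, logits):
    """Fuse same-shaped candidate graphs as a softmax-weighted sum."""
    weights = softmax(logits)
    rows, cols = len(candidates[0]), len(candidates[0][0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, candidates)) for j in range(cols)]
        for i in range(rows)
    ]


# Three 1 x 2 candidate graphs and equal logits (hypothetical values).
candidates = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]]
fused = channel_attention(candidates, logits=[0.0, 0.0, 0.0])
```

Equal logits reduce the layer to a plain average; unequal logits let one candidate graph dominate the fused result.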
4.4. Higher-order semantic graph
The semantic graph relies on the higher-order topology of the heterogeneous information network and describes the multi-hop structural interaction between two nodes determined by meta-paths. In the patient multi-relational graph, the overall feature graph describes the nodes from a microscopic perspective, whereas the higher-order semantic graph describes them from a macroscopic perspective. The higher-order relations reflected by different meta-paths contain different semantic information. We obtain the higher-order semantic graph structure of the patient multi-relational graph from multiple meta-paths, as shown in Figure 3. Given the relation sequence $r_1,r_2,\ldots,r_n$ of a meta-path $\phi$, a common way to generate a higher-order semantic graph is to multiply the corresponding adjacency matrices, for example $A_{\phi}=A_{r_n}\cdots A_{r_2}A_{r_1}$. However, this approach requires considerable time and space, and it discards the information of the intermediate semantic nodes in the meta-path.
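The multiplication-based construction, and the loss of intermediate nodes it entails, can be seen in a small example. The sketch below composes a patient-medication relation with its reverse to obtain a patient-medication-patient graph; it multiplies left to right with rows as source nodes, which is an assumed convention, and the toy matrix is hypothetical.

```python
def matmul(left, right):
    """Multiply two dense matrices given as lists of rows."""
    return [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


# Patient-medication adjacency: 3 patients x 2 medication types.
A_pm = [[1, 0],
        [1, 1],
        [0, 0]]

# Reverse relation (medication-patient) is the transpose.
A_mp = [list(col) for col in zip(*A_pm)]

# Meta-path patient -> medication -> patient.
A_pmp = matmul(A_pm, A_mp)
```

Entry `A_pmp[0][1]` counts the medications shared by patients 1 and 2, but the product no longer records which medication they share, which is the information loss noted above.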
In the patient multi-relational graph, a set of meta-paths $\{\phi_1,\phi_2,\ldots,\phi_m\}$ exists. Based on metapath2vec [13], we obtain the meta-path-based node semantic embeddings $Z=\{Z_{\phi_1},Z_{\phi_2},\ldots,Z_{\phi_m}\}$ to learn the potential higher-order semantic graph structure of the patient multi-relational graph. Training uses a meta-path-guided random walk strategy and a heterogeneous skip-gram model to learn the node embeddings, which largely retains the information of intermediate semantic nodes [30]. In addition, training the metapath2vec model does not involve adjacency matrix multiplication, which reduces the time and space complexity.
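The walk-generation half of this procedure can be sketched as follows. The sketch only produces type-constrained walks; the skip-gram training that turns walks into embeddings is omitted. The function `metapath_walk`, the cyclic `pattern` argument, and the toy graph are assumptions made for illustration.

```python
import random


def metapath_walk(neighbors, node_type, start, pattern, length, rng):
    """Generate one walk whose node types follow `pattern` cyclically.

    `pattern` lists node types, e.g. ["patient", "medication"] for the
    meta-path patient -> medication -> patient -> ...; `start` must have
    type pattern[0].
    """
    walk = [start]
    while len(walk) < length:
        wanted = pattern[len(walk) % len(pattern)]
        options = [n for n in neighbors.get(walk[-1], []) if node_type[n] == wanted]
        if not options:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(options))
    return walk


# Hypothetical toy graph with three node types.
neighbors = {
    "patient1": ["metformin", "age_31_64"],
    "patient2": ["metformin"],
    "metformin": ["patient1", "patient2"],
    "age_31_64": ["patient1"],
}
node_type = {
    "patient1": "patient",
    "patient2": "patient",
    "metformin": "medication",
    "age_31_64": "age",
}
walk = metapath_walk(
    neighbors, node_type, "patient1", ["patient", "medication"], 5, random.Random(0)
)
```

Because every step keeps the visited intermediate node in the sequence, the medication that links two patients remains visible to the embedding model.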
After obtaining the node semantic embeddings $Z$, for each meta-path $\phi_m$ in the patient multi-relational graph, we generate the adjacency matrix $\Gamma^{MS*}_{r_n,\phi_m}\in\mathbb{R}^{|\varpi_h(r_n)|\times|\varpi_e(r_n)|}$ of the semantic subgraph, where each edge is calculated as
where $z^{m}_{i}$ denotes the $i$-th row of $Z_{\phi_m}$, and $\delta^{MS*}_{r_n,\phi_m}$ is a metric function with parameter $Q^{MS*}_{r_n}$. We apply this calculation to all meta-paths to generate $m$ semantic subgraphs and aggregate them to obtain the higher-order semantic graph $\Gamma^{S}_{r_n}$:
where $[\Gamma^{MS*}_{r_n,\phi_1},\Gamma^{MS*}_{r_n,\phi_2},\ldots,\Gamma^{MS*}_{r_n,\phi_m}]$ is the stack of the $m$ semantic subgraphs. $\Psi^{MS*}_{r_n}$ denotes the channel attention layer, and its weight matrix $Q^{MS*}_{r_n}\in\mathbb{R}^{1\times1\times m}$ represents the importance of the semantic subgraphs of the different meta-paths for relation $r_n$. After obtaining $\Gamma^{S}_{r_n}$, we aggregate the learned overall feature graph $\Gamma^{F}_{r_n}$ and the higher-order semantic graph $\Gamma^{S}_{r_n}$ with the original patient subgraph $A_{r_n}$ to generate a new relational subgraph $A^{*}_{r_n}$ for relation $r_n$:
Similarly, $[\Gamma^{F}_{r_n},\Gamma^{S}_{r_n},A_{r_n}]$ is the stack of candidate graphs from which the new relational adjacency matrix is generated. The weight $Q_{r_n}\in\mathbb{R}^{1\times1\times3}$ is learned by the channel attention layer $\Psi_{r_n}$ and evaluates the importance of the three candidate graphs in generating the relational subgraph $A^{*}_{r_n}$. Finally, we obtain the new patient multi-relational graph $A^{*}=\{A^{*}_{r_n}, r_n\in R\}$ in the PM-GSL model.
4.5. Optimization
In the PM-GSL model, we optimize the new graph structure $A^{*}$ jointly with the GNN parameters. Because the adjacency matrix of the original heterogeneous graph may provide misleading information for the aggregation process, we aggregate the feature representations of neighbor nodes through the learned patient multi-relational graph structure and apply the result to diabetes clinical assistant diagnosis. For the patient multi-relational graph structure $A^{*}$, the forward model of the two-layer GCN can be expressed as follows:
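where $X\in\mathbb{R}^{N\times d}$ is the original node feature matrix and the input to the first GCN layer, and $F'$ is the adjacency matrix constructed from $A^{*}$. The normalized adjacency matrix is $\hat{A}=\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$, with $\tilde{A}=F'+I_N$ and $\tilde{D}_{ii}=1+\sum_j F'_{ij}$, where $I_N$ is the identity matrix. The classification loss on the graph is expressed as follows: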
where $Y_{v_i}$ is the predicted label, and $l(\cdot,\cdot)$ is the cross-entropy loss function, which measures the error between the predicted label $Y_{v_i}$ and the true label $y_{v_i}$.
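The renormalized adjacency matrix that feeds the GCN layers can be computed directly from the definitions of $\tilde{A}$ and $\tilde{D}$ given above. The sketch below does so in plain Python for a square matrix `F`; the function name and the toy input are hypothetical.

```python
import math


def normalize_adjacency(F):
    """Return D^{-1/2} (F + I) D^{-1/2} with D_ii = 1 + sum_j F_ij."""
    n = len(F)
    # Add self-loops: A_tilde = F + I.
    a_tilde = [
        [F[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    # Degree of each node in A_tilde.
    degree = [1.0 + sum(F[i]) for i in range(n)]
    return [
        [a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


# Two nodes joined by one edge.
A_hat = normalize_adjacency([[0.0, 1.0], [1.0, 0.0]])
```

Each GCN layer then multiplies this matrix by the current node representations and a weight matrix before applying its activation.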
The patient multi-relational graph constructed from multiple pieces of information in real-world EHRs is noisy and contains task-independent edges. PM-GSL makes the original patient multi-relational graph more adaptable to the prediction task, but also more prone to overfitting. Therefore, we impose a sparsity constraint on the adjacency matrix $F'$ of the graph $A^{*}$ and apply the following regularization term to the learned graph:
The total loss of the PM-GSL model is described by Eq (4.15). Minimizing $L$ jointly optimizes the graph structure and the GNN parameters and thereby improves performance on the prediction task.
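For orientation, one form of the objective that is consistent with the description in this subsection is sketched below. It assumes that the classification loss sums over a set of labeled nodes $V_L$, that the sparsity constraint is an $\ell_1$ penalty on $F'$ weighted by $\beta$, and that the total loss is the sum of the two terms; the subscripts GCN and reg and the symbol $V_L$ are names introduced here for illustration.

```latex
L_{\mathrm{GCN}} = \sum_{v_i \in V_L} l\left(Y_{v_i},\, y_{v_i}\right),
\qquad
L_{\mathrm{reg}} = \beta \,\lVert F' \rVert_{1},
\qquad
L = L_{\mathrm{GCN}} + L_{\mathrm{reg}}
```

Under this reading, a larger $\beta$ pushes more entries of $F'$ toward zero and trades some fit on the labeled nodes for a sparser learned graph.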
5. Experiment
We use two real EHRs datasets to evaluate the PM-GSL model: one from American hospitals and another from Chinese hospitals. Then, we compare the PM-GSL model with state-of-the-art methods. Specifically, we aim to answer the following four questions:
RQ1: Does the PM-GSL model improve the accuracy of a diabetes diagnosis?
RQ2: How important are the three types of candidate graphs in the PM-GSL model for generating new graph structures?
RQ3: Does PM-GSL adaptively assign greater channel attention values to important information?
RQ4: Is graph structure learning effective in improving the quality of the patient multi-relational graph?
5.1. Dataset
We evaluate our work on two datasets. MIMIC-Ⅳ is a large international public EHRs dataset containing clinical data of over 380,000 patients collected from 2008 to 2019 at the Beth Israel Deaconess Medical Center in Boston, Massachusetts, USA. It includes patient-centered clinical records, such as demographics, vital sign measurements, nursing notes, and laboratory tests.
We also use P-EHRs, the EHRs of a tertiary care hospital located in a major metropolitan center in northwestern China. Although this dataset is relatively small and less complete, the information it contains is sufficient to support our work. Specifically, it includes the following. 1) Demographic information, such as age and sex. 2) Diagnoses: The comorbidity groups are defined according to ICD-9 and ICD-10 codes and mainly cover 14 diseases, such as myocardial infarction, congestive heart failure, hyperlipidemia, hypertension, diabetes, and chronic lung disease. 3) Laboratory tests: We extract the first recorded value of each laboratory test indicator and the minimum, mean, and maximum values during hospitalization. 4) Vital signs: We select the minimum, mean, and maximum values of the patient's vital signs during the first day of admission, including heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, respiratory rate, and percutaneous oxygen saturation. 5) Medications: We collect data on drugs that were used more than 300 times in total during the patients' hospitalization and classify them into 20 categories according to their functions. The two datasets are summarized in Tables 2 and 3.
5.2. Experimental settings and baselines
The experiments are implemented in Python 3.6.8 and TensorFlow 2.3.0, and each model is run on an NVIDIA RTX 6000 GPU. Following [30], the node embedding dimension $d$ and the common space dimension $d^{*}$ are set to 64 and 16, respectively, for all methods in this study. Equation (4.2) in Section 4 is a 2-head cosine similarity measure, that is, $K = 2$. The learning rate and weight decay are set to 0.01 and 0.0005, respectively. The remaining hyperparameters, namely $\lambda^{FS}$, $\lambda^{FH}$, $\lambda^{MS*}$, and $\beta$ in Eqs (4.3), (4.5), (4.9), and (4.14), are tuned by grid search.
We compare PM-GSL with six recent state-of-the-art graph neural network baselines: two homogeneous graph models (GCN [7] and GAT [12]), two heterogeneous graph models (metapath2vec [13] and HAN [14]), and two graph structure learning methods (LDS [20] and Pro-GNN [21]). We also conduct six ablation experiments to verify the effectiveness of the three candidate graphs in PM-GSL for diabetes diagnosis.
5.3. Experimentation
5.3.1. RQ1: Overall comparison
In this section, we compare the overall performance of the PM-GSL and six baseline models, as shown in Figure 4. The classification process was repeated ten times to obtain the average value and ensure more stable and reliable prediction results. We use the evaluation metrics Macro-F1, Micro-F1, and AUC, which are commonly used in classification tasks.
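For reference, the two F1 variants can be computed from per-class counts as in the following sketch for single-label classification. The function name `f1_scores` and the toy labels are hypothetical and do not come from the evaluation code used in the experiments.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Macro-F1 averages per-class F1; Micro-F1 pools the counts first.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


y_true = ["diabetes", "diabetes", "stroke", "glaucoma"]
y_pred = ["diabetes", "stroke", "stroke", "glaucoma"]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
```

Macro-F1 weights every class equally regardless of its size, whereas Micro-F1 is dominated by the frequent classes, so reporting both guards against a model that performs well only on the majority class.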
The experimental findings are as follows:
(ⅰ) The PM-GSL model consistently outperforms all three types of baselines on both EHRs datasets. This indicates that noisy data in the original EHRs prevent the GNN from aggregating effective feature information, and it shows that PM-GSL can obtain a higher-quality heterogeneous graph structure.
(ⅱ) Compared with GCN and GAT, the performance of PM-GSL on the two EHRs datasets was significantly improved by 13.36–17.59%, which confirms that the fine-grained division of the patient multi-relational graph is extremely conducive to the diabetes diagnosis.
(ⅲ) Although the metapath2vec and HAN models can analyze heterogeneous relations, their prediction performance is 9.49–14.39% lower than that of the PM-GSL model proposed in this study. This is because the density of neighbor nodes under each relation in the patient multi-relational graph is high. In particular, the HAN model directly uses the original graph structure as input and cannot effectively filter out interfering neighbor nodes, which makes diabetes diagnosis more difficult.
(ⅳ) Compared with the LDS model and Pro-GNN model, the PM-GSL model proposed in this study can improve the prediction performance by 7.26–9.00%. In particular, the PM-GSL model can calculate node similarity and message propagation path and can make full use of multiple patient relations, which is helpful for learning better graph structure and more robust GNN parameters; thus, the new patient multi-relational graph has a stronger ability to adapt to prediction tasks.
5.3.2. RQ2: Ablation study
To further verify the impact of the three core candidate graphs in the PM-GSL model, we design three variants of the model, each obtained by removing one type of candidate graph, denoted PM-GSL$_{FS}$, PM-GSL$_{HT}$, and PM-GSL$_{S*}$. Table 4 shows the results of these variants in terms of Macro-F1, Micro-F1, and AUC, from which we observe the following:
(ⅰ) The PM-GSL model outperforms the three ablation variants on the diabetes diagnosis task, indicating that all candidate graphs are necessary for generating new graph structures.
(ⅱ) Compared with the other two ablation variants, PM-GSL$_{S*}$ decreases significantly in all three evaluation metrics, because the semantic information between different relations is more important in the patient multi-relational graph.
(ⅲ) To verify the effectiveness of the candidate graph weights learned by PM-GSL, we replace the channel attention layer in Eq (4.11) with an average aggregation layer that treats each type of candidate graph equally, denoted PM-GSL$_{avg}$. The channel attention layer performs better than PM-GSL$_{avg}$. If the three core candidate graphs are fused by simple averaging, the influence of higher-order semantic information in the patient multi-relational graph is weakened and the prediction performance is reduced.
5.3.3. RQ3: Weight analysis of candidate graphs
We analyze the distribution of the channel attention weights to further study how well the PM-GSL model distinguishes the importance of the three core candidate graphs when fusing them into a new patient multi-relational subgraph. We trained the PM-GSL model 50 times with all thresholds set to 0.2. The distribution of channel attention values is shown in Figure 5.
Specifically, the original subgraph is an important structure for diabetes prediction. However, for the P-T relation, the PM-GSL model assigns a large channel attention value to the learned higher-order semantic graph, indicating that the information in the higher-order semantic graph is more important for this relation. This observation is consistent with conclusion (ⅲ) of the ablation experiment and further shows that the PM-GSL model can adaptively assign more channel attention to important information.
5.3.4. RQ4: Parameter sensitivity analysis
The similarity thresholds defined in Eqs (4.3), (4.5), and (4.9) control the sparsity of the generated graphs. To simplify the comparison, we set $\lambda^{FS}=\lambda^{FH}$ and conduct experiments with different values of $\lambda^{FS}$ and $\lambda^{MS*}$. Figure 6 shows the Macro-F1 and AUC values in the comparison experiment. On both datasets, when $\lambda^{FS}=\lambda^{FH}$ and $\lambda^{MS*}$ are all set to 1, the prediction performance of the PM-GSL model drops significantly. With thresholds of 1, essentially all learned similarity edges are filtered out, so the PM-GSL model uses only the original graph structure of the patient multi-relational graph and behaves like a general GNN model. This finding also demonstrates the effectiveness of graph structure learning in improving the quality of the patient multi-relational graph.
6. Conclusions
In this study, we incorporate heterogeneous graph structure learning into the task of diabetes clinical assistant diagnosis and propose a model, PM-GSL, which jointly learns patient multi-relational graph structures and GNN parameters for diabetes prediction. In particular, we use the effective information in EHRs to build a patient multi-relational graph that models the complex interactions between multiple medical entities in EHRs. PM-GSL propagates node features through the topology of this graph to generate the underlying graph structure and describes the nodes from both micro and macro perspectives. We design three fine-grained candidate graphs and fuse them to generate clean and effective heterogeneous graphs, which addresses the weak structure problem of the EHRs.
Experiments on two EHRs datasets demonstrate the effectiveness of the PM-GSL model and show that graph structure learning helps improve the quality of the patient multi-relational graph and the accuracy of disease prediction tasks. PM-GSL achieves improvements of 9.62% in Macro-F1, 9.17% in Micro-F1, and 10.33% in AUC over the baselines. In the future, we aim to extend PM-GSL to a multi-view model and explicitly integrate label information into graph structure learning to improve the performance of disease clinical assistant diagnosis and enhance the clinical interpretability of the model.
Acknowledgments
We would like to thank the anonymous reviewers for their valuable comments. This research was funded by the National Natural Science Foundation of China (No. 61863032) and the Northwest Normal University Major Research Project Incubation Program, China (No. NWNU-LKZD2021-06).
Conflict of interest
The authors declare there is no conflict of interest.