Research article

Enhancing traditional Chinese medicine diagnostics: Integrating ontological knowledge for multi-label symptom entity classification


  • In traditional Chinese medicine (TCM), artificial intelligence (AI)-assisted syndrome differentiation and disease diagnoses primarily confront the challenges of accurate symptom identification and classification. This study introduces a multi-label entity extraction model grounded in TCM symptom ontology, specifically designed to address the limitations of existing entity recognition models characterized by limited label spaces and an insufficient integration of domain knowledge. This model synergizes a knowledge graph with the TCM symptom ontology framework to facilitate a standardized symptom classification system and enrich it with domain-specific knowledge. It innovatively merges the conventional bidirectional encoder representations from transformers (BERT) + bidirectional long short-term memory (Bi-LSTM) + conditional random fields (CRF) entity recognition methodology with a multi-label classification strategy, thereby adeptly navigating the intricate label interdependencies in the textual data. Introducing a multi-associative feature fusion module is a significant advancement, thereby enabling the extraction of pivotal entity features while discerning the interrelations among diverse categorical labels. The experimental outcomes affirm the model's superior performance in multi-label symptom extraction and substantially elevates the efficiency and accuracy. This advancement robustly underpins research in TCM syndrome differentiation and disease diagnoses.

    Citation: Hangle Hu, Chunlei Cheng, Qing Ye, Lin Peng, Youzhi Shen. Enhancing traditional Chinese medicine diagnostics: Integrating ontological knowledge for multi-label symptom entity classification[J]. Mathematical Biosciences and Engineering, 2024, 21(1): 369-391. doi: 10.3934/mbe.2024017

    Related Papers:

    [1] Bo Sun, Ming Wei, Binbin Jing . Optimal model for the aircraft arrival and departure scheduling problem with fuzzy runway incursion time. Mathematical Biosciences and Engineering, 2021, 18(5): 6724-6738. doi: 10.3934/mbe.2021334
    [2] Colella Ylenia, De Lauri Chiara, Improta Giovanni, Rossano Lucia, Vecchione Donatella, Spinosa Tiziana, Giordano Vincenzo, Verdoliva Ciro, Santini Stefania . A Clinical Decision Support System based on fuzzy rules and classification algorithms for monitoring the physiological parameters of type-2 diabetic patients. Mathematical Biosciences and Engineering, 2021, 18(3): 2654-2674. doi: 10.3934/mbe.2021135
    [3] Xi-yue Cao, Chao Wang, Biao Wang, Zhen-xue He . A method to calculate the number of dynamic HDFS copies based on file access popularity. Mathematical Biosciences and Engineering, 2022, 19(12): 12212-12231. doi: 10.3934/mbe.2022568
    [4] Yanfeng Shi, Shuo Qiu, Jiqiang Liu, Tinghuai Ma . Novel efficient lattice-based IBE schemes with CPK for fog computing. Mathematical Biosciences and Engineering, 2020, 17(6): 8105-8122. doi: 10.3934/mbe.2020411
    [5] Xiaomei Feng, Jing Chen, Kai Wang, Lei Wang, Fengqin Zhang, Zhen Jin, Lan Zou, Xia Wang . Phase-adjusted estimation of the COVID-19 outbreak in South Korea under multi-source data and adjustment measures: a modelling study. Mathematical Biosciences and Engineering, 2020, 17(4): 3637-3648. doi: 10.3934/mbe.2020205
    [6] Mohamed Abbas, Ali Algahtani, Amir Kessentini, Hassen Loukil, Muneer Parayangat, Thafasal Ijyas, Abdul Wase Mohammed . Qualitative theoretical modeling to study the possibility of detecting multi-virus in blood flow using Nano-quartz crystal microbalance. Mathematical Biosciences and Engineering, 2020, 17(5): 4563-4577. doi: 10.3934/mbe.2020252
    [7] Alejandro Rincón, Fredy E. Hoyos, John E. Candelo-Becerra . Comparison, validation and improvement of empirical soil moisture models for conditions in Colombia. Mathematical Biosciences and Engineering, 2023, 20(10): 17747-17782. doi: 10.3934/mbe.2023789
    [8] Yunxia Wei, Yuanfei Zhang, Bin Hang . Construction and management of smart campus: Anti-disturbance control of flexible manipulator based on PDE modeling. Mathematical Biosciences and Engineering, 2023, 20(8): 14327-14352. doi: 10.3934/mbe.2023641
    [9] Bin Hang, Beibei Su, Weiwei Deng . Adaptive sliding mode fault-tolerant attitude control for flexible satellites based on T-S fuzzy disturbance modeling. Mathematical Biosciences and Engineering, 2023, 20(7): 12700-12717. doi: 10.3934/mbe.2023566
    [10] Lan Huang, Shuyu Guo, Ye Wang, Shang Wang, Qiubo Chu, Lu Li, Tian Bai . Attention based residual network for medicinal fungi near infrared spectroscopy analysis. Mathematical Biosciences and Engineering, 2019, 16(4): 3003-3017. doi: 10.3934/mbe.2019149
  • In traditional Chinese medicine (TCM), artificial intelligence (AI)-assisted syndrome differentiation and disease diagnoses primarily confront the challenges of accurate symptom identification and classification. This study introduces a multi-label entity extraction model grounded in TCM symptom ontology, specifically designed to address the limitations of existing entity recognition models characterized by limited label spaces and an insufficient integration of domain knowledge. This model synergizes a knowledge graph with the TCM symptom ontology framework to facilitate a standardized symptom classification system and enrich it with domain-specific knowledge. It innovatively merges the conventional bidirectional encoder representations from transformers (BERT) + bidirectional long short-term memory (Bi-LSTM) + conditional random fields (CRF) entity recognition methodology with a multi-label classification strategy, thereby adeptly navigating the intricate label interdependencies in the textual data. Introducing a multi-associative feature fusion module is a significant advancement, thereby enabling the extraction of pivotal entity features while discerning the interrelations among diverse categorical labels. The experimental outcomes affirm the model's superior performance in multi-label symptom extraction and substantially elevates the efficiency and accuracy. This advancement robustly underpins research in TCM syndrome differentiation and disease diagnoses.



    Familywise error rate (FWER) control is necessary when a set of multiple inferences are simultaneously evaluated and investigators want to obtain a small set of results that warrant further examination. When the test endpoints are correlated, it is preferable to use a procedure that takes into account the dependency structure in order to improve power. This can be done easily for multivariate normal outcomes, but an accessible and powerful method is not available for multiple chi-square endpoints even though this is a common set of test endpoints encountered in biomedical research contexts. For example, generalized linear models may be used to model mortality using a large predictor variable set. Additionally, multiple gene mutations may be simultaneously assessed for association with a particular disease outcome. Both of these common research contexts would warrant use of a multiplicity control that can account for dependency in an efficient and effective manner. In particular, in high-dimensional data settings, use of multiple chi-square tests is quite common, but the use of, and need for, multiplicity adjustments may be unclear to many practitioners. However, either misusing or omitting multiplicity adjustments in these settings affects the reproducibility of research results, and can easily lead to spurious statistical significance declarations.

    The purpose of this manuscript is to present a flexible, closed-form method for controlling the family-wise error rate for high-dimensional chi-square endpoints utilized in simultaneous testing or intervals. This is a critical setting for strict multiplicity control of type I error since false positives among large sets of associations can easily produce research results that are ultimately non-reproducible. Additionally, this procedure is adaptive to any positive definite correlation structure, thus is flexible to most any correlation structure encountered in practice. The method may be used to construct simultaneous intervals that control type I error and provides information about the practical significance of the test results. Note that the Bonferroni adjustment is currently the only easily accessible method for this general case. The proposed procedure, however, is theoretically and empirically shown to be less conservative than Bonferroni and easily attains FWER control even with complex correlation structures among the test endpoints. This proposed method may be used to complement data reduction methods to achieve reliable, reproducible and powerful research results when many chi-square test endpoints are simultaneously evaluated. The application in the conclusion of this manuscript illustrates one such use. Additionally, the proposed method is fast, simple, and completely general. Thus, it may be utilized for settings where the chi-square endpoints are correlated or uncorrelated for both high and low dimensions.

    While resampling or simulation-based methods are a viable alternative for some scenarios, these methods can be difficult to implement in practice. This is particularly true for high-dimensional settings where resampling or simulation-based methods can be time consuming. The strength of resampling and simulation-based methods is that they provide a joint, rather than margin-based, solution. Two accessible options for implementing resampling-based multiplicity control in R [1] are the multtest [2], multcomp [3], and SiMaFlex [4] packages. Using a resampling or simulation-based approach is a gold standard, but may not be practical in every setting or circumstance.

    Our goal is to provide a method that is simple to implement, even for practitioners with little statistical programming experience, but that still provides close to optimal family-wise error rate control for most settings. (We note that in some of these areas multiplicity adjustments of any kind, although warranted, are often omitted, so a method that is simple to utilize may be easier to promote.) Additionally, many data-intensive applications could require computation of a large number of critical points, justifying the need for a more efficient approach. This would be warranted if multiple independent subsets of test endpoints are selected for separate analysis. As the simulations demonstrate, the proposed critical values are often close to the resampling or simulation-based critical values when the correlations between endpoints is moderate to low. Thus, we recommend the proposed method for settings when a fast or simple method is warranted. However, resampling or simulation-based methods may be preferred for other scenarios.

    In review, the purpose of this manuscript is to propose an easily implemented approach for controlling the familywise error rate for many chi-square distributed test endpoints. The remainder of the manuscript is organized as follows: in Section 2 we provide a review of probability-based multiplicity adjustments, in Section 3 we describe the proposed methodology, in Section 4 we present the simulation results, and in Section 5 we discuss implications for research and practice.

    When conducting multiple tests of association, there are many approaches a practitioner may take. First, control of the familywise type I error rate (FWER) or the false discovery rate (FDR) are both reasonable approaches. Generally, the choice of which error rate to control depends on the nature of the study. Confirmatory studies that require overall conclusions involving multiple parameters generally utilize FWER control, while exploratory studies with many possible signals more often utilize FDR control. Control of the FDR, as first proposed by [5], is widely used for studies involving a large number of association tests. The FDR methodology is flexible as it is applicable to cases where the test statistics are dependent [6]. When there are a large number of test statistics and results are exploratory, rather than confirmatory, in nature, controlling the FDR for tests of association is very reasonable and effective. Many have worked on improving the performance of the FDR method with a focus on association tests.

    In addition to the type of error control of interest, practitioners must consider whether the data warrants marginal or joint multiplicity control. Marginal control implies that the test statistics are assumed independent and, thus, the rejection regions are selected based only on the marginal test statistics. In contrast, joint control methods incorporate a dependency structure into the multiplicity control and the rejection region is defined based on a joint distribution with a particular dependency structure. In cases where the dependencies among the test statistics are substantial, a joint procedure is usually preferred. A related issue for consideration is also whether the multiplicity control uses a single-step or stepwise approach. A single-step approach provides a rejection region that is independent of the full set of test statistics while, in contrast, a stepwise approach takes into account the result of other test statistics in the family of inference. For stepwise approaches, testing the null hypothesis while taking into account the outcomes of related tests informs the null distribution used to define the rejection region and can lead to more powerful results.

    While FDR control is advantageous in many settings, for a small set of inferences, controlling type I error provides a confirmatory set of inferences. Strict type I error control can often be achieved with reasonable power and minimal assumptions by utilizing the Bonferroni procedure. However, when the set of comparisons is large, this procedure quickly becomes too conservative, warranting more powerful bounds. When a multivariate normal distribution is an appropriate assumption, there are many methods that utilize arbitrary correlation structures to provide increased power. Some have made use of a minimal spanning tree approach ([7]; [11]) and others use tube approximation algorithms to achieve increased power ([8]; [9]; [10]). Additionally, many authors have considered alternatives for non-normal correlated endpoints, such as binary outcomes ([12]), non-normal continuous endpoints([13]; [14]), a combination of test endpoints ([15]; [16]), or generally discrete outcomes ([17]). When the distribution is specifically multivariate chi-square, however, the options are more limited. Resampling-based methods provide one alternative. Westfall & Tobias [18] and Westfall & Troendle [19] provide some resampling-based options. However, although these resampling-based procedures deliver strong solutions in almost any setting, they can be computationally demanding ([20], p.3) and computation of interval-based inferences is not always supported.

    Simulation of critical values is also a reasonable approach, but implementation is often far from direct, and can again be computationally intensive when considering large sets of inference. Stange et al. [21] provides another alternative that is closely related to the proposed procedure. This approach can be applied to multiple test endpoints with a multivariate chi-square distribution for situations with a low factorial correlation structure (defined in Stange et al.). This method requires a three-step process: 1) evaluation of the correlation matrix to ensure it is m-factorial, 2) numerical integration of often high-dimensional chi-square probability functions, and 3) approximation of a series to complete the multivariate probability calculations. For simple, one-factorial correlation matrices, this proposed method is efficient and highly accurate. However, the Stange method is computationally demanding and can exhibit significant approximation errors for moderately to highly complex correlation structures, is not applicable to all correlation structures, and is only available in MATLAB environments. In contrast, the proposed method may be implemented in R [1], applies to all correlation structures, is more computationally efficient than resampling or simulation-based procedures, Stange et al., or direct simulation, and is often just as accurate, although there are situations where it can behave conservatively.

    A comprehensive overview and framework of approaches for error control in association testing is presented in [22]. While all of the above described considerations needs to be well thought out for each family of inference, we focus just on one particular type of multiplicity control. In particular, for this paper, we present a joint single-step quantile-based procedure that will control type I error and is flexible to any positive definite correlation structure derived from the test statistics.

    Recall that our goal is to provide a fast and simple probability point that allows evaluation of simultaneous tests or calculation of simultaneous intervals for a group of m test endpoints that follow a multivariate chi-square distribution with ν degrees of freedom and any set of off-diagonal bivariate correlations, Corr(Yi,Yj)=ρij. A conservative solution to this probability can be achieved by utilizing the probability inequality that provides the rationale for the Hunter-Worsley method for multivariate normal or multivariate-t distributional settings. Suppose that we wish to perform m simultaneous tests based on test statistics, Yi, i = 1,,m, where the joint distribution is the multivariate chi-square distribution, and where large values of Yi, i = 1,,m are in favor of the alternatives. For this purpose, we denote the set of null hypotheses by {H1,H2,,Hm} where each Hi is a true null hypothesis test for a test statistic Yi using critical point c and spanning 1,,m. Then one solution to the multiplicity problem of interest would involve finding a probability point c where for Ai={|Yi|>c}, P(mi=1Ai)α and α is the desired type I error rate. (Adjustments for other common situations are easily made.) Thus, if we find a point c where the right-hand side of Equation (3.1) is equal to α, then we have a conservative solution to the problem. This is accomplished by finding an upper (or exact) bound for P(mi=1Ai) where Ai={|Yi|>c} denotes that an error has occurred with the results of the ith hypothesis test or confidence interval under the null. In order to bound this probability, the following inequality is useful,

    P(mi=1Ai)ki=1P(Ai)(i,j)τP(AiAj). (3.1)

    Here τ is a non-cyclical tree or subgraph of G, a family of graphs formed from the set of nodes Ai with branches given by the intersections AiAj. While this upper bound is applicable for any tree, utilizing the minimal spanning tree will yield the lowest upper bound with regards to the tree. Details on applying this tree to the multivariate chi-square setting are provided below.

    Note that application of the above bound is computationally efficient since it only requires calculation of univariate and bivariate probabilities. Moreover, the proposed algorithm is guaranteed to improve upon the Bonferroni bound in all cases except when the events are mutually exclusive, where it is equivalent to the Bonferroni bound. We note that using this procedure requires a covariance matrix with a stable estimator. In contrast to Strange et al, users will not need to evaluate whether the covariance estimate is m-factorial. Instead, for any application with a large sample size, users only need to obtain a consistent estimator for the covariance matrix. For the bivariate chi-square distribution used in the proposed method, the bivariate probability functions are increasing in ρij for all i,j=1,,m with ij. Thus, for these distributions, the bound given in Equation (3.1) allows for a single evaluation of the optimal τ that will maximize the second term in the left-hand side of Equation (3.1) for all possible c values. This is a significant savings in computational intensity, as evaluating the left-hand side for various values of c does not require determining a new τ. However, evaluation of the right-hand side of Equation (3.1) will require calculation of bivariate chi-square probabilities. With regards to calculation, the algorithm utilized for approximating the quantiles from the bivariate chi-square distribution will produce reliable estimates to within five decimal places until the magnitude of the pairwise correlation exceeds 0.95 [23]. Results for the estimation of the bivariate chi-square quantiles were evaluated prior to application of the spanning tree algorithm to the above probability distribution functions. A more detailed description of the bivariate distribution function and an algorithm to find the minimal spanning tree for this situation are provided below.

    Consider two ν-dimensional multivariate standard normal random vectors, Zi=(Zi1,,Ziν) for i=1,2 with ν non-zero canonical correlations between Z1j and Z2j. Then Yi=ν(j=1)Z2ij,i=1,2, are distributed as canonically correlated chi-square random variables with ν degrees of freedom. Krishnaiah (1980) derived the joint chi-square density function for (Y1,Y2). Using this density, the (1α) quantile c, i.e., P(Y1c,Y2c)=1α, may be calculated using numerical integration. We will utilize the cumulative distribution function given by the following, where we assume that c1 = c2 = c, gamma is the incomplete gamma function, c=c(2(1ρ212)) for i = 1, 2, and Corr(Y1,Y2)=ρ12.

    P(Y1c1,Y2c)=(1ρ212)(ν/2)(j=0)1Γ(ν/2+j)Γ(j+1)ρ2j12γ(ν/2+i,c)γ(ν/2+i,c) (3.2)

    When the Yi do not have the same degrees of freedom, an alternative version of the distribution function is available [24]. For the purposes of the proposed Hunter-Worsley approximation, the above density and algorithm for numerically approximating the quantiles were coded in R [1]. In the code, the sum in Equation (3.2) was truncated after 150 terms and the secant method was used to find the desired quantile of the bivariate distribution function. All calculations in R matched the quantiles given in [23] to within five decimal places, hence reasonable accuracy and precision was achieved.

    It is important to note that, for the chi-square distribution function given in [24], P(Y1c1,Y2c2) depends on i and j only through ρij and is non-decreasing in |ρij|. This can be proven analytically by taking the partial derivative of Equation (3.2) with respect to ρij. Given these properties, the distribution in Equation (3.2) may be utilized to approximate any set of intersection probabilities in the same manner utilized by both [7] and [11] for the t and normal distributions, i.e., the spanning tree need only be obtained once.

    By utilizing Equation (3.1), a conservative multiplicity-adjusted critical value appropriate in chi-square settings may be obtained. As discussed previously, the branches of τ may be solved by utilizing the maximal spanning tree algorithm of [25] where the edge weights are the branches ρij. The correlation, ρij, is the appropriate edge weight when utilizing the maximal spanning tree algorithm since the P(Y1c,Y2c) is nondecreasing in ρij. In particular, this set of branches will maximize the right-hand side of Equation (3.1) for any positive value c. Using the adjusted critical value c often results in considerable gains in computational efficiency, particularly since simulation or resampling-based approaches would require a non-efficient recalculation of the critical point for each set of comparisons. The algorithm for obtaining τ is as follows:

    1.Choose any Ai from the set of unconnected nodes

    2.Find the largest ρij so that Ai is an unconnected node and Aj is a connected node and join these nodes

    3.Repeat steps 1) and 2) until no node is unconnected and there are no cycles.

    Ultimately, a root-finding algorithm (e.g., secant method) can be utilized to obtain a constant c such that the right hand side of Equation (3.1) is approximately α. For the secant method, the convergence criterion for determining the critical point c was 1x105. We note that the CPU time for the proposed procedure is quite low, even for large sets of comparisons. For example, running R 4.0.0 on a Windows-based PC for m = 10,000 endpoints, the proposed procedure required 7.97 seconds of elapsed time, while the simulated critical point, using 100,000 simulated data sets to approximate the critical value, required 149.7 seconds of elapsed time. Clearly, this proposed procedure delivers considerable computational efficiency when computing even a single critical point. However, when practitioners identify multiple mutually independent sets of inference, the proposed procedure is even more helpful, due to its accessibility and greatly decreased computational cost. In addition to advantages with regards to the computational cost, the critical point obtained using this procedure will always be less than or equal to the Bonferroni adjusted critical value and is almost always nearly identical to the simulated critical point, thus providing near optimal power for the inferences.

    In order to assess the performance of the derived critical point, the empirical family-wise error rate (eFWER) and relative efficiencies (RE) are examined for various settings. The REs, the ratio of the square of the Bonferroni and the Hunter-Worsley chi-square critical points, are reported for various correlation structures and values of m and ν. In order to facilitate comparison of a simulation-based critical value and the proposed critical point, REs are also computed to compare these values. Following the RE results, the eFWER is reported for these same settings. Both the Hunter-Worsley and Bonferroni critical points are always guaranteed to be conservative and are easily accessible options for multiplicity control in this general setting, but also require very little input from the end user. Consequently, primary interest focuses on the REs, rather than the eFWERs of the two critical points. However, in order to compare the error rates of the proposed procedure to a simulation-based method and commonly utilized marginal multiplicity control methods (Holm, Hochberg, and Sidak), the eFWER is reported for each of these four additional procedures.

    For the RE and eFWER evaluations, varied types of covariance structures are considered: an AR(1) structure, a compound symmetric structure and a block diagonal structure. These structures were considered due to their explanatory simplicity and relevance to applied settings. For example, a compound symmetric structure pertains to a case where a latent trait of a condition is determining the presence of a set of gene mutations. In this case, if any single gene mutation is activated, it increases the probability of a subset of these gene mutations as well. This may hold only for a subset of mutations, in which case, a block diagonal correlation structure is appropriate. Finally, an auto-regressive structure applies to cases where the latent variable decays over time across features. This is also a common scenario encountered in practice.

    Data is generated for these specific correlation structures and, under the assumption of the null hypothesis, tested to see how many simulated data sets result in at least one type I error. We investigate sample sizes of n = 100,500, and 1000, test endpoint numbers of m= 500 and 1000, for each of the three correlation structures (compound symmetric, autoregressive, and block diagonal structures). For each correlation matrix, ρ is set in the following manner: for a compound symmetric structure, ρii=1 and the off-diagonals are ρij=ρ, an AR(1) structure, where the diagonal elements are ρii=1 and the off-diagonals are ρij=ρ|ij|, respectively, for i,j=1,,m, ij, and a block diagonal structure where subsets of off-diagonal elements are ρ and the diagonal elements are 1. For all settings, the values ρ = 0, 0.3, 0.5, and 0.7 are considered.

    The correlated multivariate chi-square random variables are generated using the following algorithm. First, using a given correlation matrix (V), the eigenvalues (l) and eigenvectors (U) are extracted from the structure. Then using the eigenvalue diagonal matrix L=diag(l) and eigenvector matrix U, Y=[UX][UX] is a multivariate set of correlated chi-square random variables provided that X is a multivariate normal set of centered random variables with correlation structure L. Simulations confirm the correlation structure and mean were accurately approximated for each setting investigated to within an error of 0.001. In all cases, 10,000 replications are performed. This ensures that the eFWER is estimated accurately to within 0.001. Simulations confirm the correlation structure and mean were accurately approximated for each setting investigated to within an error of 0.001. In all cases, 1000 replications are performed. This ensures that the eFWER is estimated accurately to within 0.001. All simulations were performed in R [1].

    In all of the results presented, line plots are utilized for ease of viewing. However, these plots are not meant to imply that the statistic is linear between the respective plotting points.

    For simulating the multivariate chi-square data, the correlation structures are fixed and are also used as a basis for later random generation. Each of these correlation structures are investigated for m = 500, 1000 and for ν = 1. Other degrees of freedom (ν= 10 and 30) were also investigated with very similar results to when ν =1 and are omitted from presentation. Whenever REs compare the proposed point to the Bonferroni or simulated critical points, the RE is labeled b.h and s.h, respectively.

    Figure 1 displays the RE values comparing the Hunter-Worsley chi-square critical values to the Bonferroni and simulation-based critical values. These are plotted against the ρ used to generate the correlation structure with the line type indicating the correlation structure (compound symmetric (CS), autoregressive (AR1) or block diagonal (BD)). The plots are stratified by the number of endpoints (m = 500 and 1000) and sample size (n = 500 and 1000). In general, as the magnitude of ρ and the number of comparisons increase, the RE for Bonferroni (b.h) also increases. There is also an increase in the RE values for Bonferroni as the degrees of freedom increases. The RE for Bonferroni gains ranged anywhere from 0% to 11% (when ρ = 0.7), with the largest gains in efficiency consistently occurring when ρ is greater than 0.3 in magnitude. Whenever ρ is very close to zero, the RE for Bonferroni gain is also close to 0, indicating a negligible gain in using the Hunter-Worsley as opposed to the Bonferroni critical point. These general Bonferroni results hold for the AR(1), block diagonal (BD) and compound symmetric (CS) structures. With regard to the simulated critical point comparisons, there are two cases where the critical point using simulation has a clear advantage. When there is a block diagonal (BD) structure at lower levels of dependency (ρ<0.3), there is a slight improvement with using the simulation-based critical point. Also, for the compound symmetric structure, the simulation-based critical point provides far more power than the proposed critical point. However, for the AR(1) and BD structures, the simulated and proposed critical points have similar values and, hence, the REs are close to 1.

    Figure 1.  Mean relative efficiencies (RE) plotted against ρ with m = 500 and 1000, sample sizes n= 100,500 and 1000, and line type indicating correlation structures block diagonal (BD), compound symmetric (CS), and autoregressive (AR1).

    In order to better simulate real-world covariance structures, we also use random correlation structures to see how varying levels of noise in the generation of the correlation structures affects performance. In the following, whenever correlation matrices are randomly generated, 100,000 simulated values are utilized to simulate the correlation matrix. In order to accomplish this for a specific correlation matrix (of types AR(1), BD or CS), a Wishart random variable is generated in R using the rWishart function [1] with either νW = 500 or 1000 degrees of freedom for the Wishart distribution, creating high and low amounts of noise, respectively, in the correlation matrix. However, in the following, only the results using 1000 degrees of freedom are presented since the results were almost identical for both values of νW.

    For each iteration of the simulations, a unique randomly generated correlation matrix is produced, which, correspondingly, results in a unique RE for comparing the proposed critical point to the Bonferroni and simulated critical points (labeled b.h and s.h in the plots) for that specific correlation matrix. Consequently, the RE values presented in Figures 2 and 4 are mean values. We note that the observed correlation matrices include some negative off-diagonal elements, but were positive definite and no problems arose randomly generating Wishart random variables based on these. In all the figures presented in the remaining sections, only the results for the randomly generated Wishart correlation matrices with a high level of variation are displayed, since these results are very similar to those assuming a low amount of variation. Figure 2 shows the mean RE values (b.h and s.h) for the AR(1), CS and BD structures where Wishart random variables with high noise are used to provide random correlation matrices. In general, these display a very similar pattern to the results when a fixed correlation structure is employed, as the RE increases with the number of comparisons and ρ, with the exception of the CS structure when comparing the simulation-based critical point to the proposed critical point. With the exception of this case, the RE values ranged from 0% to 12%. The results are very similar to the fixed correlation structure results and demonstrate that the method is stable even in the presence of high noise.

    Figure 2.  Mean relative efficiencies (RE) plotted against ρ with m = 500 and 1000, sample sizes n= 100,500 and 1000, and line type indicating random correlation structures block diagonal (BD), compound symmetric (CS), and autoregressive (AR1).
    Figure 3.  Empirical family-wise error rates (eFWER) plotted against ρ with m = 500 and 1000, for fixed correlation matrices (BD, AR(1) and CS), line type indicating method.
    Figure 4.  Mean empirical family-wise error rates (eFWER) plotted against degrees of freedom with randomly generated correlation matrices (CS, BD and AR(1)) and line type indicating method.

    In this section, the empirical family-wise error rates (eFWERs) are examined for each of the previously described data settings. The eFWER is defined to be the proportion of times that at least one of the Yi, (i = 1, ,m) exceeds the critical value. Since it is known that the proposed critical point will be conservative, the eFWERs are assessed in order to gauge how conservative the proposed critical points are in varied settings. In order to evaluate the eFWER, the following procedure is utilized for generating correlated chi-square distributed endpoints. Data is generated to conform to a particular fixed correlation structure, degrees of freedom, and number of comparisons of interest. The generated vector of correlated chi-square responses was then evaluated for significance using the Hunter-Worsley chi-square critical point as well as the Bonferroni-adjusted, simulation-based, Holm, Hochberg, and Sidak based critical points.

    For 10,000 simulated data sets, we expect a procedure with a 5% FWER to be within 0.004 of 0.05. Figure 3 presents the eFWER for the fixed AR(1), BD and CS correlation structures across various ρ and m values. The simulations confirm that the Hunter-Worsley eFWER never exceeds a margin of error of 5%. However, there are instances where the Hunter-Worsley eFWER fell below the margin of error for 5% FWER, e.g. behaved conservatively. This occurs when ρ> 0.5 and, as stated, the Bonferroni method always produces an even lower eFWER in each case. In general, higher correlations result in lower eFWERs. However, as confirmed by the RE results, the largest gain in using the Hunter-Worsley critical point over the Bonferroni critical point occurs at these higher correlation settings. We observed the distribution of empirical error rates for the simulations and found that the proposed Hunter-Worsley method is on average closer to the desired error rate of 5% without becoming liberal.

    At these higher correlation settings, it is also of interest to consider the use of a simulated critical point. Simulations indicate that the proposed Hunter-Worsley approximation performs similarly to the simulated critical point at low levels of correlation (ρ<0.5) and for BD and AR(1) correlation structures. However, note that the marginal (Holm, Hochberg and Sidak) and simulated procedures maintain more power for CS structures at higher (ρ>0.5) correlations. Simulations confirm that the proposed method is always within the error anticipated for the empirical error rates except for these cases of the CS structure (Note : We identify empirical error rates as differing from whenever they exceed the expected sampling error (2 x ((0.050.95)/1000) = 0.014). We note that only 10,000 simulated critical values were utilized for comparing the simulated critical point to the proposed critical point due to the computational burden of the evaluation. Moreover, in order to achieve similar results for the simulated critical value, we had to compute 100,000 distributions from which the maximum is retained to find the simulated critical point. This is computationally intensive and not a straightforward analysis that would be accessible to practitioners. In all cases, use of the proposed Hunter-Worsley critical point is justified since it is computationally less demanding and results in empirical error rates that are within the expected sampling error limits. Moreover, since this method is available as an R function, it is accessible to end users with only an assumed overall error rate, an exact or estimated correlation or covariance matrix, and a set of planned comparisons. This analysis would fit well within existing computing structures for linear and generalized linear models in R and could easily be streamlined within a project workflow. Hence, given the evidence from these simulations that our proposed efficient critical point performs similarly to simulated values for BD and AR(1) structures, we suggest using our proposed method in place of the simulated point.

    When evaluating the randomly generated CS, BD and AR(1) structures, the eFWERs have a similar pattern to that observed for the fixed correlation structures. Hence, these results are not provided in a figure. However, Figure 5 displays the mean eFWERs when using the Hunter-Worsley and Bonferroni critical points for the three correlation matrices plotted against the degrees of freedom for the chi-square endpoints. The proposed critical point is always less conservative than the Bonferroni method and still controls the FWER. However, when comparing the proposed method to a simulation-based correction, the results are similar except for the higher levels of ρ with a CS dependency structure. In this case, the simulation-based or a marginal approach (Holm, Hochberg or Sidak) have more power.

    This application uses data from a genome-wide association study with the objective of identifying exonic variants that play a role in a specific form of leukemia. The mode of analysis involves Fisher exact tests on 1061 variants with 30 total subjects, 20 of which served as controls and 10 with leukemia. In order to apply an appropriate multiplicity control method, a salient dependence structure must be well-defined. With this aim in mind, we consider using two possible dependency structures in order to apply the proposed multiplicity correction: Case 1) the standard covariance matrix estimated using a subset of the full data that focuses on those occurring on a selected list of cancer-related gene mutations, and Case 2) a dimension-reduced version derived from a hierarchical agglomerative clustering algorithm. We demonstrate both approaches in order to show how the proposed procedure could be combined with a dimension-reduction algorithm to further increase the power of the multiplicity correction. First, the covariance matrix for the 93 kinome variants is estimated. Second, the gene mutations were grouped using a hierarchical agglomerative algorithm as described in [27]. This algorithm identified 8 clusters using the linkage disequilibrium structure of the variants that are related to this form of leukemia. Following this clustering of the variants, we performed multiple testing on the aggregated predictors to test associations. Then, the covariance matrix was estimated based on this dimension-reduced set and utilized with the proposed algorithm. When using Case 1 variants, the adjusted critical value was c = 11.21 and, when using the reduced set of 8 gene variants clusters from Case 2, the critical value reduced to c= 6.51. Using the Case 1 critical point, 13 of the 92 variants were significant and using the Case 2 critical point, there are 13 significant variants. Note that both of these critical points are more powerful alternatives than a standard Bonferroni adjusted critical point which, in this case, would be 11.98. The computational time for obtaining the adjusted critical values was less than 3 seconds and clearly would not provide a burden to the researcher, even if there were multiple subsets of variants to assess independently.

    We note that use of this proposed methodology extends well beyond the presented genomics application. There are multiple contexts that warrant use of a multiplicity adjustment for model or non-model based chi-square test endpoints being simultaneously evaluated. Just of a few of these include model re-specification tests for structural equation modeling, general model comparison tests, or any study involving categorical variables that might be jointly assessed in a log linear association model. These analytic approaches can occur across a range of disciplinary contexts and might involve smaller or large scale inference sets. The appeal of this method is that it can be used in most any context and with little computational load.

    This manuscript presents a multiplicity correction suitable for many modes of analysis that involves chi-square distributed endpoints. This method is more efficient than the Bonferroni critical point, often is as effective as a simulated critical point, and is easy to employ. However, for a compound symmetric structure with higher correlation, a different marginal method or simulation-based method would be a better choice. In the above section, this proposed multiplicity correction is applied to a setting where multiple association tests could be utilized. However, researchers could alternatively present score tests resulting from an estimated generalized linear model in this setting, such as Rao's score test ([26]; [29]), Pearson's test ([28]; [30]). The proposed method is applicable to either mode of analysis. Clearly this multiplicity correction is suitable in a wide variety of settings involving chi-square distributed endpoints, including multiple association tests, model fit tests, or score-based tests in generalized linear model (GLM) settings. The multiplicity adjustment may also be utilized to obtain both multiplicity corrections for test results or to construct simultaneous confidence intervals.

    With regards to the proposed critical point itself, it is known that existing methods are either quite conservative (a.k.a. Bonferroni), difficult to employ, or computationally intensive (a.k.a., simulation or resampling-based methods). The proposed methodology is easy for practitioners to use, more powerful than the Bonferroni adjustment, and is also easily accessible via an R function, available upon request of the first author. The simulations performed indicate that there are many settings where the proposed Hunter-Worsley chi-square critical point is far more efficient than the Bonferroni adjusted critical point. It also performs very similarly to the simulated critical point for many settings. First, whenever there is a significant amount of correlation between the endpoints (e.g., whenever the ρ exceeds 0.3), the Hunter-Worsley adjusted critical point is more efficient than the Bonferroni critical point, regardless of correlation structure, number of comparisons, or degrees of freedom. Also, even when there is no correlation between the endpoints, the proposed critical point effectively controls the FWER. In contrast, a compound symmetric correlation structure with ρ less than 0.7 or an AR(1) correlation structure with ρ less than 0.9 results in negligible empirical error rate differences when comparing the simulated versus Hunter-Worsley approaches. Moreover, since the Hunter-Worsley critical point is faster and easier to implement, we suggest utilizing it in these settings. It is worth noting that for non-statisticians, the simulation-based critical points are more difficult to employ because they require more background knowledge and considerable computing power, particularly in high-dimensional settings. In contrast, the Hunter-Worsley procedure is available to end-users via R code and only requires an exact or estimated correlation matrix for the data, significance level, and degrees of freedom. Consequently, the proposed critical point is readily accessible, and may be applied to any general setting where a multiplicity adjustment is warranted and chi-square distributed endpoints are utilized. The simulations demonstrate that the Hunter-Worsley adjusted critical point is at least as effective as, and often considerably more effective than, other easily accessible multiplicity correction methods except for highly dependent compound symmetric data where this procedure lacks power. Table 1 provides a summary of recommendations for practitioners.

    Table 1.  Practical Recommendations for FWER Control.
    Low level of dependence High level of dependence
    m<10 Marginal or joint methods: Joint methods:
    Bonferroni or similar Resampling/Simulation or proposed
    10m100 Joint methods: Joint methods:
    Resampling or proposed Resampling/Simulation or proposed
    m>100 Joint or partially-joint methods: Joint methods:
    Proposed

     | Show Table
    DownLoad: CSV

    The proposed procedure is efficient and controls the type I error rate with more power than a Bonferroni correction. The method is particularly well-suited for, and flexible to, many dependence structures of the test statistics. Additionally the type I error control of the method provides increased confidence that research results will be reproducible.

    This work was supported by Grant 5U54MD007592 from the National Institutes on Minority Health and Health Disparities (NIMHD), a component of the National Institutes of Health (NIH).

    The authors have no conflict of interest.



    [1] Q. Jia, D. Zhang, S. Yang, C. Xia, Y. Shi, H. Tao, et al., Traditional Chinese medicine symptom normalization approach leveraging hierarchical semantic information and text matching with attention mechanism, J. Biomed. Inf., 116 (2021), 103718. https://doi.org/10.1016/j.jbi.2021.103718 doi: 10.1016/j.jbi.2021.103718
    [2] Z. Huang, J. Miao, J. Chen, Y. Zhong, S. Yang, Y. Ma, et al., A traditional Chinese medicine syndrome classification model based on cross-feature generation by convolution neural network: model development and validation, JMIR Med. Inf., 10 (2022), e29290. https://doi.org/10.2196/29290 doi: 10.2196/29290
    [3] T. Bai, H. Guan, S. Wang, Y. Wang, L. Huang, Traditional Chinese medicine entity relation extraction based on CNN with segment attention, Neural Comput. Appl., 34 (2022), 2739–2748. https://doi.org/10.1007/s00521-021-05897-9 doi: 10.1007/s00521-021-05897-9
    [4] A. Roy, S. Pan, Incorporating medical knowledge in BERT for clinical relation extraction, in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, (2021), 5357–5366. https://aclanthology.org/2021.emnlp-main.435
    [5] P. Chandak, K. Huang, M. Zitnik, Building a knowledge graph to enable precision medicine, Sci. Data, 10 (2023), 67. https://doi.org/10.1038/s41597-023-01960-3 doi: 10.1038/s41597-023-01960-3
    [6] G. Zhou, E. Haihong, Z. Kuang, L. Tan, X. Xie, J. Li, et al., Clinical decision support system for hypertension medication based on knowledge graph, Comput. Methods Programs Biomed., 227 (2022). https://doi.org/10.1016/j.cmpb.2022.107220 doi: 10.1016/j.cmpb.2022.107220
    [7] D. Zhang, Q. Jia, S. Yang, X. Han, C. Xu, X. Liu, et al., Traditional Chinese medicine automated diagnosis based on knowledge graph reasoning, Comput. Mater. Contin., 71 (2022). https://doi.org/10.32604/cmc.2022.017295 doi: 10.32604/cmc.2022.017295
    [8] Y. An, X. Xia, X. Chen, F. X. Wu, J. Wang, Chinese clinical named entity recognition via multi-head self-attention based Bi-LSTM-CRF, Artif. Intell. Med., 127 (2022), 102282. https://doi.org/10.1016/j.artmed.2022.102282 doi: 10.1016/j.artmed.2022.102282
    [9] Y. Ma, Y. Liu, D. Zhang, J. Zhang, H. Liu, Y. Xie, A multigranularity text driven named entity recognition CGAN model for traditional Chinese medicine literatures, Comput. Intell. Neurosci., 2022 (2022), 1495841.
    [10] R. Qi, P. Lv, Q. Zhang, M. Wu, Research on Chinese medical entity recognition based on multi-neural network fusion and improved tri-training algorithm, Appl. Sci., 12 (2022), 8539. https://doi.org/10.3390/app12178539 doi: 10.3390/app12178539
    [11] Y. Li, X. Wang, L. Hui, L. Zou, H. Li, L. Xu, et al., Chinese clinical named entity recognition in electronic medical records: Development of a lattice long short-term memory model with contextualized character representations, JMIR Med. Inf., 8 (2020), e19848.
    [12] M. Zhang, J. Wang, X. Zhang, Using a pre-trained language model for medical named entity extraction in Chinese clinic text, in 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC), IEEE, (2020), 312–317. https://doi.org/10.1109/ICEIEC49280.2020.9152257
    [13] R. Xie, Y. Wang, D. Peng, X. Liu, X. Su, X. Li, Research on standardization of traditional Chinese medicine symptoms, Henan Tradit. Chin. Med., 7 (2017), 11441146. https://doi.org/10.16367/j.issn.1003-5028.2017.07.0403 doi: 10.16367/j.issn.1003-5028.2017.07.0403
    [14] K. Zhou, J. Dong, S. Wang, G. Li, Y. Zheng, T. Wang, A review of research ideas and methods for standardization of traditional Chinese medicine symptoms in the past 20 years, Glob. Tradit. Chin. Med., 4 (2022), 708–712.
    [15] Y. J. Hui, Q. L. Zha, A review of traditional Chinese medicine symptom information extraction, Comput. Eng. Appl., 59 (2023), 35–47.
    [16] L. Ma, J. Li, The significance and methodology of standardization of symptom nomenclature, Liaoning J. Tradit. Chin. Med., 37 (2010), 1264–1265.
    [17] W. Liu, F. Zhu, Reflections on several issues in the standardization of traditional Chinese medicine symptoms, J. Tradit. Chin. Med., 48 (2007), 555–556.
    [18] D. Yan, M. Cui, Exploration of 'symptoms and ssigns' classification in the traditional Chinese medicine clinical terminology system, Chin. J. Med. Libr. Inf. Sci., 10 (2015), 77–80.
    [19] Y. Dong, M. Cui, Discussion on the classification of 'symptoms and signs' in the clinical terminology system of traditional Chinese medicine, Chin. J. Med. Libr. Inf. Sci., 24 (2015), 77–80.
    [20] X. Ling, D. Weld, Fine-grained entity recognition, in Proceedings of the AAAI Conference on Artificial Intelligence, 26 (2012), 94–100. https://doi.org/10.1609/aaai.v26i1.8122
    [21] K. Pu, H. Liu, Y. Yang, W. Lv, J. Li, Multi-label fine-grained entity typing for baidu Wikipedia based on pre-trained model, in China Conference on Knowledge Graph and Semantic Computing, 1553 (2021), 114–123. https://doi.org/10.1007/978-981-19-0713-5_13
    [22] X. Ren, W. He, M. Qu, Label noise reduction in entity typing by heterogeneous partial-label embedding, in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, (2016), 1825–1834. https://doi.org/10.1177/0165551521998048
    [23] Y. Onoe, G. Durrett, Learning to denoise distantly-labeled data for entity typing, preprint, arXiv: 1905.01566. https://doi.org/10.48550/arXiv.1905.01566
    [24] H. Zhang, D. Long, G. Xu, M. Zhu, P. Xie, F. Huang, et al., Learning with noise: improving distantly-supervised fine-grained entity typing via automatic relabeling, in Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, (2021), 3808–3815. https://doi.org/10.24963/ijcai.2020/527
    [25] M. A. Ali, Y. Sun, B. Li, W. Wang, Fine-grained named entity typing over distantly supervised data based on refined representations, in Proceedings of the AAAI Conference on Artificial Intelligence, 34 (2020), 7391–7398. https://doi.org/10.1609/aaai.v34i05.6234
    [26] Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, M. Varma, Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising, in Proceedings of the 2018 World Wide Web Conference, (2018), 993–1002. https://doi.org/10.1145/3178876.3185998
    [27] R. You, Z. Zhang, Z. Wang, S. Dai, H. Mamitsuka, S. Zhu, Attentionxml: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification, Adv. Neural Inf. Process. Syst., 32 (2019). https://arXiv.org/abs/1811.01727
    [28] J. Zhang, W. C. Chang, H. F. Yu, I. Dhillon, Fast multi-resolution transformer fine-tuning for extreme multi-label text classification, Adv. Neural Inf. Process. Syst., 34 (2021), 7267–7280. https://doi.org/10.48550/arXiv.2110.00685 doi: 10.48550/arXiv.2110.00685
    [29] X. Xiao, Research on Data Elements of Traditional Chinese Medicine Clinical Symptoms Based on Machine Learning, Ph.D. thesis, Hunan University of Traditional Chinese Medicine, 2018.
    [30] N. Zhang, X. Cao, R. Lin, B. Wang, H. Shi, H. Zhou, et al., Research on the normalization of traditional Chinese medicine symptom terms in epilepsy, World Sci. Technol.-Modernization Tradit. Chin. Med., 22 (2020), https://doi.org/10.11842/wst.20190415001 doi: 10.11842/wst.20190415001
    [31] M. Li, Q. Zhou, X. Luo, B. Zhu, Research on the standard and classification system of traditional Chinese medicine symptom terminology, Chin. J. Tradit. Chin. Med. Pharm., 36 (2021), 4838–4842.
    [32] E. F. Sang, F. D. Meulder, Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, preprint, arXiv: cs/0306050. https://doi.org/10.48550/arXiv.cs/0306050
    [33] D. Gillick, N. Lazic, K. Ganchev, J. Kirchner, D. Huynh, Context-dependent fine-grained entity type tagging, preprint, arXiv: 1412.1820. https://doi.org/10.48550/arXiv.1412.1820
    [34] E. Choi, O. Levy, Y. Choi, L. Zettlemoyer, Ultra-fine entity typing, preprint, arXiv: 1807.04905. https://doi.org/10.48550/arXiv.1807.04905
    [35] A. Abhishek, A. Anand, A. Awekar, Fine-grained entity type classification by jointly learning representations and label embeddings, in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, (2017), 797–807. https://aclanthology.org/E17-1075
    [36] F. López, M. Strube, A fully hyperbolic neural model for hierarchical multi-class classification, preprint, arXiv: 2010.02053. https://doi.org/10.48550/arXiv.2010.02053
    [37] W. Xiong, J. Wu, D. Lei, M. Yu, S. Chang, X. Guo, et al., Imposing label-relational inductive bias for extremely fine-grained entity typing, preprint, arXiv: 1903.02591. https://doi.org/10.48550/arXiv.1903.02591
    [38] Y. Fan, J. Liu, J. Tang, P. Liu, Y. Du, Learning correlation information for multi-label feature selection, Pattern Recognit., 2023. https://doi.org/10.1016/j.patcog.2023.109899 doi: 10.1016/j.patcog.2023.109899
    [39] Y. Fan, J. Liu, P. Liu, Y. Du, W. Lan, S. Wu, Manifold learning with structured subspace for multi-label feature selection, Pattern Recognit., 120 (2021). https://doi.org/10.1016/j.patcog.2021.108169 doi: 10.1016/j.patcog.2021.108169
    [40] Y. Fan, B. Chen, W. Huang, J. Liu, W. Weng, W. Lan, Multi-label feature selection based on label correlations and feature redundancy, Knowl.-Based Syst., 241 (2022). https://doi.org/10.1016/j.knosys.2022.108256 doi: 10.1016/j.knosys.2022.108256
    [41] Y. Fan, J. Liu, S. Wu, Exploring instance correlations with local discriminant model for multi-label feature selection, Appl. Intell., 52 (2022), 1–19. https://doi.org/10.1007/s10489-021-02799-0 doi: 10.1007/s10489-021-02799-0
    [42] S. Kharbanda, A. Banerjee, E. Schultheis, R. Babbar, CascadeXML: Rethinking transformers for end-to-end multi-resolution training in extreme multi-label classification, in Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
    [43] J. Sun, F. Yang, W. Deng, Construction of a knowledge representation model for traditional Chinese medicine symptoms based on ontology, J. Med. Inf., 38 (2017).
    [44] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, preprint, arXiv: 1609.02907. https://doi.org/10.48550/arXiv.1609.02907
    [45] J. Yu, B. Bohnet, M. Poesio, Named entity recognition as dependency parsing, preprint, arXiv: 2005.07150. https://doi.org/10.48550/arXiv.2005.07150
    [46] R. Yang, Q. Ye, C. Cheng, S. Zhang, Y. Lan, J. Zou, Decision-making system for the diagnosis of syndrome based on traditional Chinese medicine knowledge graph, Evid. Based Complementary Altern. Med., 2022. https://doi.org/10.1155/2022/8693937 doi: 10.1155/2022/8693937
    [47] N. Deng, H. Fu, X. Chen, Named entity recognition of traditional Chinese medicine patents based on BiLSTM-CRF, Wireless Commun. Mob. Comput, 2021. https://doi.org/10.1155/2021/6696205 doi: 10.1155/2021/6696205
    [48] Q. Qu, H. Kan, Y. Wu, Y. Gao, Named entity recognition of TCM text based on bert model, in 2020 7th International Forum on Electrical Engineering and Automation (IFEEA), 2020. https://doi.org/10.1109/IFEEA51475.2020.00139
    [49] P. Yang, X. Sun, W. Li, S. Ma, W. Wu, H. Wang, SGM: Sequence generation model for multi-label classification, preprint, arXiv: 1806.04822. https://doi.org/10.48550/arXiv.1806.04822
  • This article has been cited by:

    1. K. A Churiukina, O. N Matchuk, A. D Kaprin, S. A Ivanov, V. S Koval, A. F Arutyunyan, A. L Zhuze, I. A Zamulaeva, Effects of DNA-binding ligands from dimeric bisbenzimidazoles series DBA(n) and DBPA( n) in combination with γ-radiation on epithelial-mesenchymal transition and pool size of MCF-7 breast cancer stem cells, 2023, 68, 0006-3029, 529, 10.31857/S0006302923030158
    2. K. A. Churiukina, O. N. Matchuk, A. D. Kaprin, S. A. Ivanov, V. S. Koval, A. F. Arutyunyan, A. L. Zhuze, I. A. Zamulaeva, The Effect of DNA-Binding Ligands from Dimeric Bisbenzimidazoles of the DBA(n) and DBPA(n) Series in Combination with γ-Radiation on Epithelial–Mesenchymal Transition and Pool Size of MCF-7 Breast Cancer Stem Cells, 2023, 68, 0006-3509, 422, 10.1134/S0006350923030065
  • Reader Comments
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(1990) PDF downloads(121) Cited by(3)

Figures and Tables

Figures(8)  /  Tables(5)

Other Articles By Authors

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog