Research article

Saddlepoint approximation of the p-values for the multivariate one-sample sign and signed-rank tests

  • Multivariate data analysis (MVDA) is a powerful statistical approach for simultaneously analyzing datasets with multiple variables. Unlike univariate or bivariate analyses, which focus on one or two variables, respectively, MVDA considers the interactions and relationships among multiple variables within a dataset. Several nonparametric tests can be used in the context of one-sample multivariate location problems. The exact distributions of such tests cannot be computed analytically and are usually approximated asymptotically. This article proposes the saddlepoint approximation method to approximate the tail probability for multivariate sign and signed-rank tests. It is suggested as a more accurate alternative to the traditional asymptotic approximation method and as a faster alternative to the simulation method, which requires a lot of time because it depends on all possible permutations. Real data examples were provided to illustrate the calculation of p-values, and a simulation study was conducted to compare the accuracy of the saddlepoint approximation method with the permutation-based simulation method and an asymptotic normal approximation method. The study results show that the saddlepoint approximation provides highly accurate approximations to the p-values of the considered statistics, and it often outperforms the normal approximation. Additionally, the results show that the proposed method's computation time is much less than that of the simulation method.

    Citation: Abd El-Raheem M. Abd El-Raheem, Ibrahim A. A. Shanan, Mona Hosny. Saddlepoint approximation of the p-values for the multivariate one-sample sign and signed-rank tests[J]. AIMS Mathematics, 2024, 9(9): 25482-25493. doi: 10.3934/math.20241244




    Many real-world phenomena involve interactions between multiple factors. Multivariate data analysis (MVDA) allows researchers to dissect these complex systems and to understand how different variables affect each other and contribute to the overall outcomes. For example, in medical research, datasets often include multiple variables such as patient demographics, genetic information, medical history, biomarkers, and treatment outcomes. MVDA helps researchers identify factors that influence disease risk, treatment response, and patient outcomes. This is only one illustration; MVDA is relevant to many other fields, including engineering, economics, agriculture, public health, psychology, urban planning, and energy. Essentially, any discipline that deals with complex systems or phenomena that involve multiple interacting factors relies on multivariate analyses to extract meaningful insights from data. Accordingly, many statisticians have been interested in extending univariate and bivariate statistical tests to multivariate cases. Among these tests are the sign tests. Sign-based approaches are non-parametric methods that are very appealing because of their inherent simplicity and because they do not rely on standard Gaussian assumptions. Sign tests originated in the univariate case, where they were primarily used to assess questions of location and symmetry. Over several decades, multivariate extensions of univariate sign-based approaches have garnered significant interest. Bivariate sign tests for location may be traced back to Hodges [1] and Blumen [2]. Numerous sign and signed-rank approaches for the multivariate location problem have since been developed in the literature. Randles [3] suggested a distribution-free multivariate sign test based on interdirections. Hettmansperger et al. [4] presented a new approach to conduct hypothesis tests on the central location of multivariate data (MVD), and emphasized the importance of affine invariance for robust statistical inference. Möttönen and Oja [5] introduced the multivariate spatial sign method to robustly estimate location parameters in an MVDA. Hettmansperger et al. [6] presented a novel approach to construct affine-invariant multivariate one-sample signed-rank tests. For more information about non-parametric tests for MVD, the reader can consult Möttönen and Oja [7], Larocque and Labarre [8], Mahfoud and Randles [9], and Bernard and Verdebout [10]. Moreover, we refer to the review by Oja [11], which surveyed the literature on multivariate sign and rank tests and focused on their properties, applications, and limitations.

    Approximation techniques in statistics approximate intricate statistical measures, distributions, or functions when precise computations become challenging or unfeasible. Such methods prove invaluable when analytical solutions are absent or when computations entail high-dimensional or computationally demanding tasks. This article suggests the saddlepoint approximation (SPA) to approximate the exact distribution function for the class of multivariate sign and signed-rank tests. The SPA is a method employed in statistics and probability theory to approximate probability distributions, especially in situations where traditional methods such as numerical integration or exact calculations are challenging. The SPA provides highly accurate approximations to the distribution of a statistic, and often outperforms traditional methods such as the normal approximation, especially in the distribution's tails. It includes terms from the Edgeworth expansion, which provides a more refined approximation compared to the central limit theorem. It is computationally feasible with modern computing resources, which allows practitioners to implement it practically. It improves the accuracy of statistical inference, particularly for hypothesis testing and confidence interval estimation, thus leading to more reliable conclusions. Unlike many other asymptotic methods, the SPA is often effective even with relatively small sample sizes, thus making it useful in practical situations where data may be limited. The method has been implemented in various statistical software packages, making it accessible to practitioners without requiring deep theoretical knowledge of the underlying mathematics. Some essential references on the SPA include Daniels [12], Lugannani and Rice [13], Skovgaard [14], Barndorff-Nielsen and Cox [15], and Butler [16]. The SPA has applications in various areas of statistics and related fields, such as biostatistics and epidemiology, reliability engineering, finance and risk management, genetics and genomics, machine learning and data science, statistical physics, and thermodynamics. The following are some recent references on the SPA: Meng et al. [17], Abd El-Raheem and Abd-Elfattah [18,19], Zhao et al. [20], Shanan et al. [21], Meng et al. [22], Abd El-Raheem and Hosny [23], and Abd El-Raheem et al. [24,25]. These references cover various aspects and applications of the SPA, including tail probabilities, high-dimensional data, genetics, and nonlinear functionals. They provide insights into recent developments in the field, thus making them valuable resources for researchers interested in the SPA and related topics.

    The existing studies of asymptotic approximations for the p-value of the one-sample multivariate sign and signed rank tests provide valuable and accessible tools for non-parametric inferences in multivariate settings. While these approaches are powerful, they have some limitations, the most important of which is their reliance on large sample sizes. The asymptotic normal method assumes that the sample size is sufficiently large for the central limit theorem to hold. For small or moderate sample sizes, the normal approximation may not be accurate, thus leading to biased or incorrect inferences. Furthermore, for small sample sizes, the distribution of the sample mean or other statistics may significantly deviate from normality, making the approximation unreliable. Thus, the accuracy of asymptotic approximations in small samples is a significant concern. Accordingly, in this article, we propose the SPA method to approximate the p-value of the one-sample multivariate sign and signed rank tests as a more accurate alternative to the normal approximation, especially with medium and small sample sizes, and as a less computationally demanding and time-consuming alternative to the permutation method.

    The remaining sections of this article are organized as follows: Section 2 presents multivariate sign and signed-rank tests with minimal adjustments to account for the linear case; the proposed approximation is described in Section 3; Sections 4 and 5 compare the performance of the saddlepoint technique versus the asymptotic method using numerical examples and simulation studies; and finally, the conclusion is provided in Section 6.

    This section presents the most common sign and signed-rank test statistics for multivariate samples. The first statistic of the multivariate sign test was developed by Hettmansperger et al. [4] as a general multivariate analog of the bivariate sign test. Suppose that $X_1, X_2, \dots, X_n$ is a random sample from the multivariate symmetric distribution $F$ with an unknown symmetry center $\theta$. Here, symmetry means that $X_i-\theta$ and $\theta-X_i$ are identically distributed. To examine the hypothesis $H_0: \theta=0$, let $H$ be a fixed half-space such that if $X$ is a member of $H$, then $-X$ is not a member, and let $X_i=a_iY_i$, $i=1,\dots,n$, where $Y_i\in H$ and $a_i=\pm 1$ indicates whether $X_i$ belongs to $H$ or not.
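    The decomposition $X_i=a_iY_i$ can be sketched in a few lines of Python. The source does not say which half-space $H$ is used, so the choice below (vectors whose first nonzero coordinate is positive) is an assumption made only for illustration, as is the function name `split_signs`.

```python
import numpy as np

def split_signs(X):
    """Write each row X_i as a_i * Y_i with Y_i in a fixed half-space H.

    Assumption (not from the article): H is the set of vectors whose first
    nonzero coordinate is positive, so x in H implies -x not in H.
    """
    X = np.asarray(X, dtype=float)
    a = np.ones(X.shape[0])
    for i, x in enumerate(X):
        nonzero = np.flatnonzero(x)
        if nonzero.size > 0 and x[nonzero[0]] < 0:
            a[i] = -1.0          # X_i lies outside H, so flip it into H
    Y = a[:, None] * X           # Y_i = a_i * X_i because a_i is +1 or -1
    return a, Y

# Hypothetical trivariate observations, for illustration only.
X = np.array([[1.0, -2.0, 3.0], [-1.0, 5.0, 0.0], [0.0, -2.0, 1.0]])
a, Y = split_signs(X)
```

    Any other half-space with the same antipodal property would serve equally well; only the pairs $(a_i, Y_i)$ enter the statistics that follow.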

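    First, we present the statistic in the trivariate case; then, we present its generalization to the multivariate case. The trivariate sign test statistic is given by the following:

    $$M_s=\sum_{i=1}^{n}a_iq_i,$$

    where

    $$q_i=\begin{pmatrix}q_{i1}\\ q_{i2}\\ q_{i3}\end{pmatrix}=\binom{n-1}{2}^{-1}\sum_{j<l}S_{ijl}(0)\begin{pmatrix}\begin{vmatrix}y_{j2}&y_{l2}\\ y_{j3}&y_{l3}\end{vmatrix}\\[4pt] -\begin{vmatrix}y_{j1}&y_{l1}\\ y_{j3}&y_{l3}\end{vmatrix}\\[4pt] \begin{vmatrix}y_{j1}&y_{l1}\\ y_{j2}&y_{l2}\end{vmatrix}\end{pmatrix},\quad j,l=1,2,\dots,n,$$

    such that

    $$S_{ijl}(0)=\operatorname{sgn}\left(\begin{vmatrix}1&1&1&1\\ 0&y_{i1}&y_{j1}&y_{l1}\\ 0&y_{i2}&y_{j2}&y_{l2}\\ 0&y_{i3}&y_{j3}&y_{l3}\end{vmatrix}\right)$$

    indicates whether 0 is above or below the plane defined by the three points $Y_i$, $Y_j$, and $Y_l$.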

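    From the above, we can present the generalization of the statistic $M_s$ to the $k$-variate case as follows:

    $$M_s=\sum_{i=1}^{n}a_iq_i, \tag{1}$$

    where

    $$q_i=\begin{pmatrix}q_{i1}\\ q_{i2}\\ \vdots\\ q_{ik}\end{pmatrix}=\binom{n-1}{k-1}^{-1}\sum_{1\le i_1<\dots<i_{k-1}\le n}S_{i,i_1\dots i_{k-1}}(0)\begin{pmatrix}W_1(i_1,\dots,i_{k-1})\\ W_2(i_1,\dots,i_{k-1})\\ \vdots\\ W_k(i_1,\dots,i_{k-1})\end{pmatrix},$$

    and $W_j(i_1,\dots,i_{k-1})$, $j=1,2,\dots,k$, is the cofactor of $y_{ij}$ in the matrix

    $$\begin{pmatrix}y_{i1}&y_{i_11}&\cdots&y_{i_{k-1}1}\\ y_{i2}&y_{i_12}&\cdots&y_{i_{k-1}2}\\ \vdots&\vdots&&\vdots\\ y_{ik}&y_{i_1k}&\cdots&y_{i_{k-1}k}\end{pmatrix},$$

    and

    $$S_{i,i_1\dots i_{k-1}}(0)=\operatorname{sgn}\left(\begin{vmatrix}1&1&1&\cdots&1\\ 0&y_{i1}&y_{i_11}&\cdots&y_{i_{k-1}1}\\ \vdots&\vdots&\vdots&&\vdots\\ 0&y_{ik}&y_{i_1k}&\cdots&y_{i_{k-1}k}\end{vmatrix}\right),$$

    which indicates whether 0 is above or below the hyperplane defined by the points $Y_i, Y_{i_1},\dots,Y_{i_{k-1}}$.

    Under the null hypothesis, and conditional on the observed values $Y_1,Y_2,\dots,Y_n$, the $a_i$ are independent and $P(a_i=1)=P(a_i=-1)=1/2$, which means that $E(M_s|H_0)=0$ and $\sigma^2=E(M_sM_s^T|H_0)=\sum_{i=1}^{n}q_iq_i^T$.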
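    As a rough illustration of how the vectors $q_i$ might be computed, the sketch below evaluates the cofactor sum by brute force with NumPy. It is not the authors' code. It assumes that the columns of the cofactor matrix are ordered $Y_i, Y_{i_1},\dots,Y_{i_{k-1}}$, it skips index sets that contain $i$ (their determinant is zero, so they contribute nothing), and the function name `sign_vectors` is hypothetical. The cost grows like $n\binom{n-1}{k-1}$, so it is only practical for small $n$ and $k\ge 2$.

```python
from itertools import combinations
from math import comb
import numpy as np

def sign_vectors(Y):
    """Return the n x k matrix whose i-th row is q_i (brute-force sketch)."""
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    Q = np.zeros((n, k))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for idx in combinations(others, k - 1):
            # Columns of M are Y_i, Y_{i_1}, ..., Y_{i_{k-1}}.
            M = np.column_stack([Y[i]] + [Y[j] for j in idx])
            # Expanding the bordered determinant along its first column
            # (1, 0, ..., 0) shows that S is the sign of det(M).
            S = np.sign(np.linalg.det(M))
            # W_j is the cofactor of y_ij, the entry in row j of the first
            # column of M: delete row j and the first column.
            W = np.array([
                (-1) ** j * np.linalg.det(np.delete(M[:, 1:], j, axis=0))
                for j in range(k)
            ])
            Q[i] += S * W
        Q[i] /= comb(n - 1, k - 1)
    return Q

# Hypothetical bivariate points already placed in the half-space H.
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Q = sign_vectors(Y)
a = np.array([1.0, -1.0, 1.0])   # hypothetical signs a_i
M_s = a @ Q                      # the statistic in Eq (1)
```

    A quick sanity check on this construction is that $q_i^TY_i\ge 0$ for every $i$, because each term $S\,W^TY_i$ equals the absolute value of a determinant.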

    The second statistic is a multivariate signed-rank test statistic, which was introduced by Hettmansperger et al. [6]. Let $X_1,X_2,\dots,X_n$ be a random sample from a $k$-variate continuous distribution and

    $$P=\{p=(i_1,i_2,\dots,i_k): 1\le i_1<i_2<\dots<i_k\le n\},$$

    be the set of $N_P=\binom{n}{k}$ different $k$-tuples of the index set $\{1,2,\dots,n\}$. An index $p\in P$ refers to a $k$-subset of the original observations. Furthermore, using the $k$ observations in $p$ as vertices, $p$ determines a $(k-1)$-dimensional hyperplane (passing through the $k$ observations included in $p$) in the $k$-dimensional space and a $(k-1)$-dimensional sub-simplex.

    Based on the symbols and definitions used for the previous statistic (1), let $E$ be the set of $2^k$ possible vectors $(\pm 1,\pm 1,\dots,\pm 1)$ and define the following:

    $$Q_p^{+}(X)=2^{-k}\sum_{e\in E}S_{pe}(X)\,d_{pe},$$

    where $S_{pe}(X)=\operatorname{sgn}(d_{0pe}+X^Td_{pe})$, such that $d_{0pe}=(-1)^k\det(e_1X_{i_1},e_2X_{i_2},\dots,e_kX_{i_k})$ and $d_{pe}$ is the $k$-dimensional vector of cofactors of $X$ in the following:

    $$\det\begin{pmatrix}1&1&\cdots&1&1\\ e_1X_{i_1}&e_2X_{i_2}&\cdots&e_kX_{i_k}&X\end{pmatrix}.$$

    If $S_{pe}(X)>0$ (or $<0$), then the hyperplane $p$ is above (or below) $X$, respectively.

    The formula for the multivariate signed-rank test statistic is given by the following:

    $$M_R=\sum_{i=1}^{n}a_iR(Y_i). \tag{2}$$

    In the statistic (2), $X_i=a_iY_i$, where $Y_i\in H$. Hence, $a_i=\pm 1$ as $X_i\in H$ or $X_i\in H^c$, and $R$ is the vector signed-rank function defined as follows:

    $$R(Y_i)=\begin{pmatrix}r_{i1}\\ r_{i2}\\ \vdots\\ r_{ik}\end{pmatrix}=N_P^{-1}\sum_{p\in P}Q_p^{+}(Y_i).$$

    Under the null hypothesis, and conditional on the observed values $Y_1,Y_2,\dots,Y_n$, the $a_i$ are independent and $P(a_i=1)=P(a_i=-1)=1/2$. Hence, $E(M_R|H_0)=0$ and the covariance matrix is as follows:

    $$\sigma^2=\operatorname{cov}(n^{-1/2}M_R)=n^{-1}\sum_{i=1}^{n}R(Y_i)R(Y_i)^T.$$

    Because both the multivariate sign test statistic $M_s$ in Eq (1) and the multivariate signed-rank test statistic $M_R$ in Eq (2) are linear in the signs $a_i$ and asymptotically multivariate normal, we can modify them and obtain an equivalent statistic of linear form as follows:

    $$D=\sum_{i=1}^{n}a_iT_i, \tag{3}$$

    where $T_i=\sum_{j=1}^{k}q_{ij}$ for the sign test statistic $M_s$ in Eq (1) and $T_i=\sum_{j=1}^{k}r_{ij}$ for the signed-rank test statistic $M_R$ in Eq (2).
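    The reduction to the scalar statistic $D$ is a one-line computation once the score vectors are available. The sketch below also evaluates a normal-approximation p-value based on $E(D)=0$ and $\operatorname{Var}(D)=\sum_i T_i^2$, which follow from the independence of the $a_i$. The article does not spell out how its normal approximation is computed, so `normal_pvalue` is an assumption, and both function names are hypothetical.

```python
import math
import numpy as np

def linear_statistic(a, V):
    """Collapse score vectors (rows of V: q_i or R(Y_i)) into T_i and D."""
    V = np.asarray(V, dtype=float)
    T = V.sum(axis=1)                 # T_i = sum over the k components
    D0 = float(np.dot(a, T))          # observed D = sum_i a_i * T_i
    return D0, T

def normal_pvalue(D0, T):
    """Upper-tail normal p-value, assuming D ~ N(0, sum_i T_i^2) under H0."""
    sd = math.sqrt(float(np.sum(np.square(T))))
    return 0.5 * math.erfc(D0 / (sd * math.sqrt(2.0)))

# Hypothetical signs and score vectors, for illustration only.
a = np.array([1.0, -1.0, 1.0])
V = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
D0, T = linear_statistic(a, V)
p_normal = normal_pvalue(D0, T)
```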

    The following section derives the SPA for the permutation distribution of the class D in Eq (3).

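    Let $b_i=\frac{a_i+1}{2}$; then, the statistic $D$ in Eq (3) can be written as follows:

    $$D=\sum_{i=1}^{n}2b_iT_i-\sum_{i=1}^{n}T_i, \tag{4}$$

    where $b_i=0$ or $1$ for $i=1,2,\dots,n$ are independent and identically distributed Bernoulli$(1/2)$ random variates. Thus, the moment generating function of the statistic $D$ in Eq (4) is given by

    $$M_D(s)=\exp\left(-s\sum_{i=1}^{n}T_i\right)\prod_{i=1}^{n}\left\{\frac{1}{2}+\frac{1}{2}\exp(2T_is)\right\},$$

    and the cumulant generating function of the statistic $D$ in Eq (4) is given by

    $$C_D(s)=-s\sum_{i=1}^{n}T_i+\sum_{i=1}^{n}\log\left\{\frac{1}{2}+\frac{1}{2}\exp(2T_is)\right\}.$$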

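    The SPA for the distribution function [13], $F_D(\cdot)$, and the probability mass function [12], $f_D(\cdot)$, at $D=D_0$ are given, respectively, by the following:

    $$\hat{F}_D(D_0)\approx\begin{cases}\Phi(\tilde{w})+\phi(\tilde{w})\left(\dfrac{1}{\tilde{w}}-\dfrac{1}{\tilde{u}}\right) & \text{if } D_0\neq\mu,\\[8pt] \dfrac{1}{2}+\dfrac{C_D'''(0)}{6\sqrt{2\pi}\,C_D''(0)^{3/2}} & \text{if } D_0=\mu,\end{cases} \tag{5}$$

    and

    $$\hat{f}_D(D_0)\approx\frac{1}{\sqrt{2\pi C_D''(\tilde{s})}}\exp\left[C_D(\tilde{s})-\tilde{s}D_0\right],$$

    where $\mu=E(D)$ and

    $$\tilde{w}=\operatorname{sgn}(\tilde{s})\sqrt{2\left[\tilde{s}D_0-C_D(\tilde{s})\right]}\quad\text{and}\quad\tilde{u}=\tilde{s}\sqrt{C_D''(\tilde{s})},$$

    are functions of $D_0$, where $C_D''$ and $C_D'''$ are the second and third derivatives of $C_D$, respectively. The two symbols, $\Phi$ and $\phi$, denote the standard normal distribution and density functions, respectively, and the symbol $\operatorname{sgn}(\tilde{s})$ denotes the sign of $\tilde{s}$. The saddlepoint $\tilde{s}$ is the unique solution of the equation $C_D'(\tilde{s})=D_0$, that is,

    $$C_D'(\tilde{s})=\sum_{i=1}^{n}\frac{2T_i\exp(2T_i\tilde{s})}{1+\exp(2T_i\tilde{s})}-\sum_{i=1}^{n}T_i=D_0. \tag{6}$$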
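    For readers who want the derivatives in closed form, the cumulant generating function above simplifies considerably. The following worked step is an algebraic consequence of the stated $C_D(s)$, written out here as an aid rather than taken from the article:

```latex
\begin{aligned}
C_D(s) &= \sum_{i=1}^{n}\log\left\{\tfrac{1}{2}e^{-T_i s}\left(1+e^{2T_i s}\right)\right\}
        = \sum_{i=1}^{n}\log\cosh(T_i s),\\
C_D'(s) &= \sum_{i=1}^{n} T_i\tanh(T_i s),\\
C_D''(s) &= \sum_{i=1}^{n} T_i^{2}\,\operatorname{sech}^{2}(T_i s),\\
C_D'''(s) &= -2\sum_{i=1}^{n} T_i^{3}\,\operatorname{sech}^{2}(T_i s)\tanh(T_i s).
\end{aligned}
```

    In particular, $\mu=C_D'(0)=0$, $C_D''(0)=\sum_{i}T_i^2$, and $C_D'''(0)=0$, so the second branch of Eq (5) reduces to $1/2$ at $D_0=\mu$. Eq (6) becomes $\sum_i T_i\tanh(T_i\tilde{s})=D_0$, which has a finite solution only when $|D_0|<\sum_i|T_i|$.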

    The Newton-Raphson method is used to solve the saddlepoint equation in Eq (6). The calculation of the saddlepoint p-value is summed up as follows: start by solving the saddlepoint Eq (6) to find $\tilde{s}$ at the given $D_0$; then use $\tilde{s}$ to find $\tilde{w}$ and $\tilde{u}$; and finally, substitute $\tilde{w}$ and $\tilde{u}$ in Eq (5) to find $\hat{F}_D(D_0)$. Thus, the saddlepoint p-value at $D_0$ is given by $\tilde{P}(D\ge D_0)=1-\hat{F}_D(D_0)$.
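    The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the cumulant generating function is evaluated in the algebraically equivalent form $\sum_i\log\cosh(T_is)$ for numerical stability, Newton-Raphson is started at $s=0$, and the case $D_0=0$ returns $1/2$ because the third derivative of $C_D$ vanishes at zero.

```python
import math

def cgf(s, T):
    # C_D(s) = -s*sum(T) + sum(log(1/2 + exp(2*T_i*s)/2)) = sum(log cosh(T_i*s)),
    # using log cosh(x) = |x| + log(1 + exp(-2|x|)) - log(2) to avoid overflow.
    total = 0.0
    for t in T:
        x = abs(t * s)
        total += x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)
    return total

def cgf_d1(s, T):
    # First derivative; equals the left-hand side of Eq (6).
    return sum(t * math.tanh(t * s) for t in T)

def cgf_d2(s, T):
    # Second derivative: sum of T_i^2 * sech^2(T_i*s).
    return sum(t * t * (1.0 - math.tanh(t * s) ** 2) for t in T)

def solve_saddlepoint(D0, T, tol=1e-12, max_iter=200):
    """Newton-Raphson solution of C_D'(s) = D0, started at s = 0."""
    if abs(D0) >= sum(abs(t) for t in T):
        raise ValueError("D0 must lie strictly inside the support of D")
    s = 0.0
    for _ in range(max_iter):
        step = (cgf_d1(s, T) - D0) / cgf_d2(s, T)
        s -= step
        if abs(step) < tol:
            break
    return s

def saddlepoint_pvalue(D0, T):
    """Approximate P(D >= D0) as 1 - F_hat(D0) using Eq (5)."""
    if D0 == 0.0:
        return 0.5                      # D0 equals the mean; C_D'''(0) = 0
    s = solve_saddlepoint(D0, T)
    w = math.copysign(math.sqrt(2.0 * (s * D0 - cgf(s, T))), s)
    u = s * math.sqrt(cgf_d2(s, T))
    Phi = 0.5 * math.erfc(-w / math.sqrt(2.0))                # normal CDF at w
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)   # normal density at w
    return 1.0 - (Phi + phi * (1.0 / w - 1.0 / u))

# Hypothetical scores: twenty equal weights and an observed value D0 = 8.
p_sad = saddlepoint_pvalue(8.0, [1.0] * 20)
```

    For values of $D_0$ extremely close to zero, the difference $1/\tilde{w}-1/\tilde{u}$ is numerically delicate, so a production implementation would likely switch to the limiting branch of Eq (5) within a small tolerance rather than only at exact equality.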

    It is necessary to point out that the test statistic in (4) involves discrete random variables, even though the continuous SPA was used to approximate its distribution function. The reason for using the continuous formula is that it offers the most precise approximation for the mid-p-value. This accuracy was discussed by Pierce and Peters [26], Davison and Wang [27], and in Section 6.1.4 of Butler [16]. The simplest explanation from the last reference suggests that a continuous SPA serves as an approximation to the true inverse Fourier transform, which determines $P(D\ge D_0)$. Given that $P(D\ge D_0)$ has a step discontinuity at $D_0$, the exact Fourier inversion at $D=D_0$ is the midpoint of the step, that is, the mid-p-value, which is what the continuous SPA actually approximates; see Theorem 10.7b in Henrici [28].
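    To make the mid-p-value concrete, the sketch below computes it exactly for small $n$ by enumerating all $2^n$ sign vectors, taking the mid-p-value to be $P(D>D_0)+\tfrac12P(D=D_0)$. The function name and the tolerance used to detect ties are assumptions; the enumeration is only feasible for small samples.

```python
from itertools import product

def exact_mid_pvalue(D0, T, atol=1e-12):
    """Exact mid-p-value P(D > D0) + 0.5 * P(D = D0) over all 2^n sign vectors."""
    n = len(T)
    greater = 0
    equal = 0
    for signs in product((-1.0, 1.0), repeat=n):
        D = sum(a * t for a, t in zip(signs, T))
        if abs(D - D0) <= atol:
            equal += 1               # ties count with weight one half
        elif D > D0:
            greater += 1
    return (greater + 0.5 * equal) / 2 ** n

# Hypothetical scores: with four unit weights, D = 2 occurs for 4 of the 16
# sign vectors and D = 4 for 1 of them.
mid_p = exact_mid_pvalue(2.0, [1.0] * 4)
```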

    Analyzing some numerical examples can deepen our grasp of how accurately the various methods approximate the exact p-value of the multivariate sign and signed-rank tests. Therefore, this section analyzes four numerical examples based on real multivariate data sets. Rao [29] conducted a study that involved cork borings from the north (N), east (E), south (S), and west (W) directions of tree trunks in a plantation block comprised of 28 trees. The aim was to assess whether there were variations in the bark deposit thickness and weight across the four cardinal directions. The tested hypothesis was whether cork borings on the trees exhibited a uniform thickness in all directions. In the first example, 14 observations were selected from the original dataset; these were the observations used by Hettmansperger et al. [4] in their article. In the second example, all 28 observations were included. For both examples, the data were transformed into trivariate observations using $x_1=E-N$, $x_2=S-E$, and $x_3=W-S$; see Hettmansperger et al. [4] for more information about these transformations. The third example analyzes multivariate lung function data from Merchant et al. [30] for 12 workers who were exposed to cotton dust for six hours. The data include several variables used to assess lung function changes, including forced vital capacity, forced expiratory volume, and closing capacity. These data were analyzed to test the hypothesis that the mean vector of lung function changes is equal to 0. The fourth example is four-dimensional data that represent the monthly minimum grass temperature (℃) recorded in 2022 for the Cavan, Donegal, Carlow, and Galway counties in Ireland (Met Éireann [31]). These data are presented in Table 1. The tested hypothesis was whether the monthly minimum grass temperatures were uniform in the four counties. The approximated p-values using the simulation, SPA, and normal approximation (NA) methods are displayed in Table 2.

    Table 1.  The monthly minimum grass temperature (℃) recorded in 2022 for counties Cavan, Donegal, Carlow, and Galway in Ireland.
    County Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    Cavan -4.1 -1.7 -8.4 -7.2 -1.1 -0.6 6.2 2.8 0.1 0.0 -1.5 -10.8
    Donegal -3.0 -1.6 -9.4 -5.9 0.4 0.4 5.1 0.8 0.7 3.5 -0.7 -8.6
    Carlow -9.2 -7.4 -8.8 -8.3 -0.8 -0.9 5.4 1.9 -1.7 -1.5 -6.1 -10.7
    Galway -8.4 -5.9 -8.7 -8.6 -0.2 0.6 5.3 2.1 -0.9 0.0 -6.3 -11.7

    Table 2.  The approximated p-values for the four datasets.
    Test statistic                  Example   Simulation   SPA        NA
    Multivariate sign test          1         0.092921     0.092718   0.089423
    Multivariate sign test          2         0.043049     0.042986   0.042904
    Multivariate sign test          3         0.010043     0.010037   0.010981
    Multivariate sign test          4         0.254059     0.251976   0.242107
    Multivariate signed-rank test   1         0.100608     0.100335   0.096239
    Multivariate signed-rank test   2         0.148538     0.148156   0.141879
    Multivariate signed-rank test   3         0.097989     0.097358   0.095456
    Multivariate signed-rank test   4         0.070636     0.069621   0.067370


    Based on the results shown in Table 2, we observe that the suggested approximation method, namely the saddlepoint method, is more accurate than the normal approximation method. This is evident from the proximity of the saddlepoint results to those of the simulation method. It is worth noting that the exact p-value cannot be calculated in closed form because the distribution of the test statistic is unknown. Instead, the p-value can be obtained from the permutation distribution of the statistic, that is, by evaluating the statistic over permutations of the sign indicators; however, this requires a significant amount of time. This method is called the simulation or reference method. We use it as the benchmark against which the accuracy of the two approximation techniques, the normal and saddlepoint methods, is compared. Throughout this article, we refer to the p-value obtained using the simulation method as either a simulated p-value or a reference p-value.
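    A reference p-value of this kind can be sketched with a Monte Carlo loop over random sign vectors. The version below is an assumption about how such a simulation might be organized, not the authors' code: the function name, the default number of draws, and the seed are illustrative, and ties are counted with weight one half so that the result estimates the mid-p-value.

```python
import numpy as np

def simulated_mid_pvalue(D0, T, n_draws=100_000, seed=0, atol=1e-12):
    """Monte Carlo estimate of P(D > D0) + 0.5 * P(D = D0) under random signs."""
    rng = np.random.default_rng(seed)
    T = np.asarray(T, dtype=float)
    # Each row is one random sign vector (a_1, ..., a_n) with P(a_i = 1) = 1/2.
    signs = rng.choice([-1.0, 1.0], size=(n_draws, T.size))
    D = signs @ T
    equal = np.isclose(D, D0, rtol=0.0, atol=atol)
    greater = (D > D0) & ~equal
    return float(greater.mean() + 0.5 * equal.mean())

# Hypothetical scores; the exact mid-p-value for this toy case is 3/16.
p_sim = simulated_mid_pvalue(2.0, [1.0] * 4)
```

    The standard error of such an estimate shrinks like the inverse square root of the number of draws, which is why a reference-quality value needs many draws and therefore much more computing time than the saddlepoint formula.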

    In this section, the accuracy of the SPA for the two tests, the multivariate sign test (MST) and the multivariate signed-rank test (MSRT), is verified by conducting a simulation study. Four multivariate distributions are used to simulate the data: the standard multivariate normal distribution with a correlation coefficient equal to 0.5, the standard multivariate logistic distribution, the standard multivariate extreme value distribution, and the standard multivariate exponential distribution. These distributions were selected to evaluate the statistical methods under a wide range of conditions (e.g., symmetry, heavy tails, outliers, and skewness). This variety helps to test the methods' robustness, versatility, and applicability to real-world data, and ultimately provides a thorough understanding of their performance. For each distribution and each sample size, $n=10, 20, 30,$ and $50$, 1,000 datasets are generated. Tables 3–6 show the results for the four distributions. The following quantities are reported in each table: "Sad.P." is the percentage of the 1,000 datasets in which the saddlepoint p-value was closer to the simulated mid-p-value than the asymptotic normal p-value was; "E.Nor." is the average relative absolute error of the normal p-value from the simulated mid-p-value; and "E.Sad." is the average relative absolute error of the saddlepoint p-value from the simulated mid-p-value. The simulated mid-p-value is calculated based on $10^6$ permutations of the indicators $\{b_i\}$.
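    The three summary measures can be computed from paired p-values as sketched below. The article does not give an explicit formula for the relative absolute error, so the definition used here, $|\hat{p}-p_{\text{sim}}|/p_{\text{sim}}$, is an assumption, as is the function name. For illustration only, the inputs are the four multivariate sign test rows of Table 2 rather than the 1,000 simulated datasets behind Tables 3–6.

```python
import numpy as np

def accuracy_summary(p_sim, p_sad, p_nor):
    """Return Sad.P. (percent), E.Sad., and E.Nor. for paired p-values."""
    p_sim, p_sad, p_nor = (np.asarray(p, dtype=float) for p in (p_sim, p_sad, p_nor))
    err_sad = np.abs(p_sad - p_sim) / p_sim   # assumed relative absolute error
    err_nor = np.abs(p_nor - p_sim) / p_sim
    return {
        "Sad.P.": 100.0 * float(np.mean(err_sad < err_nor)),
        "E.Sad.": float(np.mean(err_sad)),
        "E.Nor.": float(np.mean(err_nor)),
    }

# Multivariate sign test rows of Table 2 (simulation, SPA, NA).
summary = accuracy_summary(
    [0.092921, 0.043049, 0.010043, 0.254059],
    [0.092718, 0.042986, 0.010037, 0.251976],
    [0.089423, 0.042904, 0.010981, 0.242107],
)
```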

    Table 3.  Sad.P., E.Sad. and E.Nor. for simulated data from the standard multivariate exponential distribution.
    Test MST MSRT
    n Sad.P. E.Sad. E.Nor. Sad.P. E.Sad. E.Nor.
    10 79.8 0.04509 0.12217 78.1 0.00340 0.00682
    20 97.0 0.14068 0.34581 95.8 0.00015 0.00158
    30 93.9 0.20208 0.28879 93.8 0.00020 0.00121
    50 96.2 0.05846 0.41571 90 0.00024 0.00094

    Table 4.  Sad.P., E.Sad. and E.Nor. for simulated data from the standard multivariate normal distribution.
    Test MST MSRT
    n Sad.P. E.Sad. E.Nor. Sad.P. E.Sad. E.Nor.
    10 81.1 0.01161 0.02698 79.6 0.00270 0.00642
    20 94.2 0.00138 0.01303 95.9 0.00015 0.00151
    30 93.0 0.00117 0.00745 97.2 0.00007 0.00068
    50 87.2 0.00123 0.00489 96.6 0.00003 0.00018

    Table 5.  Sad.P., E.Sad. and E.Nor. for simulated data from the standard multivariate extreme value distribution.
    Test MST MSRT
    n Sad.P. E.Sad. E.Nor. Sad.P. E.Sad. E.Nor.
    10 82 0.24004 0.68459 78.9 0.00276 0.00564
    20 96.1 0.01675 0.51301 97.2 0.00011 0.00122
    30 97.1 0.03988 0.65826 99.5 0.00003 0.00044
    50 93.3 0.02649 0.19689 98 0.00007 0.00057

    Table 6.  Sad.P., E.Sad. and E.Nor. for simulated data from the standard multivariate logistic distribution.
    Test MST MSRT
    n Sad.P. E.Sad. E.Nor. Sad.P. E.Sad. E.Nor.
    10 79.8 0.04509 0.12217 82.2 0.00167 0.00404
    20 97.0 0.01406 0.34581 97.9 0.00009 0.00064
    30 93.9 0.20208 0.28879 93.9 0.00026 0.00183
    50 96.2 0.05846 0.41571 90 0.00017 0.00072


    To illustrate the results obtained from the simulation study, we take Table 5 with $n=30$ and the MST as an example, and note that the saddlepoint p-values were closer to the simulated mid-p-values 97.1% of the time, with an average relative absolute error of 3.988% versus 65.826% for the asymptotic normal method. Across all tests and cases, the SPA proved to be highly accurate and superior to the normal approximation. This is evidenced by the high percentages listed in the Sad.P. columns of Tables 3–6.

    Previously, it was verified that the saddlepoint method is more accurate than the normal approximation method. Now, we explain why the saddlepoint method is a practical alternative to the simulation method. To clarify this, we recorded the computing time for both methods, and the results are presented in Table 7. Table 7 shows a substantial difference between the computing times of the saddlepoint and simulation approaches. Using the SPA approach, we can compute the average value of 1,000 p-values for each of the considered cases in less than a minute. In contrast, according to Table 7, the simulation method takes between roughly 52 and 189 hours (3126.693 to 11314.340 minutes) to compute the corresponding values.

    Table 7.  Computation time in minutes for approximating p-values using saddlepoint and simulation methods.
    Distribution                 Test   Time       n=10       n=20       n=30       n=50
    Multivariate exponential     MST    Sad-time   0.709      0.433      0.454      0.495
    Multivariate exponential     MST    Sim-time   3126.693   5150.324   5956.334   7791.688
    Multivariate exponential     MSRT   Sad-time   0.416      0.442      0.524      0.512
    Multivariate exponential     MSRT   Sim-time   4272.836   5192.100   7015.12    6910.893
    Multivariate normal          MST    Sad-time   0.421      0.453      0.466      0.466
    Multivariate normal          MST    Sim-time   4203.440   5269.037   6116.469   7727.039
    Multivariate normal          MSRT   Sad-time   0.4264     0.539      0.499      0.214
    Multivariate normal          MSRT   Sim-time   4285.335   6769.081   6718.269   8178.236
    Multivariate extreme value   MST    Sad-time   0.421      0.436      0.457      0.479
    Multivariate extreme value   MST    Sim-time   4198.061   5236.192   6140.252   7693.722
    Multivariate extreme value   MSRT   Sad-time   0.506      0.483      0.484      0.487
    Multivariate extreme value   MSRT   Sim-time   4467.121   5614.974   6024.827   6336.412
    Multivariate logistic        MST    Sad-time   0.423      0.442      0.458      0.654
    Multivariate logistic        MST    Sim-time   4201.679   5165.436   6046.54    11314.340
    Multivariate logistic        MSRT   Sad-time   0.726      0.893      0.872      0.613
    Multivariate logistic        MSRT   Sim-time   7008.120   7994.677   9536.889   9326.201


    In conclusion, the MVDA offers a robust statistical approach to examine datasets with multiple variables, and captures the complex interactions and relationships that univariate or bivariate analyses cannot. This article highlights the challenges of computing exact distributions for nonparametric tests of one-sample multivariate location problems, which are typically addressed through an asymptotic approximation. This study introduced the saddlepoint method as a more accurate alternative to the traditional asymptotic approximation and a faster alternative to the time-intensive simulation method. The effectiveness of the saddlepoint method was demonstrated through illustrative examples and a simulation study, thus underscoring its potential as a superior approach for approximating distribution functions in multivariate analyses.

    A. M. Abd El-Raheem: Conceptualization, Methodology, Investigation, Software, Writing – review & editing; I. A. A. Shanan: Visualization, Resources, Software, Writing – original draft; M. Hosny: Funding acquisition, Project administration.

    The authors declare that they have not used Artificial Intelligence tools in the creation of this article.

    The third author extends her appreciation to the Deanship of Scientific Research and Graduate Studies at King Khalid University for funding this work through Large Research Project under grant number RGP2/398/45.

    The authors declare no conflict of interest.



    [1] J. L. Hodges, A bivariate sign test, Ann. Math. Stat., 26 (1955), 523–527.
    [2] I. Blumen, A new bivariate sign test, J. Am. Stat. Assoc., 53 (1958), 448–456. https://doi.org/10.1080/01621459.1958.10501451
    [3] R. H. Randles, A distribution-free multivariate sign test based on interdirections, J. Am. Stat. Assoc., 84 (1989), 1045–1050. https://doi.org/10.1080/01621459.1989.10478870
    [4] T. P. Hettmansperger, J. Nyblom, H. Oja, Affine invariant multivariate one-sample sign tests, J. R. Stat. Soc. B, 56 (1994), 221–234. https://doi.org/10.1111/j.2517-6161.1994.tb01973.x
    [5] J. Möttönen, H. Oja, Multivariate spatial sign and rank methods, J. Nonparametr. Stat., 5 (1995), 201–213. https://doi.org/10.1080/10485259508832643
    [6] T. P. Hettmansperger, J. Möttönen, H. Oja, Affine invariant multivariate one-sample signed-rank tests, J. Am. Stat. Assoc., 92 (1997), 1591–1600.
    [7] J. Möttönen, H. Oja, On the efficiency of multivariate spatial sign and rank tests, Ann. Stat., 25 (1997), 542–552.
    [8] D. Larocque, M. Labarre, A conditionally distribution-free multivariate sign test for one-sided alternatives, J. Am. Stat. Assoc., 99 (2004), 499–509.
    [9] Z. R. Mahfoud, R. H. Randles, On multivariate signed rank tests, J. Nonparametr. Stat., 17 (2005), 201–216.
    [10] G. Bernard, T. Verdebout, On some multivariate sign tests for scatter matrix eigenvalues, Econom. Stat., 29 (2024), 252–260. https://doi.org/10.1016/j.ecosta.2021.04.001
    [11] H. Oja, Affine invariant multivariate sign and rank tests and corresponding estimates: A review, Scand. J. Stat., 26 (1999), 319–343. https://doi.org/10.1111/1467-9469.00152
    [12] H. E. Daniels, Saddlepoint approximation in statistics, Ann. Math. Stat., 25 (1954), 631–650.
    [13] R. Lugannani, S. O. Rice, Saddlepoint approximations for the distribution of the sum of independent random variables, Adv. Appl. Probab., 12 (1980), 475–490.
    [14] I. M. Skovgaard, Saddlepoint expansions for conditional distributions, J. Appl. Probab., 24 (1987), 875–887.
    [15] O. E. Barndorff-Nielsen, D. R. Cox, Asymptotic Techniques for Use in Statistics, London: Chapman & Hall, 1989.
    [16] R. W. Butler, Saddlepoint Approximations with Applications, Cambridge: Cambridge University Press, 2007.
    [17] D. Meng, S. Yang, T. Lin, J. Wang, H. Yang, Z. Lv, RBMDO using Gaussian mixture model-based second-order mean-value saddlepoint approximation, Comput. Model. Eng. Sci., 132 (2022), 553–568.
    [18] A. M. Abd El-Raheem, E. F. Abd-Elfattah, Weighted log-rank tests for clustered censored data: Saddlepoint p-values and confidence intervals, Stat. Meth. Med. Res., 29 (2020), 2629–2636.
    [19] A. M. Abd El-Raheem, E. F. Abd-Elfattah, Weighted log-rank tests for censored data under Wei's urn design: Saddlepoint approximation and confidence intervals, J. Biopharm. Stat., 2023, 1–10. https://doi.org/10.1080/10543406.2023.2183508
    [20] Q. Zhao, J. Duan, T. Wu, J. Hong, Time-dependent reliability analysis under random and interval uncertainties based on Kriging modeling and saddlepoint approximation, Comput. Ind. Eng., 182 (2023), 109391.
    [21] I. A. Shanan, E. F. Abd-Elfattah, A. M. Abd El-Raheem, A new approach for approximating the p-value of a class of bivariate sign tests, Sci. Rep., 13 (2023), 19133.
    [22] D. Meng, Y. Guo, Y. Xu, S. Yang, Y. Guo, L. Pan, et al., Saddlepoint approximation method in reliability analysis: A review, Comput. Model. Eng. Sci., 139 (2024), 2329–2359.
    [23] A. M. Abd El-Raheem, M. Hosny, Saddlepoint p-values for a class of nonparametric tests for the current status and panel count data under generalized permuted block design, AIMS Mathematics, 8 (2023), 18866–18880. https://doi.org/10.3934/math.2023960
    [24] A. M. Abd El-Raheem, M. Hosny, E. F. Abd-Elfattah, Statistical inference of the class of nonparametric tests for the panel count and current status data from the perspective of the saddlepoint approximation, J. Math., 2023, 9111653. https://doi.org/10.1155/2023/9111653
    [25] A. M. Abd El-Raheem, Kh. S. Kamal, E. F. Abd-Elfattah, P-values and confidence intervals of linear rank tests for left-truncated data under truncated binomial design, J. Biopharm. Stat., 34 (2024), 127–135.
    [26] D. A. Pierce, D. Peters, Practical use of higher order asymptotics for multiparameter exponential families (with Discussion), J. R. Stat. Soc. B, 54 (1992), 701–737.
    [27] A. C. Davison, S. Wang, Saddlepoint approximations as smoothers, Biometrika, 89 (2002), 933–938.
    [28] P. Henrici, Applied and Computational Complex Analysis, Volume 2: Special Functions, Integral Transforms, Asymptotics, Continued Fractions, London: Wiley, 1977.
    [29] C. R. Rao, Tests of significance in multivariate analysis, Biometrika, 35 (1948), 58–79.
    [30] J. A. Merchant, G. M. Halprin, A. R. Hudson, K. H. Kilburn, W. N. McKenzie, D. J. Hurst, et al., Responses to cotton dust, Arch. Environ. Health: Int. J., 30 (1975), 222–229. https://doi.org/10.1080/00039896.1975.10666685
    [31] Met Éireann, Historical data, 2022. Available from: https://www.met.ie/climate/available-data/historical-data.
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)