In many applications, modeling based on a normal kernel is preferred because the normal distribution not only belongs to the family of stable distributions but also makes it easy to satisfy the stationarity condition of a stochastic process. However, characteristics of the data, such as counts or proportions, are a major obstacle to modeling fully based on a normal distribution. To address bounded-domain or non-normal data, we proposed a novel transformation method and a nonparametric Bayesian approach based on a normal kernel for the transformed variable. In particular, the proposed transformation maps any probability space into a real space and is free from the constraints of previous transformations, such as skewness, the presence of a power parameter, and bounded domains. Another advantage is that the Dirichlet process mixture model can be used with full conditional posterior distributions available for all parameters, which leads to fast convergence of the Markov chain Monte Carlo sampler. The proposed methodology was illustrated with simulated datasets and two real datasets with non-normal distribution problems. In addition, to demonstrate the superiority of the proposed methodology, it was compared with the transformed Bernstein polynomial model in the real data analysis.
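The general pipeline described above (transform bounded data to the real line, fit a Dirichlet process mixture of normals in which every parameter is drawn from its full conditional, then map the density back) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the logit map is a stand-in for the paper's transformation, the Dirichlet process is approximated by a truncated stick-breaking prior sampled with a blocked Gibbs sampler (in the style of Ishwaran and James, 2001), the concentration `alpha` is held fixed, and the Normal-Gamma hyperparameters (`m0`, `kappa0`, `a0`, `b0`) and all function names are hypothetical choices made for this example.

```python
import math
import random


def logit(y):
    # Stand-in transform (assumption, NOT the paper's): maps (0, 1) onto the real line.
    return math.log(y / (1.0 - y))


def log_normal_pdf(z, mu, tau):
    # Log density of N(mu, 1/tau); tau is a precision.
    return 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (z - mu) ** 2


def stick_weights(v):
    # Stick-breaking weights w_k = v_k * prod_{l<k} (1 - v_l).
    w, remaining = [], 1.0
    for vk in v:
        w.append(vk * remaining)
        remaining *= 1.0 - vk
    return w


def blocked_gibbs_dpm(z, K=20, alpha=1.0, m0=0.0, kappa0=0.1, a0=2.0, b0=1.0,
                      iters=400, burn=300, seed=0):
    # Truncated DP mixture of normals; every update is a draw from a full conditional.
    rng = random.Random(seed)
    n = len(z)
    c = [rng.randrange(K) for _ in range(n)]  # initial cluster labels
    mu, tau, w = [rng.gauss(m0, 1.0) for _ in range(K)], [1.0] * K, [1.0 / K] * K
    draws = []
    for it in range(iters):
        # 1) Labels: P(c_i = k | rest) is proportional to w_k * N(z_i | mu_k, 1/tau_k).
        log_w = [math.log(wk) if wk > 0 else -math.inf for wk in w]
        for i, zi in enumerate(z):
            lp = [log_w[k] + log_normal_pdf(zi, mu[k], tau[k]) for k in range(K)]
            top = max(lp)
            c[i] = rng.choices(range(K), weights=[math.exp(x - top) for x in lp])[0]
        # 2) Sticks: V_k ~ Beta(1 + n_k, alpha + sum_{l>k} n_l) for k < K, and V_K = 1.
        counts = [c.count(k) for k in range(K)]
        v, tail = [], n
        for k in range(K - 1):
            tail -= counts[k]
            v.append(rng.betavariate(1.0 + counts[k], alpha + tail))
        w = stick_weights(v + [1.0])
        # 3) Atoms: conjugate Normal-Gamma posterior for each (mu_k, tau_k).
        for k in range(K):
            zk = [zi for zi, ci in zip(z, c) if ci == k]
            nk = len(zk)
            zbar = sum(zk) / nk if nk else 0.0
            ss = sum((x - zbar) ** 2 for x in zk)
            kn = kappa0 + nk
            mn = (kappa0 * m0 + nk * zbar) / kn
            an = a0 + 0.5 * nk
            bn = b0 + 0.5 * ss + 0.5 * kappa0 * nk * (zbar - m0) ** 2 / kn
            tau[k] = rng.gammavariate(an, 1.0 / bn)  # gammavariate takes a scale, not a rate
            mu[k] = rng.gauss(mn, 1.0 / math.sqrt(kn * tau[k]))
        if it >= burn:
            draws.append((list(w), list(mu), list(tau)))
    return draws


def predictive_density_z(x, draws):
    # Posterior-mean mixture density on the transformed (real-line) scale.
    total = 0.0
    for w, mu, tau in draws:
        total += sum(wk * math.exp(log_normal_pdf(x, m, t)) for wk, m, t in zip(w, mu, tau))
    return total / len(draws)


def predictive_density_y(y, draws):
    # Back-transform to (0, 1) using the Jacobian |dz/dy| = 1 / (y * (1 - y)).
    return predictive_density_z(logit(y), draws) / (y * (1.0 - y))


if __name__ == "__main__":
    gen = random.Random(1)
    # Synthetic bimodal proportions (illustrative only, not data from the paper).
    ys = [1.0 / (1.0 + math.exp(-gen.gauss(-2.0, 0.3))) for _ in range(80)]
    ys += [1.0 / (1.0 + math.exp(-gen.gauss(1.5, 0.5))) for _ in range(80)]
    draws = blocked_gibbs_dpm([logit(y) for y in ys])
    for y in (0.12, 0.5, 0.8):
        print(y, round(predictive_density_y(y, draws), 3))
```

Because the normal kernel is paired with a conjugate Normal-Gamma base measure, each step is an exact draw from a full conditional, so no Metropolis tuning is needed; this is the property the abstract links to fast MCMC convergence. Swapping in a different transformation only changes `logit` and the Jacobian in `predictive_density_y`.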
Citation: Sangwan Kim, Yongku Kim, Jung-In Seo. Nonparametric Bayesian modeling for non-normal data through a transformation[J]. AIMS Mathematics, 2024, 9(7): 18103-18116. doi: 10.3934/math.2024883