Functional data analysis (FDA) is a method for analyzing data represented in functional form. It is particularly useful for curve and longitudinal data, in both exploratory and inferential settings, and imposes minimal constraints on the parameters. In FDA, the choice of basis function is crucial to the smoothing process. However, traditional basis functions lack flexibility, which limits the ability to modify the shape of the fitted curves and to represent irregular details in modern, complex datasets accurately. This study introduced a flexible data smoothing technique for functional data that employs the beta spline introduced by Barsky in 1981. The beta spline gains its flexibility from two shape parameters. The proposed methodology combined the roughness penalty approach with generalized cross-validation (GCV) to identify the curve that best fitted the data, so that appropriate parameters were used when transforming the data into functional form. The effectiveness of the approach was assessed by examining the GCV color grid chart to determine the optimal curve. In contrast to existing methodologies, the proposed method gained flexibility by incorporating the beta spline into the smoothing procedure. The approach was anticipated to handle various forms of time series data effectively, offering improved interpretability and accuracy in data analysis, including forecasting.
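To make the pipeline in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard uniform cubic beta-spline blending functions with shape parameters `b1` and `b2`, and it swaps in a second-difference penalty on the coefficients as a simple stand-in for the roughness penalty. The function names (`beta_blend`, `design_matrix`, `gcv_fit`), the synthetic data, and the grid values are all hypothetical choices made for illustration.

```python
import numpy as np

def beta_blend(u, b1=1.0, b2=0.0):
    """Four cubic beta-spline blending functions on one segment, u in [0, 1].

    With b1 = 1 and b2 = 0 these reduce to the uniform cubic B-spline.
    """
    u = np.asarray(u, dtype=float)
    d = 2 * b1**3 + 4 * b1**2 + 4 * b1 + b2 + 2
    f0 = 2 * b1**3 * (1 - u) ** 3
    f1 = (2 * b1**3 * u * (u**2 - 3 * u + 3)
          + 2 * b1**2 * (u**3 - 3 * u**2 + 2)
          + 2 * b1 * (u**3 - 3 * u + 2)
          + b2 * (2 * u**3 - 3 * u**2 + 1))
    f2 = (2 * b1**2 * u**2 * (3 - u)
          + 2 * b1 * u * (3 - u**2)
          + b2 * u**2 * (3 - 2 * u)
          + 2 * (1 - u**3))
    f3 = 2 * u**3
    return np.stack([f0, f1, f2, f3], axis=-1) / d

def design_matrix(t, n_seg, b1=1.0, b2=0.0):
    """Basis matrix: each observation touches four neighbouring coefficients."""
    t = np.asarray(t, dtype=float)
    s = (t - t.min()) / (t.max() - t.min()) * n_seg  # rescale to [0, n_seg]
    seg = np.minimum(s.astype(int), n_seg - 1)       # segment index per point
    u = s - seg                                      # local parameter in [0, 1]
    vals = beta_blend(u, b1, b2)
    B = np.zeros((t.size, n_seg + 3))
    rows = np.arange(t.size)
    for k in range(4):
        B[rows, seg + k] = vals[:, k]
    return B

def gcv_fit(t, y, n_seg, b1, b2, lam):
    """Penalized least-squares fit; returns (GCV score, fitted values).

    Assumption: a second-difference penalty on the coefficients stands in
    for the integrated squared second derivative.
    """
    B = design_matrix(t, n_seg, b1, b2)
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
    H = B @ np.linalg.solve(B.T @ B + lam * D.T @ D, B.T)  # hat matrix
    fitted = H @ y
    n = y.size
    sse = np.sum((y - fitted) ** 2)
    return n * sse / (n - np.trace(H)) ** 2, fitted

# Synthetic noisy curve (illustrative data only).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(t.size)

# GCV over a grid of shape parameter b2 and smoothing parameter lam.
b2_grid = [0.0, 1.0, 5.0, 10.0]
lam_grid = 10.0 ** np.arange(-4, 3)
gcv_grid = np.array([[gcv_fit(t, y, 10, 1.0, b2, lam)[0] for lam in lam_grid]
                     for b2 in b2_grid])
i, j = np.unravel_index(np.argmin(gcv_grid), gcv_grid.shape)
best_b2, best_lam = b2_grid[i], lam_grid[j]
```

The array `gcv_grid` plays the role of the GCV grid described above: rendered as a heat map, its darkest cell would mark the selected pair `(best_b2, best_lam)`. Here `b1` is held at 1 to keep the grid two-dimensional; varying it as well would add a third axis.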
Citation: Wan Anis Farhah Wan Amir, Md Yushalify Misro, Mohd Hafiz Mohd. Flexible functional data smoothing and optimization using beta spline[J]. AIMS Mathematics, 2024, 9(9): 23158-23181. doi: 10.3934/math.20241126
[1] Y. Xu, Functional Data Analysis, London: Springer, 2023. https://doi.org/10.1007/978-1-4471-7503-2_4
[2] P. Hall, M. Hosseini-Nasab, On Properties of Functional Principal Components Analysis, J. R. Stat. Soc. Ser. B: Stat. Methodol., 68 (2006), 109–126. https://doi.org/10.1111/j.1467-9868.2005.00535.x
[3] W. Seo, Functional principal component analysis for cointegrated functional time series, J. Time Ser. Anal., 45 (2023), 320–330. https://doi.org/10.1111/jtsa.12707
[4] O. A. Montesinos López, A. Montesinos López, J. Crossa, Multivariate Statistical Machine Learning Methods for Genomic Prediction, Cham: Springer, 2022. https://doi.org/10.1007/978-3-030-89010-0
[5] H. Hullait, D. S. Leslie, N. G. Pavlidis, S. King, Robust Function-on-Function Regression, Technometrics, 63 (2020), 396–409. https://doi.org/10.1080/00401706.2020.1802350
[6] J. O. Razo-De-Anda, L. L. Romero-Castro, F. Venegas-Martínez, Contagion Patterns Classification in Stock Indices: A Functional Clustering Analysis Using Decision Trees, Mathematics, 11 (2023), 2961. https://doi.org/10.3390/math11132961
[7] F. Centofanti, A. Lepore, B. Palumbo, Sparse and smooth functional data clustering, Stat. Pap., 65 (2024), 795–825. https://doi.org/10.1007/s00362-023-01408-1
[8] J. A. Arias-López, C. Cadarso-Suárez, P. Aguiar-Fernández, Computational Issues in the Application of Functional Data Analysis to Imaging Data, Lect. Notes Comput. Sci., 42 (2021), 630–638. https://doi.org/10.1007/978-3-030-86960-1_46
[9] C. Tang, T. Wang, P. Zhang, Functional data analysis: An application to COVID‐19 data in the United States in 2020, Quant. Bio., 10 (2022), 172–187. https://doi.org/10.15302/J-QB-022-0300
[10] C. Zhang, H. Lin, L. Liu, J. Liu, Y. Li, Functional Data Analysis with Covariate-Dependent Mean and Covariance Structures, Biometrics, 79 (2023), 2232–2245. https://doi.org/10.1111/biom.13744
[11] I. Shah, P. Mubassir, S. Ali, O. Albalawi, A functional autoregressive approach for modeling and forecasting short-term air temperature, Front. Environ. Sci., 12 (2024), 1411237. https://doi.org/10.3389/fenvs.2024.1411237
[12] V. Villani, E. Romano, J. Mateu, Climate model selection via conformal clustering of spatial functional data, Environ. Ecol. Stat., 31 (2024), 365–385. https://doi.org/10.1007/s10651-024-00616-8
[13] A. Palummo, E. Arnone, L. Formaggia, L. M. Sangalli, Functional principal component analysis for incomplete space-time data, Environ. Ecol. Stat., 31 (2024), 555–582. https://doi.org/10.1007/s10651-024-00598-7
[14] J. O. Ramsay, B. W. Silverman, Functional Data Analysis, 2nd ed., New York: Springer, 2005. https://doi.org/10.1007/b98888
[15] M. A. Hael, Unveiling air pollution patterns in Yemen: A spatial-temporal functional data analysis, Environ. Sci. Pollut. Res., 30 (2023), 50067–50095. https://doi.org/10.1007/s11356-023-25790-3
[16] M. Gong, R. O'Donnell, C. Miller, M. Scott, S. Simis, S. Groom, et al., Adaptive smoothing to identify spatial structure in global lake ecological processes using satellite remote sensing data, Spat. Stat., 50 (2022), 100615. https://doi.org/10.1016/j.spasta.2022.100615
[17] R. Raturi, Large Data Analysis via Interpolation of Functions: Interpolating Polynomials vs Artificial Neural Networks, Amer. J. Intell. Syst., 8 (2018), 6–11. https://doi.org/10.5923/j.ajis.20180801.02
[18] N. A. Mazelan, J. Suhaila, Exploring rainfall variabilities using statistical functional data analysis, IOP Conf. Ser.: Earth Environ. Sci., 1167 (2023), 012007. https://doi.org/10.1088/1755-1315/1167/1/012007
[19] C. Sözen, Y. Öner, The investigation of temperature data in Turkey's Black Sea Region using functional data analysis, J. Appl. Stat., 49 (2021), 2403–2415. https://doi.org/10.1080/02664763.2021.1896683
[20] J. Baz, J. Davis, L. Han, C. Stracke, The value of smoothing, J. Portfolio Manag., 48 (2022), 73–85. https://doi.org/10.3905/jpm.2022.1.399
[21] A. Falini, F. Mazzia, C. Tamborrino, Spline based Hermite quasi-interpolation for univariate time series, Discrete Cont. Dyn. Syst. - S, 15 (2022), 3667–3688. https://doi.org/10.3934/dcdss.2022039
[22] L. Brugnano, D. Giordano, F. Iavernaro, G. Rubino, An entropy-based approach for a robust least squares spline approximation, J. Comput. Appl. Math., 443 (2024), 115773. https://doi.org/10.1016/j.cam.2024.115773
[23] M. Spreafico, F. Ieva, M. Fiocco, Modelling time-varying covariates effect on survival via functional data analysis: Application to the MRC BO06 trial in osteosarcoma, Stat. Methods Appl., 32 (2023), 271–298. https://doi.org/10.1007/s10260-022-00647-0
[24] A. Rahman, D. Jiang, Regional and temporal patterns of influenza: Application of functional data analysis, Infect. Dis. Modell., 6 (2021), 1061–1072. https://doi.org/10.1016/j.idm.2021.08.006
[25] M. Rangata, S. Das, M. Ali, Analysing Maximum Monthly Temperatures in South Africa for 45 years Using Functional Data Analysis, Adv. Decis. Sci., 24 (2020), 1–27.
[26] U. Beyaztas, S. Q. Salih, K.-W. Chau, N. Al-Ansari, Z. M. Yaseen, Construction of functional data analysis modeling strategy for global solar radiation prediction: Application of cross-station paradigm, Eng. Appl. Comput. Fluid Mech., 13 (2019), 1165–1181. http://doi.org/10.1080/19942060.2019.1676314
[27] S. Curceac, C. Ternynck, T. B. Ouarda, F. Chebana, S. D. Niang, Short-term air temperature forecasting using Nonparametric Functional Data Analysis and SARMA models, Environ. Modell. Software, 111 (2019), 394–408. http://doi.org/10.1016/j.envsoft.2018.09.017
[28] M. Ammad, M. Y. Misro, A. Ramli, A novel generalized trigonometric Bézier curve: Properties, continuity conditions and applications to the curve modeling, Math. Comput. Simul., 194 (2022), 744–763. http://doi.org/10.1016/j.matcom.2021.12.011
[29] S. A. A. A. Said Mad Zain, M. Y. Misro, K. T. Miura, Generalized Fractional Bézier Curve with Shape Parameters, Mathematics, 9 (2021), 2141. https://doi.org/10.3390/math9172141
[30] B. A. Barsky, The Beta-Spline: A Local Representation based on Shape Parameters and Fundamental Geometric Measures, PhD thesis, The University of Utah, 1981.
[31] B. A. Barsky, Rational Beta-splines for representing curves and surfaces, IEEE Comput. Graph. Appl., 13 (1993), 24–32. http://doi.org/10.1109/38.252550
[32] N. A. Hadi, A. Ibrahim, F. Yahya, J. M. Ali, A Comparative Study on Cubic Bezier and Beta-Spline Curves, Matematika, 29 (2013), 55–64.
[33] S. Biswas, B. C. Lovell, Bézier and Splines in Image Processing and Machine Vision, London: Springer, 2008. https://doi.org/10.1007/978-1-84628-957-6
[34] N. A. Hadi, N. S. M. Kamal, H. Nordin, Computational Method for Digital Khat Calligraphy Using Beta-Spline Curve Fitting, ASM Sc. J., 13 (2020). https://doi.org/10.32802/asmscj.2020.sm26(5.8)
[35] S. A. Suliman, N. A. Hadi, Optimizing the Shape Parameters of Beta-Spline Using Particle Swarm Optimization, Int. J. Eng. Technol., 7 (2018), 93–97. http://doi.org/10.14419/ijet.v7i4.33.23492
[36] M. S. A. Halim, N. A. Hadi, H. Sulaiman, S. Abd Halim, An algorithm for beta-spline surface reconstruction from multi slice CT scan images using MATLAB pmode, 2017 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), 2017, 1–6. http://doi.org/10.1109/ISCAIE.2017.8074939
[37] B. A. Barsky, J. C. Beatty, Local Control of Bias and Tension in Beta-splines, ACM Trans. Graph., 2 (1983), 109–134. http://doi.org/10.1145/357318.357321
[38] B. A. Barsky, Computer Graphics and Geometric Modeling Using Beta-splines, Berlin, Heidelberg: Springer, 1988. https://doi.org/10.1007/978-3-642-72292-9
[39] B. A. Barsky, J. C. Beatty, Varying the Betas in Beta-splines, Technical Report UCB/CSD-83-112, EECS Department, University of California, Berkeley, 1982. Available from: https://digicoll.lib.berkeley.edu/record/137388/files/CSD-83-112.pdf.
[40] E. Holtanová, T. Mendlik, J. Koláček, I. Horová, J. Mikšovský, Similarities within a multi-model ensemble: functional data analysis framework, Geosci. Model Dev., 12 (2019), 735–747. http://doi.org/10.5194/gmd-12-735-2019
[41] D. A. Shah, E. D. De Wolf, P. A. Paul, L. V. Madden, Functional Data Analysis of Weather Variables Linked to Fusarium Head Blight Epidemics in the United States, Phytopathology®, 109 (2019), 96–110. http://doi.org/10.1094/PHYTO-11-17-0386-R
[42] B. Guo, H. Wu, L. Pei, X. Zhu, D. Zhang, Y. Wang, et al., Study on the spatiotemporal dynamic of ground-level ozone concentrations on multiple scales across China during the blue sky protection campaign, Environ. Int., 170 (2022), 107606. http://doi.org/10.1016/j.envint.2022.107606
[43] P. Craven, G. Wahba, Smoothing noisy data with spline functions, Numer. Math., 31 (1978), 377–403. http://doi.org/10.1007/BF01404567
[44] M. Gubian, F. Torreira, L. Boves, Using Functional Data Analysis for investigating multidimensional dynamic phonetic contrasts, J. Phonetics, 49 (2015), 16–40. http://doi.org/10.1016/j.wocn.2014.10.001
[45] L. Tavi, T. Kinnunen, R. González Hautamäki, Improving speaker de-identification with functional data analysis of f0 trajectories, Speech Commun., 140 (2022), 1–10. http://doi.org/10.1016/j.specom.2022.03.010