Citation: Petros Galanis, Aglaia Katsiroumpa, Ioannis Moisoglou, Olympia Konstantakopoulou. The TikTok Addiction Scale: Development and validation[J]. AIMS Public Health, 2024, 11(4): 1172-1197. doi: 10.3934/publichealth.2024061
Abstract
Background
There is an absence of valid and specific psychometric tools to assess TikTok addiction. Considering that the use of TikTok is increasing rapidly and the fact that TikTok addiction may be a different form of social media addiction, there is an urge for a valid tool to measure TikTok addiction.
Objective
To develop and validate a tool to measure TikTok addiction.
Methods
First, we performed an extensive literature review to create a pool of items to measure TikTok addiction. Then, we employed a panel of experts from different backgrounds to examine the content validity of the initial set of items. We examined face validity by performing cognitive interviews with TikTok users and calculating the item-level face validity index. Our study population included 429 adults who have been TikTok users for at least the last 12 months. We employed exploratory and confirmatory factor analysis to examine the construct validity of the TikTok Addiction Scale (TTAS). We examined the concurrent validity by using the Bergen Social Media Addiction Scale (BSMAS), the Patient Health Questionnaire-4 (PHQ-4), and the Big Five Inventory-10 (BFI-10). We used Cronbach's alpha, McDonald's Omega, Cohen's kappa, and intraclass correlation coefficient to examine reliability.
Results
We found that the TTAS is a six-factor 15-item scale with robust psychometric properties. Factor analysis revealed a six-factor structure, (1) salience, (2) mood modification, (3) tolerance, (4) withdrawal symptoms, (5) conflict, and (6) relapse, which accounted for 80.70% of the total variance. The concurrent validity of the TTAS was excellent since we found significant correlations between TTAS and BSMAS, PHQ-4, and BFI-10. Cronbach's alpha and McDonald's Omega for the TTAS were 0.911 and 0.914, respectively.
Conclusion
The TTAS appears to be a short, easy-to-use, and valid scale to measure TikTok addiction. Considering the limitations of our study, we recommend the translation and validation of the TTAS in other languages and populations to further examine the validity of the scale.
1. Introduction
Rainfall has played an important role in the development and maintenance of human civilizations. Rain is one of the most important sources of fresh water on which humans depend for life, and it replenishes the groundwater that is the main source of drinking water. Since more than 50% of Australia's land mass is used for agriculture, an accurate rainfall forecasting system can help farmers plan cropping operations, i.e., when to sow seeds, apply fertilizers, and harvest crops. Rainfall prediction [1] can also help farmers decide which crops to plant for maximum harvests and profits. In addition, precipitation plays an important role in the planning and operation of water reservoirs, such as dams that generate electricity from hydropower. About half of the renewable energy generated by more than 120 hydropower plants in Australia depends on precipitation. With accurate rainfall forecasts, operators know when to store water and when to release it to avoid flooding or drought conditions in regions with low rainfall. Precipitation forecasts also play a critical role in the aviation industry, from the moment an aircraft starts its engines: an accurate precipitation forecast helps plan flight routes and suggests the right time to take off and land to ensure physical and economic safety. Aircraft operations can be seriously affected by lightning, icing, turbulence, thunderstorm activity, and more. According to [2], weather is a major factor in aviation accidents, accounting for 23% of accidents worldwide.
Numerous studies have shown that the duration and intensity of rainfall can cause major weather-related disasters such as floods and droughts. AON's annual weather report shows that seasonal flooding in China from June to September 2020 resulted in an estimated economic loss of $35 billion and a large number of deaths [3]. Rainfall also has a negative impact on the mining industry, as heavy and unpredictable rainfall can disrupt mining activities. For example, the Bowen Basin in Queensland hosts some of Australia's largest coal reserves, and the summer rains of 2010–2011 severely impacted mining operations there: an estimated 85% of coal mines in Queensland had their operations disrupted as a result (Queensland Flood Commission, 2012) [4,5]. As of May 2011, the Queensland coal mining sector had recovered only 75% of its pre-flood production and had suffered losses running into the billions of dollars. As a result, rainfall forecasts are becoming increasingly important in developing preventive measures to minimize the impact of such disasters.
Predicting rainfall is challenging because it involves the interplay of various natural phenomena such as temperature, humidity, wind speed, wind direction, cloud cover, and sunlight. Accurate rainfall forecasts are therefore critical in areas such as energy and agriculture. A report produced by Australia's National Climate Change Adaptation Research Facility examined the impacts of extreme weather events and concluded that the weather forecasts currently available to industry are inadequate: they lack the location-specific information and other details needed for risk management and targeted planning. Traditional forecasts rely on measurements from a range of instruments and on numerical calculations to predict heavy rainfall, and they are sometimes inaccurate and therefore cannot work effectively. The Australian Bureau of Meteorology currently uses the Predictive Ocean Atmosphere Model for Australia (POAMA) to predict rainfall patterns [6]. POAMA is a seasonal prediction model that covers time scales from several weeks to individual seasons throughout the year; it assimilates ocean, atmosphere, ice, and land data to produce outlooks up to nine months ahead. In this work, we use machine learning and deep learning methods [7,8], which detect complex patterns in historical data, to predict the occurrence of rainfall effectively and accurately. Applying these methods requires accurate historical data, the presence of detectable patterns, and the persistence of those patterns into the future period for which predictions are sought.
Several classification algorithms, such as Random Forest [9], Naive Bayes [10], Logistic Regression [11], Decision Tree [12], and XGBoost [13], have been studied for rainfall prediction. However, the effectiveness of these algorithms varies depending on the combination of preprocessing and data-cleaning techniques, feature scaling, data normalization, training parameters, and train-test splitting, leaving room for improvement. The goal of this paper is to provide a customized set of these techniques to train machine learning [14] and deep learning [15] models that yield the most accurate results for rainfall prediction. The models are trained and tested on an Australian rainfall dataset using the proposed approach. The dataset contains records from 49 locations over a 10-year period starting December 1, 2008. The research contributions of the proposed work are as follows:
To remove outliers using the Inter Quartile Range (IQR) approach.
To balance the data using the Synthetic Minority Oversampling Technique (SMOTE).
To apply both classification and regression models, first predicting whether it will rain and, if it does, estimating the amount of rain.
To apply XGBoost, Random Forest, Kernel SVM, and Long Short-Term Memory (LSTM) models for the classification task.
To apply Multiple Linear Regression, XGBoost, Polynomial Regression, Random Forest Regression, and LSTM for the regression task.
2. Literature review
Luk et al. [16] addressed rainfall forecasting for watershed management and flood control. Their goal was to accurately predict the temporal and spatial distribution of rainfall for water quantity and quality management. The dataset used to train the models was collected from the Upper Parramatta River Catchment Trust (UPRCT), Sydney. Three methods were used to model rainfall, namely the MultiLayer FeedForward Network (MLFN), the Partial Recurrent Neural Network (PRNN), and the Time Delay Neural Network (TDNN). The main parameters for these methods were the lag, the window size, and the number of hidden nodes.
Abhishek et al. [17] worked on developing effective, nonlinear ANN-based models for accurate prediction of the maximum temperature 365 days a year. The data were from the Toronto Lester B. Pearson Int'l A station, Ontario, Canada, from 1999–2009. They proposed two models trained with the Levenberg-Marquardt algorithm, one with 5 hidden layers and one with 10 hidden layers. Factors that affected the results were the number of neurons, sampling, the hidden layers, the transfer function, and overfitting.
Abhishek et al. [18] performed a regression task to predict average rainfall using a feed forward network trained with the back propagation algorithm, the layer recurrent network, and the feed forward network trained with the cascaded back propagation algorithm for a large number of neurons. The data were collected from www.Indiastat.com and the IMD website. The dataset contains records for the months of April to November from 1960 to 2010 in Udupi district of Karnataka.
Saba et al. [19] worked on accurate weather predictions using a hybrid neural network model combining MultiLayer Perceptron (MLP) and Radial Basis Function (RBF). The dataset used was from the weather station in Saudi Arabia. They proposed an extended hybrid neural network approach and compared the results of individual neural networks with those of hybrid neural networks. The results showed that hybrid neural network models have greater learning ability and better generalization ability for certain sets of inputs and nodes.
Biswas et al. [20] focused on the prediction of weather conditions (good or bad) using the classification method. Naive Bayes and chi-square algorithm were used for classification. The main objective was to show that data mining approaches are sufficient for weather prediction. Data was obtained in real time from users and stored in a database. The decision tree generated from the training features is used for classification.
Basha et al. [21] introduced a machine learning and deep learning-based rain prediction model. This model utilizes a Kaggle dataset to train various models, including a Support Vector Regressor, Autoregressive Integrated Moving Average (ARIMA), and a Neural Network. The authors claim that the performance of the model, as measured by the Root Mean Squared Error (RMSE), is 72%.
Doroshenko et al. [22] worked on refining numerical weather forecasts with a neural network error-correction approach to increase the accuracy of the 2 m (near-surface) forecasts of the regional model COSMO. The dataset was obtained from the Kiev weather station in Ukraine. The authors chose the gated recurrent unit (GRU) approach because the forecast error is a time series and a GRU has fewer parameters than an LSTM. Choosing a smaller error history allows better fitting and refinement of the model.
Appiah-Badu et al. [23] conducted a study to predict the occurrence of rainfall through classification. They employed several classification algorithms, including Decision Tree (DT), Random Forest (RF), Multilayer Perceptron (MLP), Extreme Gradient Boosting (XGB), and K-Nearest Neighbor (KNN). The data for the study was collected from the Ghana Meteorological Agency from 1980 to 2019 and was divided into four ecological zones: Coastal, Forest, Transitional, and Savannah.
Raval et al. [24] worked on a classification task to predict tomorrow's rain using logistic regression, LDA, KNN, and many other models and compared their metrics. They used a dataset containing daily 10-year weather forecasts from most Australian weather stations. It was found that deep learning models produced the best results.
Ridwan et al. [25] proposed a rainfall prediction model for Malaysia. The model was trained using a dataset of ten stations and employs both a Neural Network Regressor and Decision Forest Regression (DFR). The authors claim that the R2 score ranges from 0.5 to 0.9. This approach only predicts rainfall and does not perform any classification tasks.
Adaryani et al. [26] conducted an analysis of short-term rainfall forecasting for applications in hydrologic modeling and flood warning. They compared the performance of PSO Support Vector Regression (PSO-SVR), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN). The study considered 5-minute and 15-minute ahead forecast models of rainfall depth based on data from the Niavaran station in Tehran, Iran.
Fahad et al. [27] conducted a study on forecasting rainfall through the use of a deep forecasting model based on Gated Recurrent Unit (GRU) Neural Network. The study analyzed 30 years of climatic data (1991–2020) in Pakistan, considering both positive and negative impacts of temperature and gas emissions on rainfall. The findings of the study have potential implications for disaster management institutions.
Tables 1 and 2 compare various state-of-the-art approaches for classification as well as regression tasks respectively.
Here, NMSE stands for Normalized Mean Squared Error, which allows us to compare the error across sets with different value ranges. Using the plain Mean Squared Error (MSE) can make a set with larger values appear to have a higher error, even if the relative error of a set with smaller values is actually greater. For example, if set 1 contains values ranging from 1 to 100 and set 2 contains values ranging from 1000 to 10,000, the MSE of set 2 will be higher even when set 1 fits more poorly in relative terms. NMSE addresses this by dividing each set by the maximum value in its range, converting both sets to the range 0 to 1 for a fair comparison.
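To make this concrete, the following minimal sketch (an illustration only; NumPy and the exact normalization choice, dividing by the maximum observed value, are assumptions) shows how NMSE puts sets with different ranges on a comparable footing:

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized MSE: divide by the maximum observed value before squaring,
    so errors from sets with different value ranges become comparable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scale = np.max(np.abs(y_true))        # assumed normalizer: max of the true values
    return np.mean(((y_true - y_pred) / scale) ** 2)

# Two sets with very different ranges but comparable relative errors
a_true, a_pred = np.array([1, 50, 100]), np.array([2, 55, 95])
b_true, b_pred = np.array([1000, 5000, 10000]), np.array([1100, 5500, 9500])
print(np.mean((a_true - a_pred) ** 2), np.mean((b_true - b_pred) ** 2))  # plain MSE differs wildly
print(nmse(a_true, a_pred), nmse(b_true, b_pred))                        # NMSE values are comparable
```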
3. Data description
The dataset we used for our study contains daily observations over a period of 10 years, from 1/12/2008 to 26/04/2017, from 49 different locations across Australia [28]. The dataset contains 23 features: Date, Location, MinTemp, MaxTemp, Rainfall, Evaporation, Sunshine, WindGustDir, WindGustSpeed, WindDir9am, WindDir3pm, WindSpeed9am, WindSpeed3pm, Humidity9am, Humidity3pm, Pressure9am, Pressure3pm, Cloud9am, Cloud3pm, Temp9am, Temp3pm, RainToday, and RainTomorrow. It contains around 145 thousand entries.
For the classification task, RainTomorrow is the target variable, indicating whether rain occurs on the next day; 0 indicates no rain and 1 indicates rain. For the regression task, Rainfall is the target variable, i.e., the amount of precipitation in millimeters. We performed exploratory data analysis on the dataset, which is key in machine learning problems for gaining confidence in the validity of later results. This analysis helps us look for anomalies in the data, identify correlations between features, and check for missing values, all of which improve the outcomes of the machine learning models. Table 3 presents the analysis of null values in the raw dataset. It is visible that most of the attributes contain null values, which must be addressed carefully before the data are used to train a model; otherwise the model will not give accurate predictions.
Table 3.
Null values of attributes in the dataset.
Figure 1 presents a correlation matrix that shows the correlation coefficient between each pair of features, i.e., how strongly the features are related to each other. The correlation coefficient ranges from –1 to 1, where 1 represents a perfect positive relationship between two factors, –1 represents a perfect negative relationship, and 0 represents the absence of a relationship between the two variables.
Figure 1.
Correlation matrix for various features in the dataset.
From the analysis of the null values, it appears that the attributes Evaporation, Sunshine, Cloud9am, and Cloud3pm contain almost 50% NaN values. We therefore discarded these four columns and did not use them for training our model: even if we imputed the missing values with one of the available techniques, the imputed values might not match the actual weather, which could harm the learning process of the model. Figure 1 presents the correlation matrix for the remaining features; because we removed four features from the dataset, they are not included in the correlation matrix calculation. The Date feature is converted into its respective month, and the month feature is used for better season-wise grouping of the data and for the study of correlation. The Location feature is ignored because the dataset is already organized by location, so a correlation between Location and Date is not useful in this analysis. The RainToday feature is also ignored because it is derived from the Rainfall feature, which is already included in the correlation study.
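For illustration, the column dropping and correlation analysis described above could look roughly as follows in pandas (a sketch; the use of pandas and the file name `weatherAUS.csv` are assumptions, while the column names come from the dataset description):

```python
import pandas as pd

df = pd.read_csv("weatherAUS.csv")   # hypothetical file name for the Australian rainfall data

# Drop the four attributes with ~50% missing values
df = df.drop(columns=["Evaporation", "Sunshine", "Cloud9am", "Cloud3pm"])

# Derive a Month feature from Date for season-wise analysis
df["Date"] = pd.to_datetime(df["Date"])
df["Month"] = df["Date"].dt.month

# Correlation matrix over the remaining numeric features
corr = df.drop(columns=["Date", "Location", "RainToday", "RainTomorrow"]).corr(numeric_only=True)
print(corr.round(2))
```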
Figure 2 displays the distribution of the numerical data based on the 0th percentile, 25th percentile (1st quartile), 50th percentile (2nd quartile, the median), 75th percentile (3rd quartile), and 100th percentile. This distribution provides insight into the presence of outliers, which must be removed before training our predictive model to achieve accurate results. Analyzing the distribution of data across quartiles requires continuous data; features such as Location, RainTomorrow, WindDir3pm, WindDir9am, and WindGustDir are categorical, so boxplot-based outlier removal is only feasible for continuous features. As a result, we used only the 10 continuous features that can contain outliers.
Figure 2.
Distribution of data points w.r.t. quartile.
4. Proposed methodology
Due to various factors such as global warming, deforestation, etc., affecting seasonal variables during the year, uncertainty in rainfall has become one of the most discussed topics among researchers. Therefore, the main objective of this work is to apply different techniques of data preprocessing, machine learning and deep learning models: 1) Data pre-processing to remove uncertainties and anomalies in the provided dataset. 2) Forecasting the occurrence of rainfall. 3) Projecting the amount of rainfall in millimeters. 4) Comparing the results of various models used for classification and regression purposes.
The comparison of various algorithms for the same task gives us more insights into the problem statement and helps us make decisions regarding the best model to be used for rainfall forecasting. Figure 3 illustrates the flow diagram of the proposed methodology.
4.1. Data pre-processing
Data pre-processing is a data mining step that refers to cleaning and transforming raw data collected from various sources so that it is suitable for the task at hand and yields better results.
4.1.1. Removing outliers
As is evident from Figure 4, our data contains several outliers, so we employed the Inter Quartile Range (IQR) approach to remove them. The IQR is the range between the 1st and 3rd quartiles, i.e., the 25th and 75th percentiles. In this approach, data points that fall below (Q1 – 1.5 × IQR) or above (Q3 + 1.5 × IQR) are considered outliers. Removing all the outliers eliminated approximately 30 thousand rows. Figure 4 illustrates the IQR approach employed for removing the outliers [29].
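A minimal pandas sketch of this IQR rule, continuing from the earlier snippet (the exact set of continuous columns is illustrative, and rows with missing values are kept here because they are imputed in the next step):

```python
import pandas as pd

def remove_outliers_iqr(frame, columns):
    """Keep only rows whose values lie within [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for every column."""
    mask = pd.Series(True, index=frame.index)
    for col in columns:
        q1, q3 = frame[col].quantile(0.25), frame[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        mask &= frame[col].between(lower, upper) | frame[col].isna()
    return frame[mask]

continuous_cols = ["MinTemp", "MaxTemp", "Rainfall", "WindGustSpeed", "WindSpeed9am",
                   "WindSpeed3pm", "Humidity9am", "Humidity3pm", "Pressure9am", "Pressure3pm"]
df = remove_outliers_iqr(df, continuous_cols)
```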
4.1.2. Handling missing values
As noted in the data description, the attributes Evaporation, Sunshine, Cloud9am, and Cloud3pm contain almost 50% NaN values and were discarded rather than imputed, since imputed values might not match the actual weather and could harm the learning process of the model.
For the remaining attributes, we filled missing numeric values with the mean of the attribute and missing categorical values with the mode of the attribute. However, because location and season also strongly influence these measurements, we divided the dataset into four seasons, namely summer (January to March), fall (March to June), winter (June to September), and spring (September to December). We then grouped the data by season and location using the month and location attributes, computed the mean of each numeric attribute per location-season group, and used these group means to populate the NaN values. Similarly, for categorical data, the most frequent value of each location-season pair was used to populate the NaN values.
For the Rainfall, RainToday, and RainTomorrow attributes, we took a different approach to handling NaN values. For the Rainfall attribute, we replaced NaN values with 0; filling them with mean values would hinder generalization. For the RainToday and RainTomorrow features, we dropped the rows with NaN values, because filling them with the most common class value could bias the classification.
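Continuing the sketch, the season- and location-based imputation and the special cases above might be implemented roughly like this (the assignment of boundary months to seasons is an assumption):

```python
# Map months to the season groups described above (overlapping boundary months assigned arbitrarily)
season_of_month = {1: "summer", 2: "summer", 3: "fall", 4: "fall", 5: "fall",
                   6: "winter", 7: "winter", 8: "winter",
                   9: "spring", 10: "spring", 11: "spring", 12: "spring"}
df["Season"] = df["Month"].map(season_of_month)

group = ["Location", "Season"]
numeric_cols = ["MinTemp", "MaxTemp", "WindGustSpeed", "WindSpeed9am", "WindSpeed3pm",
                "Humidity9am", "Humidity3pm", "Pressure9am", "Pressure3pm", "Temp9am", "Temp3pm"]
categorical_cols = ["WindGustDir", "WindDir9am", "WindDir3pm"]

# Numeric NaNs -> mean of the same location-season group
for col in numeric_cols:
    df[col] = df[col].fillna(df.groupby(group)[col].transform("mean"))

# Categorical NaNs -> most frequent value of the same location-season group
for col in categorical_cols:
    df[col] = df.groupby(group)[col].transform(
        lambda s: s.fillna(s.mode().iloc[0]) if not s.mode().empty else s)

# Special cases: Rainfall NaNs become 0; rows missing RainToday/RainTomorrow are dropped
df["Rainfall"] = df["Rainfall"].fillna(0)
df = df.dropna(subset=["RainToday", "RainTomorrow"])
```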
4.1.3. Handling categorical variables
To train a model on categorical features, they need to be converted into a numerical format. For the RainToday and RainTomorrow features, we used a LabelEncoder, which replaced the values Yes and No with 1 and 0, respectively. We could also have used a LabelEncoder to convert the wind-direction features into a numerical format, but for better generalization we replaced each direction with its corresponding compass degree value.
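A sketch of this encoding step (scikit-learn's LabelEncoder and a standard 16-point compass mapping are assumed):

```python
from sklearn.preprocessing import LabelEncoder

# Yes/No columns -> 1/0
for col in ["RainToday", "RainTomorrow"]:
    df[col] = LabelEncoder().fit_transform(df[col])   # 'No' -> 0, 'Yes' -> 1

# Wind directions -> compass degrees (16-point rose, 22.5 degrees apart)
compass = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
dir_to_deg = {d: i * 22.5 for i, d in enumerate(compass)}
for col in ["WindGustDir", "WindDir9am", "WindDir3pm"]:
    df[col] = df[col].map(dir_to_deg)
```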
4.1.4. Handling class imbalance
An important factor affecting model performance is the imbalance of the output classes. If the ratio between the two class frequencies is far from 1, the model becomes biased toward the majority class. One of the simplest and most effective remedies is to oversample the minority class using SMOTE (Synthetic Minority Oversampling Technique) [30]. Originally, the ratio of class 0 to class 1 frequencies was about 5:1, and as a result the model did not perform well on unseen data. Figure 7(a),(b) show bar graphs of the class counts before and after balancing (a sketch of this step appears after Figure 7). Class balancing is performed only for training the classification models.
Figure 7.
Target variable distribution (a) Before class balancing (b) After class balancing.
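The class balancing could be sketched with the imbalanced-learn library as below (the library choice, random seed, and 80/20 split are assumptions; as in the paper, oversampling is applied before the split, although in practice SMOTE is often applied to the training split only so that synthetic samples do not leak into the test set):

```python
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# Drop identifiers/helper columns; the remaining features are numeric after encoding
X = df.drop(columns=["RainTomorrow", "Date", "Location", "Season"])
y = df["RainTomorrow"]

# Oversample the minority (rain) class to a 1:1 ratio -- classification task only
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)

X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.2, random_state=42)
```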
4.1.5. Feature scaling
Scaling the data is very important for the regression task. By scaling our variables, we can compare different variables on a common scale. We used a standard scaler to standardize our features; it rescales each feature to zero mean and unit variance, so most values fall roughly between –3 and 3. Equation (1) shows the transformation applied by the standard scaler:

x_scaled = (x − μ) / σ  (1)

where μ and σ are the mean and standard deviation of the feature.
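A minimal scikit-learn equivalent of Eq (1), continuing from the split above (fitting the scaler on the training portion only is our choice here):

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()                       # implements x_scaled = (x - mean) / std
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on training data
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics for the test set
```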
4.1.6. Dataset capacity
After exploratory data analysis and data cleaning, the dataset comprises about 110 k rows and 19 features. The approach is divided into two parts: 1) forecasting the occurrence of rainfall and 2) estimating the amount of rainfall. For forecasting the occurrence of rainfall, which is a classification task, the training data consist of approximately 147 k rows and the testing data of 36 k rows; the number of rows exceeds that of the cleaned dataset because SMOTE was applied to oversample the minority class and balance both classes. For predicting the amount of rainfall, which is a regression task, the training data consist of 87 k rows and the testing data of 22 k rows; no oversampling technique was used for regression.
4.2. Forecasting occurrence of rainfall
The task here is to predict the occurrence of precipitation as one of two classes, i.e., whether or not it will rain tomorrow. This is a classification problem that uses the features and their corresponding target values from the given dataset. The classification approach is divided into three parts: 1) preparing the data for classification, 2) fitting the training data to train a classification model, and 3) evaluating the model performance. Figures 8 and 9 show the flow of the overall implementation of the classification approach.
For forecasting the occurrence of rainfall, we have implemented four classification models as follows:
XGBoost Classifier: XGBoost [31] stands for eXtreme Gradient Boosting, a fast and effective boosting algorithm based on gradient-boosted decision trees. XGBoost uses a finer regularization technique, shrinkage, and column subsampling to prevent overfitting, which is one of its differences from plain gradient boosting. The XGBoost classifier takes a 2-dimensional array of training features and target values as input for training the classification model.
Random Forest Classifier: A random forest or random decision forest [32] is an ensemble learning method for classification, regression, and other tasks that works by building a collection of decision trees during training. It creates each decision tree from a randomly selected subset of the training set and then collects the votes of the different trees to determine the final prediction. For classification tasks, the output of the random forest is the class selected by the most trees. The Random Forest classifier also takes a 2-dimensional array of training features and target values as input for training the classification model.
Kernel SVM Classifier: Support Vector Machines [33] are supervised learning models with associated learning algorithms that analyze data for classification and regression. The main goal of an SVM is to create the best decision boundary that separates two or more classes so that data points can be placed accurately in the correct class; various kernels are used for this purpose. We chose the Gaussian Radial Basis Function (RBF) as the kernel for training our SVM model because rainfall data are non-linear. Equation (2) gives the Gaussian radial basis function used by the support vector machine:

K(x_i, x_j) = exp(−γ ||x_i − x_j||²)  (2)
LSTM Classifier: An LSTM, or Long Short-Term Memory classifier [34], is a recurrent neural network with both feedforward and feedback connections, often used to classify and make predictions over time-series data. Training an LSTM classifier requires a different data format: the data must first be converted into a 3-D array of shape (number of samples, time steps, number of features). To prevent overfitting and to monitor training progress, callbacks are passed as parameters while training the prediction model. In our approach we used two callbacks, listed below (a minimal sketch follows the list):
● ReduceLRonPlateau: It reduces the learning rate by 'factor' times passed as an argument if the metric has stopped improving for the 'patience' number of epochs. Reducing the learning rate often benefits the model.
● EarlyStopping: This will stop training if the monitored metric has stopped improving for the 'patience' number of epochs.
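The following Keras sketch illustrates one way to set this up; the layer sizes, the 15-day window, and all hyperparameters are assumptions, since the paper specifies only the 3-D input format and the two callbacks:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

TIME_STEPS = 15                                   # window of 15 days, as in Figure 10(b)

def to_sequences(X, y, steps=TIME_STEPS):
    """Reshape flat rows into (samples, time steps, features) windows."""
    Xs, ys = [], []
    for i in range(len(X) - steps):
        Xs.append(X[i:i + steps])
        ys.append(y[i + steps])
    return np.array(Xs), np.array(ys)

X_seq, y_seq = to_sequences(X_train_scaled, np.asarray(y_train))

model = Sequential([
    LSTM(64, input_shape=(TIME_STEPS, X_seq.shape[2])),
    Dropout(0.2),
    Dense(1, activation="sigmoid"),               # binary rain / no-rain output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

callbacks = [
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    EarlyStopping(monitor="val_loss", patience=6, restore_best_weights=True),
]
model.fit(X_seq, y_seq, validation_split=0.1, epochs=50, batch_size=256, callbacks=callbacks)
```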
Figure 10(a),(b) show the layout of the trained LSTM neural network and the RNN unrolled over a window of 15 time steps (days), respectively.
Figure 10.
The layout of the LSTM neural network trained and unrolled RNN for a timestamp. (a) Sequential LSTM model, and (b) Unrolled RNN for the timestamp of 15 days.
To predict the occurrence of rainfall tomorrow, the designed algorithm is then shown in Algorithm 1.
Algorithm 1: Algorithm for Classification
Input: Rainfall Forecasting Dataset I = ['MinTemp', 'MaxTemp', 'Rainfall', 'WindGustDir', 'WindGustSpeed', 'WindDir9am', 'WindDir3pm', 'WindSpeed9am', 'WindSpeed3pm', 'Humidity9am', 'Humidity3pm', 'Pressure9am', 'Pressure3pm', 'Temp9am', 'Temp3pm', 'RainToday', 'RainTomorrow']
Output: Yes, No
1 Preprocess the input data and divide it into features and targets.
2 Balance output classes using SMOTE.
3 Scale the data using Standard Scaler.
4 Define classification model.
5 Train the classifier according to the defined parameters.
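A sketch of Algorithm 1 with the XGBoost classifier, continuing from the balanced and scaled data above (the xgboost Python package is assumed, and the hyperparameters shown are placeholders rather than the paper's tuned values):

```python
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Steps 4-5: define and train the classifier on the balanced, scaled training data
clf = XGBClassifier(n_estimators=300, learning_rate=0.1, max_depth=6)
clf.fit(X_train_scaled, y_train)

# Evaluate on the held-out test set
y_pred = clf.predict(X_test_scaled)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["No rain", "Rain"]))
```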
4.3. Projecting the amount of rainfall in millimeters
Classification is only one step in precipitation forecasting: it tells you whether it will be a dry day or whether you will need an umbrella, since it only predicts that it will rain at some point during the day. Predicting the amount of rainfall from the same features is equally important, because it supports decisions such as whether to travel or stay home when heavy rain is expected, or whether to hold back the water stored in a reservoir or release some of it because heavy rain is predicted in the watershed. This section therefore deals with the different regression techniques used to forecast the amount of rainfall in millimeters.
The regression task is also divided into three parts:
1) Preparing data for the regression model.
2) Training a regression model.
3) Evaluating the regression model.
For projecting the amount of rainfall, we have implemented various regression algorithms including:
● Multiple Linear Regression
● XGBoostRegressor
● Polynomial Regression
● Random Forest Regression
● LSTM-based Deep Learning Model
Multiple Linear Regression: Linear regression [35] is a supervised learning method for regression tasks. It models the linear relationship between the input features and the target variable:

ŷ = b_0 + b_1x_1 + b_2x_2 + … + b_nx_n  (3)

Equation (3) is the statistical equation used for prediction by the multiple linear regression algorithm, where ŷ is the predicted value, x_1, …, x_n are the input features, and b_0, …, b_n are the fitted coefficients. To achieve the best-fit line, the model aims to predict ŷ so that the error between the predicted and real values is as small as possible, as shown in Eq (4):

min Σ (y_i − ŷ_i)²  (4)
XGBoostRegressor: XGBoost [36] stands for eXtreme Gradient Boosting, a fast and effective boosting algorithm based on gradient-boosted decision trees. XGBoost's objective function consists of a loss function and a regularization term; the loss measures how far the model's predictions are from the real values. We used reg:linear as the XGBoost loss function to perform the regression task.
Polynomial Regression: Polynomial regression [37] is a special case of linear regression in which the relationship between the independent variable x and the target variable y is modeled as an nth-degree polynomial in x. This technique is used to capture a curvilinear relationship between the independent and dependent variables:

ŷ = b_0 + b_1x + b_2x² + … + b_nxⁿ  (5)

Equation (5) is the statistical equation used for prediction by the polynomial regression algorithm. As before, the model aims to predict ŷ so that the error between the predicted and real values is as small as possible, as in Eq (4).
Random Forest Regression: A random forest or random decision forest [38] is an ensemble learning method for classification, regression, and other tasks that works by building a collection of decision trees during training. A random forest uses many decision trees as base learners, and its final output is the mean of the predictions of all the decision trees.
LSTM-based Deep Learning Model: An LSTM, or Long Short-Term Memory network [39], is a recurrent neural network with both feedforward and feedback connections. LSTM regression is typically applied to time-series problems. For training the LSTM regression model, the data must first be converted into a 3-D array of shape (number of samples, time steps, number of features). To prevent overfitting and to visualize [40] the training progress, callbacks are passed as parameters while training the prediction model. In our approach we used the same two callbacks as for classification:
● ReduceLRonPlateau: reduces the learning rate by the "factor" passed as an argument if the monitored metric has stopped improving for "patience" epochs. Reducing the learning rate often benefits the model.
● EarlyStopping: stops training if the monitored metric has stopped improving for "patience" epochs.
Algorithm 2, given below, is used for forecasting the amount of precipitation.
Algorithm 2: Algorithm for Regression
Input: Rainfall Amount Prediction Dataset I = ['MinTemp', 'MaxTemp', 'Rainfall', 'WindGustDir', 'WindGustSpeed', 'WindDir9am', 'WindDir3pm', 'WindSpeed9am', 'WindSpeed3pm', 'Humidity9am', 'Humidity3pm', 'Pressure9am', 'Pressure3pm', 'Temp9am', 'Temp3pm', 'RainToday']
Output: Amount of precipitation in millimeters
1 Preprocess the input data and divide it into features and target.
2 Scale the data using Standard Scaler.
3 Define the regression model.
4 Train the regressor according to the defined parameters.
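A corresponding sketch of Algorithm 2 with the Random Forest regressor (library choice, split ratio, and hyperparameters are assumptions; the target here is the raw Rainfall amount in millimeters):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Regression uses the cleaned data without oversampling; Rainfall (mm) is the target
Xr = df.drop(columns=["Rainfall", "RainTomorrow", "Date", "Location", "Season"])
yr = df["Rainfall"]
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, test_size=0.2, random_state=42)

scaler = StandardScaler()
Xr_train, Xr_test = scaler.fit_transform(Xr_train), scaler.transform(Xr_test)

reg = RandomForestRegressor(n_estimators=200, random_state=42)
reg.fit(Xr_train, yr_train)
yr_pred = reg.predict(Xr_test)

print("MAE:", mean_absolute_error(yr_test, yr_pred))
print("MSE:", mean_squared_error(yr_test, yr_pred))
print("R2 :", r2_score(yr_test, yr_pred))
```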
5. Evaluation results
There are numerous evaluation metrics that can be used to measure model performance. In this paper we evaluated our machine learning and deep learning models using the confusion matrix, accuracy, precision, recall, and F1-score for the classification models, and the mean absolute error, mean squared error, and R2 score for the regression models.
● For Classification
Confusion Matrix: Confusion matrix yields the output of a classification model in a matrix format. The matrix is defined as shown in Table 4.
A list of evaluation metrics used for evaluating trained classifiers is stated in Eqs (6)–(9).
Accuracy = (TP + TN) / (TP + TN + FP + FN)  (6)

Precision = TP / (TP + FP)  (7)

Recall = TP / (TP + FN)  (8)

F1-score = 2 × Precision × Recall / (Precision + Recall)  (9)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
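Using the predictions from the classification sketch above, these four quantities can be computed directly from the confusion matrix counts (a sketch assuming scikit-learn's confusion_matrix with 0/1 labels):

```python
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)                  # Eq (6)
precision = tp / (tp + fp)                                   # Eq (7)
recall    = tp / (tp + fn)                                   # Eq (8)
f1_score  = 2 * precision * recall / (precision + recall)    # Eq (9)
print(accuracy, precision, recall, f1_score)
```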
XGBoost Classifier: Figure 11 and Table 5 present the confusion matrix and the classification report for the XGBoost classifier. The classification report shows that the precision, recall, and F1-score are 96, 89, and 92% for the rain class and 89, 96, and 93% for the no-rain class, respectively.
Figure 11.
Confusion matrix for XGBoost Classifier.
Kernel SVM: Figure 12 and Table 6 present the confusion matrix and the classification report for the Support Vector Machine classifier with a radial basis kernel function. The classification report shows that the precision, recall, and F1-score are 81, 82, and 81% for the rain class and 82, 81, and 81% for the no-rain class, respectively.
LSTM Classifier: Figure 13 and Table 7 present the confusion matrix and the classification report for the LSTM classifier. The classification report shows that the precision, recall, and F1-score are 86, 87, and 86% for the rain class and 87, 85, and 86% for the no-rain class, respectively.
Random Forest Classifier: Figure 14 and Table 8 present the confusion matrix and the classification report for the Random Forest classifier. The classification report shows that the precision, recall, and F1-score are 92, 91, and 91% for the rain class and 91, 92, and 91% for the no-rain class, respectively.
Figure 14.
Confusion matrix for Random Forest Classifier.
Table 9 and Figure 15 compare the evaluation results of the classification models. The XGBoost classifier surpasses all the other classifiers with an accuracy of 92.2%, precision of 95.6%, and F1-score of 91.9%, while the Random Forest classifier provided the best recall (91.2%). The Kernel SVM with radial basis function performed the worst of the four classifiers, with an accuracy of 81.4%, precision of 80.9%, recall of 82.1%, and F1-score of 81.5%.
From Table 10, we can confirm that our classification method outperforms, in terms of accuracy, various state-of-the-art methods that use the Australian Kaggle rain dataset.
After the data processing steps described above, a cleaned dataset with approximately 110 thousand rows was used for the classification task. Analysis of the class distribution of the "RainTomorrow" feature showed that it was highly skewed towards the "No" rainfall class, with a roughly 9:1 ratio of "No" to "Yes" values. Models trained on this highly skewed data produced precision and accuracy values between 0.80 and 0.85. To address the imbalance, we used SMOTE (Synthetic Minority Oversampling Technique) to balance both classes, which increased the data from 110 k rows to 183 k rows. The data was then divided into training and testing sets accordingly. With the balanced dataset, precision improved to 95% and accuracy increased to 92%. This improvement is attributed to the combination of optimized data preprocessing and cleaning techniques, feature scaling, data normalization, training parameters, and train-test split ratios.
● For Regression
The evaluation metrics employed for evaluating the trained regression models are the Mean Absolute Error (MAE), Mean Squared Error (MSE), and R² score, as stated in Eqs (10)–(12):

MAE = (1/n) Σ |y_i − ŷ_i|  (10)

MSE = (1/n) Σ (y_i − ŷ_i)²  (11)

R² = 1 − Σ (y_i − ŷ_i)² / Σ (y_i − ȳ)²  (12)

where y_i is the observed value, ŷ_i the predicted value, and ȳ the mean of the observed values.
Table 11 and Figure 16(a), (b) present the evaluation results of Multiple Linear regression, XGBoostRegressor, Polynomial regression, Random Forest regression, and LSTM-based deep learning model.
Here, it is observed that the Random Forest regressor outperformed all the other regression models, with a mean absolute error of 0.117, a mean squared error of 0.036, and an R2 score of 0.76. In contrast, the LSTM-based deep learning model performed the worst among the five regression models, with a mean absolute error of 0.104, a mean squared error of 19.83, and an R2 score of 0.70.
● Novelty and discussions: Data processing is a critical aspect of building a machine learning or deep learning model. Instead of filling missing values with the mean or mode of the entire dataset, the proposed solution uses seasonal and location-based data filling to fill numeric values with the mean and categorical values with the mode. LSTM-based models are often considered to be the best for modeling relationships in time series data. However, in the proposed method, the ensemble learning-based random forest model outperforms the LSTM model in both the classification and regression tasks. Random forest leverages class votes from each decision tree it grows, making it less susceptible to the impact of an inconsistent dataset and less prone to overfitting. In contrast, neural network models require more consistent data to make accurate predictions and may not perform well with inconsistent datasets.
● Limitations: Data collection is a major obstacle to accurate rainfall forecasting. Real-time weather data is hard to obtain and must be gathered from multiple meteorological stations, resulting in inconsistent and abnormal data due to incompatible data types and measurements. We therefore dropped the "Evaporation", "Sunshine", "Cloud9am", and "Cloud3pm" columns while handling NaN values, as each had about 50% NaN values, even though these features may be strongly correlated with rainfall.
6. Conclusions and future works
In this work, we implemented different machine learning and deep learning models for predicting the occurrence of rainfall on the next day and for predicting the amount of rainfall in millimeters. We used the Australian rainfall dataset, which contains weather data from 49 locations across the Australian continent. We obtained more accurate results than various state-of-the-art approaches, achieving an accuracy of 92.2%, precision of 95.6%, F1-score of 91.9%, and recall of 91.1% for next-day rainfall prediction. For the prediction of the amount of precipitation in millimeters, we obtained a mean absolute error of 0.117, a mean squared error of 0.036, and an R2 value of 0.76. To obtain these results, we applied several data preprocessing techniques: analyzing null values, populating null values with seasonal and location-specific values, removing outliers using the interquartile range approach, selecting features by analyzing the correlation matrix, converting categorical values to numerical values for training the predictive models, balancing the target classes for the classification task, and standardizing the data for the regression task. We also compared different statistical machine learning and deep learning models for both the classification and regression tasks. This work uses a publicly available dataset for training the classification and regression models; satellite and radar data could be used to train models and predict rainfall in real time.
In the future, further robustness can be achieved by using more recent and accurate data collected from meteorological departments. Incorporating additional features, such as "time of rainfall", "time of strongest wind gust", "relative humidity at two points of time in a day", and "atmospheric pressure at sea level", could greatly enhance the model, as these features are highly correlated with the "RainTomorrow" and "Rainfall" targets. Training the proposed models with more features could therefore further improve performance. We would also like to explore transfer learning models to obtain better results.
Acknowledgments
This work is partially supported by Western Norway University of Applied Sciences, Bergen, Norway.
Conflict of interest
The authors declare there is no conflict of interest.
Acknowledgments
We would like to thank all participants in our study. This research received no external funding.
Authors' contribution
Conceptualization, Petros Galanis, Aglaia Katsiroumpa; methodology, Petros Galanis, Aglaia Katsiroumpa, Ioannis Moisoglou, Olympia Konstantakopoulou; software, Petros Galanis, Aglaia Katsiroumpa, Olympia Konstantakopoulou; validation, Aglaia Katsiroumpa, Ioannis Moisoglou, Olympia Konstantakopoulou; formal analysis, Petros Galanis, Aglaia Katsiroumpa, Olympia Konstantakopoulou; resources; Petros Galanis; data curation, Petros Galanis, Aglaia Katsiroumpa; writing-original draft preparation, Petros Galanis, Aglaia Katsiroumpa, Ioannis Moisoglou, Olympia Konstantakopoulou; writing-review and editing, Petros Galanis, Aglaia Katsiroumpa, Ioannis Moisoglou, Olympia Konstantakopoulou; supervision, Petros Galanis.
Conflict of interest
Petros Galanis is an editorial board member for AIMS Public Health, and he's also guest editor of AIMS Public Health Special Issue, and he was not involved in the editorial review or the decision to publish this article. All authors declare that there are no competing interests.
Meng SQ, Cheng JL, Li YY, et al. (2022) Global prevalence of digital addiction in general population: A systematic review and meta-analysis. Clin Psychol Rev 92: 102128. https://doi.org/10.1016/j.cpr.2022.102128
[3]
Cheng C, Lau Y, Chan L, et al. (2021) Prevalence of social media addiction across 32 nations: Meta-analysis with subgroup analysis of classification schemes and cultural values. Addict Behav 117: 106845. https://doi.org/10.1016/j.addbeh.2021.106845
[4]
Arrivillaga C, Rey L, Extremera N (2022) A mediated path from emotional intelligence to problematic social media use in adolescents: The serial mediation of perceived stress and depressive symptoms. Addict Behav 124: 107095. https://doi.org/10.1016/j.addbeh.2021.107095
[5]
Bányai F, Zsila Á, Király O, et al. (2017) Problematic Social Media Use: Results from a Large-Scale Nationally Representative Adolescent Sample. PLoS ONE 12: e0169839. https://doi.org/10.1371/journal.pone.0169839
[6]
Sindermann C, Elhai JD, Montag C (2020) Predicting tendencies towards the disordered use of Facebook's social media platforms: On the role of personality, impulsivity, and social anxiety. Psychiatry Res 285: 112793. https://doi.org/10.1016/j.psychres.2020.112793
[7]
Keles B, McCrae N, Grealish A (2020) A systematic review: The influence of social media on depression, anxiety and psychological distress in adolescents. Int J Adolesc Youth 25: 79-93. https://doi.org/10.1080/02673843.2019.1590851
[8]
Kuss D, Griffiths M, Karila L, et al. (2014) Internet Addiction: A Systematic Review of Epidemiological Research for the Last Decade. CPD 20: 4026-4052. https://doi.org/10.2174/13816128113199990617
[9]
Xanidis N, Brignell CM (2016) The association between the use of social network sites, sleep quality and cognitive function during the day. Comput Human Behav 55: 121-126. https://doi.org/10.1016/j.chb.2015.09.004
Shannon H, Bush K, Villeneuve PJ, et al. (2022) Problematic social media use in adolescents and young adults: Systematic review and meta-analysis. JMIR Ment Health 9: e33450. https://doi.org/10.2196/33450
[12]
Hong Y, Rong X, Liu W (2024) Construction of influencing factor segmentation and intelligent prediction model of college students' cell phone addiction model based on machine learning algorithm. Heliyon 10: e29245. https://doi.org/10.1016/j.heliyon.2024.e29245
Montag C, Yang H, Elhai JD (2021) On the psychology of TikTok use: A first glimpse from empirical findings. Front Public Health 9: 641673. https://doi.org/10.3389/fpubh.2021.641673
[16]
Lodice R, Papapicco C (2021) To be a TikToker in COVID-19 era: An experience of social influence. Online J Commun Medi 11: e202103. https://doi.org/10.30935/ojcmt/9615
[17]
Smith T, Short A (2022) Needs affordance as a key factor in likelihood of problematic social media use: Validation, latent Profile analysis and comparison of TikTok and Facebook problematic use measures. Addict Behav 129: 107259. https://doi.org/10.1016/j.addbeh.2022.107259
[18]
Casale S, Rugai L, Fioravanti G (2018) Exploring the role of positive metacognitions in explaining the association between the fear of missing out and social media addiction. Addict Behav 85: 83-87. https://doi.org/10.1016/j.addbeh.2018.05.020
[19]
Tarafdar M, Maier C, Laumer S, et al. (2020) Explaining the link between technostress and technology addiction for social networking sites: A study of distraction as a coping behavior. Inf Syst J 30: 96-124. https://doi.org/10.1111/isj.12253
[20]
Iram, Aggarwal H (2020) Time series analysis of pubg and tiktok applications using sentiments obtained from social media-twitter. Adv Math, Sci J 9: 4047-4057. https://doi.org/10.37418/amsj.9.6.86
[21]
Zhang X, Wu Y, Liu S (2019) Exploring short-form video application addiction: Socio-technical and attachment perspectives. Telemat Inform 42: 101243. https://doi.org/10.1016/j.tele.2019.101243
[22]
Lu L, Liu M, Ge B, et al. (2022) Adolescent addiction to short video applications in the mobile internet era. Front Psychol 13: 893599. https://doi.org/10.3389/fpsyg.2022.893599
[23]
Qin Y, Musetti A, Omar B (2023) Flow experience is a key factor in the likelihood of adolescents' problematic TikTok use: The moderating role of active parental mediation. IJERPH 20: 2089. https://doi.org/10.3390/ijerph20032089
[24]
Montag C, Lachmann B, Herrlich M, et al. (2019) Addictive features of social media/messenger platforms and freemium games against the background of psychological and economic theories. IJERPH 16: 2612. https://doi.org/10.3390/ijerph16142612
[25]
Burhan R, Moradzadeh J (2020) Neurotransmitter Dopamine (DA) and its role in the development of social media addiction. J Neurol Neurophysiol 11: 1-2.
[26]
Su C, Zhou H, Gong L, et al. (2021) Viewing personalized video clips recommended by TikTok activates default mode network and ventral tegmental area. NeuroImage 237: 118136. https://doi.org/10.1016/j.neuroimage.2021.118136
[27]
Varona MN, Muela A, Machimbarrena JM (2022) Problematic use or addiction? A scoping review on conceptual and operational definitions of negative social networking sites use in adolescents. Addict Behav 134: 107400. https://doi.org/10.1016/j.addbeh.2022.107400
Andreassen CS, Billieux J, Griffiths MD, et al. (2016) The relationship between addictive use of social media and video games and symptoms of psychiatric disorders: A large-scale cross-sectional study. Psychol Addict Behav 30: 252-262. https://doi.org/10.1037/adb0000160
Elphinston RA, Noller P (2011) Time to face it! Facebook intrusion and the implications for romantic jealousy and relationship satisfaction. Cyberpsychol Behav Soc Netw 14: 631-635. https://doi.org/10.1089/cyber.2010.0318
[32]
Caplan SE (2010) Theory and measurement of generalized problematic Internet use: A two-step approach. Comput Human Behav 26: 1089-1097. https://doi.org/10.1016/j.chb.2010.03.012
Dadiotis A, Bacopoulou F, Kokka I, et al. (2021) Validation of the Greek version of the Bergen social media addiction scale in undergraduate students. EMBnet J 26: e975. https://doi.org/10.14806/ej.26.1.975
[35]
Floros G, Siomos K (2012) Patterns of choices on video game genres and internet addiction. Cyberpsychol Behav Soc Netw 15: 417-424. https://doi.org/10.1089/cyber.2012.0064
[36]
Kokka I, Mourikis I, Michou M, et al. (2021) Validation of the Greek version of social media disorder scale. GeNeDis 2020. Cham: Springer International Publishing 107-116. https://doi.org/10.1007/978-3-030-78775-2_13
[37]
Zhu J, Ma Y, Xia G, et al. (2024) Self-perception evolution among university student TikTok users: evidence from China. Front Psychol 14: 1217014. https://doi.org/10.3389/fpsyg.2023.1217014
[38]
Alhabash S, Smischney TM, Suneja A, et al. (2024) So similar, yet so different: How motivations to use Facebook, Instagram, Twitter, and TikTok predict problematic use and use continuance intentions. Sage Open 14: 21582440241255426. https://doi.org/10.1177/21582440241255426
[39]
Hendrikse C, Limniou M (2024) The use of Instagram and TikTok in relation to problematic use and well-being. J Technol Behav Sci 9: 1-12. https://doi.org/10.1007/s41347-024-00399-6
[40]
Yang Y, Adnan H, Sarmiti N (2023) The relationship between anxiety and TikTok addiction among university students in China: Mediated by escapism and use intensity. Int J Media Inf Lit 8. https://doi.org/10.13187/ijmil.2023.2.458
[41]
Rogowska AM, Cincio A (2024) Procrastination mediates the relationship between problematic TikTok Use and depression among young adults. JCM 13: 1247. https://doi.org/10.3390/jcm13051247
[42]
Pontes HM, Schivinski B, Sindermann C, et al. (2021) Measurement and conceptualization of gaming disorder according to the World Health Organization framework: The development of the gaming disorder test. Int J Ment Health Addiction 19: 508-528. https://doi.org/10.1007/s11469-019-00088-z
[43]
Montag C, Markett S (2024) Depressive inclinations mediate the association between personality (neuroticism/conscientiousness) and TikTok Use Disorder tendencies. BMC Psychol 12: 81. https://doi.org/10.1186/s40359-024-01541-y
[44]
McCoach D, Gable R, Madura J (2013) Review of the steps for designing an instrument. Instrument development in the affective domain. New York: Springer 277-284. https://doi.org/10.1007/978-1-4614-7135-6_8
[45]
Bekalu MA, Sato T, Viswanath K (2023) Conceptualizing and measuring social media use in health and well-being studies: Systematic review. J Med Internet Res 25: e43191. https://doi.org/10.2196/43191
[46]
Darvesh N, Radhakrishnan A, Lachance CC, et al. (2020) Exploring the prevalence of gaming disorder and internet gaming disorder: A rapid scoping review. Syst Rev 9: 68. https://doi.org/10.1186/s13643-020-01329-2
[47]
Pan YC, Chiu YC, Lin YH (2020) Systematic review and meta-analysis of epidemiology of internet addiction. Neurosci Biobehav Rev 118: 612-622. https://doi.org/10.1016/j.neubiorev.2020.08.013
[48]
Brown R (1993) Some contributions of the study of gambling to the study of other addictions. Gambling Behaviour and Problem Gambling. Reno: University of Nevada Press 241-272.
WHOInternational classification of diseases 11th revision, 2024 (2024). Available from: https://icd.who.int/en
[52]
Ayre C, Scally AJ (2014) Critical values for Lawshe's content validity ratio: Revisiting the original methods of calculation. Meas Eval Counsel Dev 47: 79-86. https://doi.org/10.1177/0748175613513808
Costello AB, Osborne J (2005) Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Pract Assess Res Eval 10: 1-9.
De Vaus D (2004) Surveys in social research. London: Routledge 180-200.
[58] Yusoff MSB, Arifin WN, Hadie SNH (2021) ABC of questionnaire development and validation for survey research. EIMJ 13: 97-108. https://doi.org/10.21315/eimj2021.13.1.10
[59] Hair J, Black W, Babin B, et al. (2017) Multivariate data analysis. New Jersey: Prentice Hall 45-55.
[60] De Winter JCF, Dodou D, Wieringa PA (2009) Exploratory factor analysis with small sample sizes. Multivariate Behav Res 44: 147-181. https://doi.org/10.1080/00273170902794206
[61] Kline R (2016) Principles and practice of structural equation modeling. New York: Guilford Press 188-210.
Brown T (2015) Confirmatory factor analysis for applied research. New York: The Guilford Press 72-87.
[64] Hu L, Bentler PM (1998) Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychol Methods 3: 424-453. https://doi.org/10.1037/1082-989X.3.4.424
[65] Baumgartner H, Homburg C (1996) Applications of structural equation modeling in marketing and consumer research: A review. Int J Res Mark 13: 139-161. https://doi.org/10.1016/0167-8116(95)00038-0
[66] Kroenke K, Spitzer RL, Williams JBW, et al. (2009) An ultra-brief screening scale for anxiety and depression: The PHQ-4. Psychosomatics 50: 613-621. https://doi.org/10.1176/appi.psy.50.6.613
[67] Rammstedt B, John OP (2007) Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German. J Res Pers 41: 203-212. https://doi.org/10.1016/j.jrp.2006.02.001
[68] Copez-Lonzoy A, Vallejos-Flores M, Capa-Luque W, et al. (2023) Adaptation of the Bergen Social Media Addiction Scale (BSMAS) in Spanish. Acta Psychol (Amst) 241: 104072. https://doi.org/10.1016/j.actpsy.2023.104072
[69] Brailovskaia J, Margraf J (2022) Addictive social media use during Covid-19 outbreak: Validation of the Bergen Social Media Addiction Scale (BSMAS) and investigation of protective factors in nine countries. Curr Psychol 1-19. https://doi.org/10.1007/s12144-022-03182-z
[70] Shin NY (2022) Psychometric properties of the Bergen social media addiction scale in Korean young adults. Psychiatry Investig 19: 356-361. https://doi.org/10.30773/pi.2021.0294
[71] Žmavc M, Šorgo A, Gabrovec B, et al. (2022) The protective role of resilience in the development of social media addiction in tertiary students and psychometric properties of the Slovenian Bergen Social Media Addiction Scale (BSMAS). Int J Environ Res Public Health 19: 13178. https://doi.org/10.3390/ijerph192013178
[72] Rouleau RD, Beauregard C, Beaudry V (2023) A rise in social media use in adolescents during the COVID-19 pandemic: The French validation of the Bergen Social Media Addiction Scale in a Canadian cohort. BMC Psychol 11: 92. https://doi.org/10.1186/s40359-023-01141-2
[73] Cunningham S, Hudson CC, Harkness K (2021) Social media and depression symptoms: A meta-analysis. Res Child Adolesc Psychopathol 49: 241-253. https://doi.org/10.1007/s10802-020-00715-7
[74] Hussain Z, Wegmann E, Yang H, et al. (2020) Social networks use disorder and associations with depression and anxiety symptoms: A systematic review of recent research in China. Front Psychol 11: 211. https://doi.org/10.3389/fpsyg.2020.00211
[75] Caro-Fuentes S, Sanabria-Mazo JP (2024) A systematic review of the psychometric properties of the Patient Health Questionnaire-4 in clinical and nonclinical populations. J Acad Consult Liaison Psychiatry 65: 178-194. https://doi.org/10.1016/j.jaclp.2023.11.685
[76] Meidl V, Dallmann P, Leonhart R, et al. (2024) Validation of the Patient Health Questionnaire-4 for longitudinal mental health evaluation in elite Para athletes. PM R 16: 141-149. https://doi.org/10.1002/pmrj.13011
[77] Rodríguez-Muñoz M de la F, Ruiz-Segovia N, Soto-Balbuena C, et al. (2020) The psychometric properties of the Patient Health Questionnaire-4 for pregnant women. Int J Environ Res Public Health 17: 7583. https://doi.org/10.3390/ijerph17207583
[78] Tan YK, Siau CS, Ibrahim N, et al. (2024) Validation of the Malay version of the Patient Health Questionnaire-4 (PHQ-4) among Malaysian undergraduates. Asian J Psychiatr 99: 104134. https://doi.org/10.1016/j.ajp.2024.104134
[79] Karekla M, Pilipenko N, Feldman J (2012) Patient health questionnaire: Greek language validation and subscale factor structure. Compr Psychiatry 53: 1217-1226. https://doi.org/10.1016/j.comppsych.2012.05.008
[80] Correa T, Hinsley AW, De Zúñiga HG (2010) Who interacts on the Web?: The intersection of users' personality and social media use. Comput Human Behav 26: 247-253. https://doi.org/10.1016/j.chb.2009.09.003
[81] Kuss DJ, Griffiths MD (2011) Online social networking and addiction—A review of the psychological literature. IJERPH 8: 3528-3552. https://doi.org/10.3390/ijerph8093528
[82] Wilson K, Fornasier S, White KM (2010) Psychological predictors of young adults' use of social networking sites. Cyberpsychol Behav Soc Netw 13: 173-177. https://doi.org/10.1089/cyber.2009.0094
[83] Kunnel John R, Xavier B, Waldmeier A, et al. (2019) Psychometric evaluation of the BFI-10 and the NEO-FFI-3 in Indian adolescents. Front Psychol 10: 1057. https://doi.org/10.3389/fpsyg.2019.01057
[84] Costa Mastrascusa R, De Oliveira Fenili Antunes ML, De Albuquerque NS, et al. (2023) Evaluating the complete (44-item), short (20-item) and ultra-short (10-item) versions of the Big Five Inventory (BFI) in the Brazilian population. Sci Rep 13: 7372. https://doi.org/10.1038/s41598-023-34504-1
[85] Balgiu BA (2018) The psychometric properties of the Big Five Inventory-10 (BFI-10) including correlations with subjective and psychological well-being. GJPR 8: 61-69. https://doi.org/10.18844/gjpr.v8i2.3434
[86] Soto CJ, John OP (2017) The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. J Pers Soc Psychol 113: 117-143. https://doi.org/10.1037/pspp0000096
[87] World Medical Association (2013) World Medical Association Declaration of Helsinki: Ethical principles for medical research involving human subjects. JAMA 310: 2191. https://doi.org/10.1001/jama.2013.281053