Research article

“Is a game really a reason for people to die?” Sentiment and thematic analysis of Twitter-based discourse on Indonesia soccer stampede

  • This study examined discourses related to an Indonesian soccer stadium stampede on 1st October 2022 using comments posted on Twitter. We conducted a lexicon-based sentiment analysis to identify the sentiments and emotions expressed in tweets and performed structural topic modeling to identify latent themes in the discourse. The majority of tweets (87.8%) expressed negative sentiments, while 8.2% and 4.0% of tweets expressed positive and neutral sentiments, respectively. The most common emotion expressed was fear (29.3%), followed by sadness and anger. Of the 19 themes identified, “Deaths and mortality” was the most prominent (15.1%), followed by “family impact”. The negative stampede discourse was related to public concerns such as “vigil” and “calls for bans and suspension,” while positive discourse focused more on the impact of the stampede. Public health institutions can leverage the volume and rapidity of social media to improve disaster prevention strategies.

    Citation: Otobo I. Ujah, Chukwuemeka E Ogbu, Russell S. Kirby. “Is a game really a reason for people to die?” Sentiment and thematic analysis of Twitter-based discourse on Indonesia soccer stampede[J]. AIMS Public Health, 2023, 10(4): 739-754. doi: 10.3934/publichealth.2023050




    According to 2018 estimates based on the Food Insecurity Experience Scale (FIES), 9.2 percent of the world's population was exposed to hunger and 17.2 percent had limited access to food [1]. Meanwhile, the world's population is expected to grow from the current 7.7 billion to 9.7 billion in 2050 and to reach 11 billion by 2100 [2]. Boosting food production to meet the growing population's demand should therefore be a priority.

    Conventional sampling methods in agriculture are labor-intensive, cost-inefficient, time-consuming, prone to human error, destructive, and environmentally harmful [3,4]. Recent studies have therefore explored precision agriculture as an alternative. Precision agriculture applies remote sensing technologies, such as unmanned aerial vehicles (UAVs), to collect aerial data and computer vision, such as machine learning algorithms (MLs), to analyze those data for the management of crop production [5]. UAVs are an effective remote sensing technology for data collection: they are non-destructive, time-efficient, relatively inexpensive, less laborious, and less prone to human error [6]. MLs are then used for object detection in UAV images, pattern classification, and regression of linear and nonlinear relationships [7,8].

    The use of UAVs and MLs in smart agriculture has enabled farmers to monitor crop health, detect invasive pests, and predict crop yield and biomass [9,10]. UAVs can capture high-resolution spectral images of crops and their surrounding fields, providing valuable information about the crops' biochemical and physiological properties, such as the amount and vigor of vegetation, biomass and yield growth, and vegetation health. These properties have different spectral signatures over time and under different biotic and abiotic conditions. MLs can process these images to relate changes in spectral signature to the corresponding changes in the crops' biochemical and physiological properties, allowing them to assess crop health, measure biomass and yield growth, and detect pests. By leveraging UAVs and MLs, farmers can make cost-effective decisions that support sustainable agricultural practices. The integration of UAVs and MLs in agriculture is therefore expected to keep growing in the coming years [11].

    Various popular algorithm types and families are used in precision agriculture [12,13,14,15], for example, support vector machines (SVM) [16,17], K-means [18], multiple linear regression (MLR) [19,20], and stepwise multiple regression (SMR) [21]. Advanced algorithms such as random forest (RF) [17,22,23,24], random forest regression (RFR) [25,26], neural networks (NN) [16,17], and convolutional neural networks (CNN) [27] are also used. These advanced algorithms build on the basic ones with enhanced functionality [28]. Overall, both conventional and advanced algorithms extract, interpret, and categorize data patterns through different learning methods, such as supervised, unsupervised, semi-supervised, reinforcement, multi-task, and instance-based learning, and neural networks [29].

    Despite the widespread use of MLs in precision agriculture, the models they generate are prone to uncertainty, which undermines the generalizability of the modelling results. The uncertainty may come from technical sources such as model training and evaluation methods, learning methods, algorithm sensitivity to multicollinearity, model depth (e.g., for CNNs), fluctuation of statistical evaluation metrics, algorithm parameterization, and technical restrictions [30,31,32,33,34]. It may also be induced by the dataset dimension, data balance and cleaning, data augmentation methods, large differences between training and testing set sizes, changes in data type, low dependency between the input and target variables, highly noisy data, biased data, incomplete training datasets, image labelling and annotation, and feature selection [12,35,36,37,38,39,40,41,42,43]. In addition, uncertainty may stem from farm management, cropping systems, and field conditions, such as field location and area, variation in soil properties, farmland topology, farmland fragmentation, land preparation, tillage methods, soil compaction caused by human activities, crop-surrounding vegetation, cultivar variation, fertilization rates, fertilizer application dates, replication, plant density, intercropping, plant phyllotaxis, and plant structure [27,34,36,41,44,45,46,47,48,49,50,51,52,53,54,55,56]. Uncertainty may also relate to sensor specifications, ground data collection methods, aerial data acquisition conditions, imagery resolution, and image registration, alignment, and stitching [35,41,55,57,58,59]. In general, the performance of MLs depends on the quality of the data. These sources of uncertainty increase noise, bias, outliers, and confounding factors, as well as the redundancy of visual information and the dimensionality of data samples; this lowers data quality and leads MLs to generate weak models.

    Overall, the operational efficiency of algorithms depends on methodological, technical, and data-type factors [12]. Ignoring the impact of these factors leads to misleading, biased, and uncertain modeling [60,61,62,63] and classification results [34,64,65,66,67,68,69]. Optimal modeling allows crop performance to be assessed under field conditions in terms of health status, yield productivity, yield quality, and tolerance to biotic and abiotic stress. It also allows end-users to select optimal cultivars, adequate farm management, and cropping practices according to their needs. This study aims to review the sources of uncertainty that affect the operational efficiency of MLs in regression and classification interventions applied to crops of interest in UAV-based precision agriculture.

    The remainder of this study is arranged into four sections: 1) the methods adopted to select study sources, the study selection criteria, data extraction, study quality assessment, and the strategy for data synthesis; 2) the results, covering details of the included studies, detailed findings, and data sensitivity analysis; 3) a discussion of the study's findings; and 4) the conclusion.

    The review process consisted of selecting scope-related studies, extracting data, and then interpreting and synthesizing the results according to the review guide in [70] and the chapter on planning a Cochrane review in the Cochrane Handbook for Systematic Reviews of Interventions [71].

    The Scopus database was used to search the literature because it indexes most of the refereed journals of major publishers such as Elsevier, Taylor & Francis, IEEE, Emerald, and Springer [14]. Studies were retrieved with the following search string: ("phenotype*" OR "crop" OR "plant" AND "unmanned aerial" AND "deep learning" OR "machine learning" OR "computer vision"). The search was restricted to the years 2013-2020.
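    The search string above mixes OR and AND without inner parentheses. The sketch below makes one reading of it explicit, assuming the grouping (A OR B OR C) AND D AND (E OR F OR G), which is what OR-before-AND precedence gives. It is a simplified local approximation for screening titles, not the Scopus engine: it does plain case-insensitive whole-word matching, treats a trailing `*` as a prefix wildcard, and ignores Scopus features such as field restrictions and plural or spelling variants. The helper names are hypothetical.

```python
import re


def has_term(text: str, term: str) -> bool:
    """Case-insensitive whole-word/phrase match; a trailing '*' acts as a prefix wildcard."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def any_of(text: str, terms: list[str]) -> bool:
    return any(has_term(text, t) for t in terms)


def matches_query(text: str) -> bool:
    """Assumed grouping: (phenotype* OR crop OR plant) AND unmanned aerial AND (DL OR ML OR CV)."""
    return (
        any_of(text, ["phenotype*", "crop", "plant"])
        and has_term(text, "unmanned aerial")
        and any_of(text, ["deep learning", "machine learning", "computer vision"])
    )


# Hypothetical titles used only to exercise the filter.
titles = [
    "Machine learning estimation of crop biomass from unmanned aerial imagery",
    "Unmanned aerial survey of plant cover",
    "Deep learning for plant counting",
]
print([matches_query(t) for t in titles])  # [True, False, False]
```

    Spelling the grouping out as code makes it easy to see which records a given reading of the string admits, which matters when the string is reused in another database whose operator precedence differs.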

    Studies were considered within the scope of this review if they were published English-language journal articles that used UAVs for field data collection and MLs for data analysis. Conference papers, reviews, book chapters, letters, books, notes, editorials, conference reviews, short surveys, in-press articles, and studies conducted in greenhouses or laboratories were excluded. The remaining studies then underwent further screening of their abstracts and full texts against the study aim. The eligibility criteria were defined using the question: "Why are ML algorithms used for regression and classification susceptible to uncertain modelling in precision agriculture?".

    For data extraction, the main items of the final included studies were extracted and recorded in an Excel spreadsheet by one reviewer and three peer reviewers. No duplicate studies were found because the search was conducted in a single database. The extracted items followed the PICO framework [72]: Population (the crop traits of interest), Intervention (regression and classification modelling), Comparators (the UAV-image features used for modelling), and Outcomes (the modelling results). After extraction, the outcomes related to sources that increase modelling uncertainty within regression and classification interventions were prioritized for interpretation and synthesis, and the remaining outcomes were excluded.
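    One way to picture a row of the extraction sheet is as a small record with one field per PICO element plus a list of reported uncertainty sources; the prioritization step then keeps only rows that report at least one such source. This is a minimal sketch: the class name, field names, and example rows are hypothetical illustrations, not the authors' actual spreadsheet layout.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionRecord:
    """One row of a PICO-organised extraction sheet (hypothetical field names)."""
    study_id: str
    population: str    # crop trait(s) of interest
    intervention: str  # "regression" or "classification"
    comparators: list[str] = field(default_factory=list)          # UAV-image features used
    outcomes: str = ""                                             # reported modelling results
    uncertainty_sources: list[str] = field(default_factory=list)  # sources named by the study


def prioritise(records: list[ExtractionRecord]) -> list[ExtractionRecord]:
    """Keep only records whose outcomes mention at least one uncertainty source."""
    return [r for r in records if r.uncertainty_sources]


# Two hypothetical studies: only the first reports an uncertainty source.
rows = [
    ExtractionRecord("A", "aboveground biomass", "regression", ["plant height"],
                     "hypothetical outcome", ["multicollinearity"]),
    ExtractionRecord("B", "weed cover", "classification", ["RGB bands"],
                     "hypothetical outcome"),
]
print([r.study_id for r in prioritise(rows)])  # ['A']
```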

    A total of 442 papers in the Scopus database matched the search string defined above. Conference papers (n = 155), conference reviews (n = 10), reviews (n = 6), book chapters (n = 4), letters (n = 2), and notes (n = 1) were eliminated according to the exclusion criteria, after which 210 English-language articles remained. A further 3 articles in press were removed, resulting in 207 articles. Of these, 131 were open-access articles that met the inclusion criteria. After title and abstract screening, 27 articles were omitted, giving a final set of 109 articles suitable for results synthesis. Details are shown in Figure 1.

    Figure 1.  Review flowchart.

    A total of 104 studies reported substantial sources of model uncertainty. These sources were mainly related to: 1) technical settings of algorithms, 2) data quality, 3) data dimension, 4) feature selection and ranking, and 5) field conditions, farm management, and cropping systems. The sources of uncertainty are explained in detail as follows.

    The performance of ML algorithms depends on the selection of optimal hyperparameter values, so-called algorithm parameterization [73]. Algorithm design, training, testing, model learning strategies, and technical restrictions are generally of equal importance.

    The scientific literature shows that advanced MLs may outperform conventional MLs [42] because of their ability to extract optimal features, save time, and reduce the need for expertise. A deep semantic segmentation network based on pixel-wise classification outperformed a support vector machine (SVM), a conventional ML, in mapping plastic-mulched farmland from high-resolution images [34]. Conversely, for cabbage crop classification, the SVM performed comparably to or better than random forest (RF) and deep learning algorithms such as the convolutional neural network (CNN); the SVM is less affected by sample size given a proper combination of spatial, contextual, and spectral features [32]. Moreover, the SVM performed comparably to the RF and outperformed the traditional multi-variable linear regression model (OLS) and the backpropagation neural network (BPNN) for estimating maize aboveground biomass (AGB) from high-resolution imagery and plant height [74]. Stepwise multiple linear regression (SMLR) was found to be more susceptible to overfitting than RF, SVM, and the extreme learning machine (ELM) when estimating wheat AGB from a large number of input variables [75]. Compared with RF, multiple linear regression (MLR) was highly sensitive to multicollinearity in canopy nitrogen (N) prediction, whereas SVM was less sensitive [76]. Conventional MLs may therefore overfit small datasets, weaken with large ones, and perform better with high-quality data.
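    Because sensitivity to multicollinearity recurs in these comparisons, a common diagnostic is the variance inflation factor (VIF). For feature j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all other features; equivalently, VIF_j is the j-th diagonal element of the inverse of the feature correlation matrix. The sketch below uses that identity with a small Gauss-Jordan inversion so it needs only the standard library. Feature names and values are hypothetical, and the thresholds often quoted as warning signs (roughly 5 to 10) are rules of thumb rather than fixed limits.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def invert(m):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(features):
    """features: dict name -> column of values. Returns name -> VIF."""
    names = list(features)
    corr = [[pearson(features[a], features[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# Two hypothetical features with correlation r = sqrt(0.6), so VIF = 1 / (1 - 0.6) = 2.5.
print(round(vif({"a": [1, 2, 3, 4, 5], "b": [2, 4, 5, 4, 5]})["a"], 3))  # 2.5
```

    Near-duplicate predictors (for example, two vegetation indices built from the same bands) push the VIF far above these values, which is the situation in which ordinary least squares coefficients become unstable.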

    Several experiments investigated how the depth, width, and filter size of deep convolutional neural networks (DCNNs) affect network performance. Increasing the network width and using multi-scale filters improved classification performance on high-resolution hyperspectral imagery, whereas increasing model depth was ineffective [77]. This ineffectiveness may be due to various factors, such as limited data, model architecture, hyperparameters, and data complexity. Restrictions on the input patch size of an algorithm can also weaken classification accuracy, because image inputs of differing spectral, spatial, and temporal quality may reduce the information gain. For instance, a UAV image was divided into small patches of 50 × 50 and 25 × 25 pixels for the 8 mm and 16 mm resolutions, respectively, to feed a CNN for spinach plant counting. The patches were then enlarged to fit the CNN input size, which caused the lowest-resolution patches to lose much spectral detail [78]. Moreover, clipped patches with an excessively small window size lack sufficient features, which increases the risk of overfitting [79]. Window patching also complicates image labeling, as the model may count a single tree twice or count unlabeled objects as trees (i.e., false positives counted as true positives) [33].
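    The spinach example can be checked with simple arithmetic: 50 px × 8 mm and 25 px × 16 mm both cover a 0.4 m × 0.4 m ground footprint, so the coarser patch has to be enlarged two-fold to reach the same pixel size. The sketch below tiles a synthetic image, enlarges a coarse patch, and shows that enlargement adds pixels but no new distinct values. It also shows how an object straddling a patch border falls into two patches, one route to double counting. Nearest-neighbour upscaling is an assumption; the source does not say which interpolation was used, and all function names are hypothetical.

```python
def tile(image, size):
    """Split a 2-D image (list of rows) into non-overlapping size x size patches.
    Trailing rows/columns that do not fill a whole patch are dropped."""
    h, w = len(image), len(image[0])
    return {
        (top, left): [row[left:left + size] for row in image[top:top + size]]
        for top in range(0, h - size + 1, size)
        for left in range(0, w - size + 1, size)
    }


def upscale_nearest(patch, factor):
    """Nearest-neighbour upscaling: every pixel is repeated factor x factor times."""
    out = []
    for row in patch:
        wide = [v for v in row for _ in range(factor)]
        out.extend([wide[:] for _ in range(factor)])
    return out


def patches_touched(box, size):
    """Patch origins overlapped by an inclusive bounding box (top, left, bottom, right)."""
    top, left, bottom, right = box
    return [(r * size, c * size)
            for r in range(top // size, bottom // size + 1)
            for c in range(left // size, right // size + 1)]


# Synthetic images of the same ground area: 100 x 100 px at "8 mm", 50 x 50 px at "16 mm".
fine = [[r * 100 + c for c in range(100)] for r in range(100)]
coarse = [[r * 100 + c for c in range(50)] for r in range(50)]
print(len(tile(fine, 50)), len(tile(coarse, 25)))  # 4 4
big = upscale_nearest(tile(coarse, 25)[(0, 0)], 2)
print(len(big), len(big[0]), len({v for row in big for v in row}))  # 50 50 625
print(patches_touched((40, 10, 60, 20), 50))  # [(0, 0), (50, 0)]
```

    The enlarged patch has 2,500 pixels but only 625 distinct values, which is the sense in which upscaling cannot restore detail the sensor never recorded.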

    The importance of algorithm parameterization has been widely demonstrated, as in [30]. For example, the quality of texture features depends on the kernel size: as the kernel size increases, noise is removed and the extracted features become more significant. However, the combination of texture with multi-temporal spectral information was less affected by changes in kernel size. Parameterization was found to be more meaningful for SVM than for RF [32]. In general, however, the significance of algorithm parameterization varies with the extracted features, the timing of data collection, and the classifier.
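    A basic form of parameterization is a grid search: each candidate hyperparameter value is scored by cross-validation and the best-scoring one is kept. The sketch below does this for the number of neighbours in a one-feature k-nearest-neighbour regressor, using interleaved folds and root mean squared error. The model, the candidate grid, and the synthetic data are assumptions chosen to keep the example dependency-free; the reviewed studies tuned other algorithms (e.g., SVM and RF) with their own settings.

```python
def knn_predict(train, x, n_neighbors):
    """Mean target of the n_neighbors training points closest to x (one feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)


def kfold_rmse(data, n_neighbors, folds=5):
    """RMSE of k-NN regression under interleaved k-fold cross-validation."""
    sq_err, count = 0.0, 0
    for f in range(folds):
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        for x, y in test:
            sq_err += (knn_predict(train, x, n_neighbors) - y) ** 2
            count += 1
    return (sq_err / count) ** 0.5


def grid_search(data, candidates, folds=5):
    """Score every candidate value and return the one with the lowest CV RMSE."""
    scores = {k: kfold_rmse(data, k, folds) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data: one image feature (e.g. a vegetation index) vs. a crop trait,
# with a deterministic +/-0.2 perturbation standing in for measurement noise.
data = [(i / 20, 3 * (i / 20) + (0.2 if i % 2 else -0.2)) for i in range(40)]
best, scores = grid_search(data, candidates=[1, 3, 5, 7])
print(best, {k: round(v, 3) for k, v in scores.items()})
```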

    Transfer learning can save model training time and improve performance. Its utility was examined for rice panicle counting, where the accuracy of the proposed CNN model was compared with four models pre-trained on ImageNet and with training from scratch. The results revealed a 5% to 15% difference in initial accuracy, but after a series of training runs the difference diminished to 1%, suggesting that the method may not be of remarkable significance in this case [80]. The training and testing strategies used to build models are also imperfect and prone to loss error. Markov Chain Monte Carlo (MCMC) and cross-validation (CV) are generally used to train and evaluate models with respect to convergence [65], overfitting, reliability, performance, and accuracy [81]. For example, cross-validation training with a large number of epochs does not necessarily decrease the loss error [82]. Additionally, testing the model on different subsets (folds) of the same dataset produces a lower root mean squared error (RMSE) [41]. Six You Only Look Once v3 (YOLOv3) backbones (i.e., DarkNet53, 251 using DenseNet121, 374 using ResNet50, 346 using MobileNetv2, and 276 using ShuffleNetv2) were trained for different numbers of epochs to compare their convergence speed; DenseNet-121 converged fastest. Overall, increasing the number of training epochs was found to be beneficial, but only up to a certain point [83].
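    The observation that more epochs help only up to a point is what early stopping exploits: training halts once the validation loss has failed to improve for a set number of consecutive epochs (the "patience"), and the best epoch seen so far is kept. The function below is a minimal, framework-independent sketch of that rule; the loss values are hypothetical and none of the cited studies is claimed to have used exactly this criterion.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    An epoch counts as an improvement only if its loss beats the best so far by more
    than min_delta; training stops after `patience` consecutive non-improving epochs.
    """
    best_loss = float("inf")
    best_epoch, waited = 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: improvement stalls after epoch 3.
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.69, 0.70, 0.71, 0.72], patience=2))  # (3, 5)
```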

    Image labelling remains the bottleneck of modelling and is still done manually. Manual image annotation (labelling) is time-consuming, laborious, and exhausting [84], yet crucial for achieving high model accuracy. Data labelling requires correct delineation of features; otherwise, the algorithms overestimate objects [85]. In a study on banana disease detection, false positives exceeded true positives in detecting bananas and their major diseases for two reasons: the difficulty of labelling all individual and clustered bananas in the images, and the possibility that the ML algorithms recognized unlabelled objects as true detections [36].
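    How incomplete labels inflate false positives can be seen in the way detections are usually scored: each predicted box is matched to a labelled box by intersection over union (IoU), and any prediction without a sufficiently overlapping label counts as a false positive, even if it sits on a real but unlabelled plant. The sketch below uses greedy one-to-one matching at IoU ≥ 0.5, a common convention; practical evaluators usually sort predictions by confidence first, which this simplified version omits. All boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(preds, labels, thr=0.5):
    """Greedy one-to-one matching of predictions to labels; returns (tp, fp, fn)."""
    unmatched = list(labels)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thr:
            unmatched.remove(best)
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Three real plants are detected, but only one of them was labelled.
preds = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
labels = [(0, 0, 10, 10)]
tp, fp, fn = match(preds, labels)
print(tp, fp, fn, precision_recall(tp, fp, fn))
```

    Here the two correct but unlabelled detections are scored as false positives, so precision drops to one third even though the detector found every plant.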

    In precision agriculture, ground data are used to develop classification or regression models and to evaluate the predictive accuracy of aerial data. Poor methods for measuring ground data can introduce many uncertainties.
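    When ground data serve as the reference for evaluating aerial predictions, three summary statistics are typical: mean bias (systematic over- or under-estimation), RMSE, and the coefficient of determination R^2 = 1 - SS_res / SS_tot. The sketch below computes all three from paired observations and predictions; the values are hypothetical. Some studies instead report the squared Pearson correlation, which ignores a constant offset, so the two "R^2" figures are not interchangeable.

```python
import math


def accuracy_metrics(observed, predicted):
    """Bias, RMSE and coefficient of determination for paired ground/aerial values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    bias = sum(errors) / n                          # mean signed error
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum(e * e for e in errors)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"bias": bias, "rmse": rmse, "r2": r2}


# Hypothetical plot values: every prediction is 0.5 units too high (a purely systematic error).
print(accuracy_metrics([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5]))  # {'bias': 0.5, 'rmse': 0.5, 'r2': 0.8}
```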

    Properly designed experiments should provide a platform for systematically addressing experimental errors. An experimental design that lacks in-field crop variation and multi-temporal data collection is another source of uncertainty in the results. Ground data should be variable enough to represent varying conditions, so multi-temporal and zonal data acquisition are prerequisites for robust, generalizable models [37]. However, highly mixed fields could impair the ability of models to learn from aerial data because of possible noise and outliers. Sampling technique, sampling area, and field-visit frequency are further factors to consider for result accuracy. Directed sampling was found to be more informative about the spatial variability of grapevine N than conventional random sampling, grid sampling, and sampling based on vineyard history [73]. Additionally, the choice of sampling areas in the field could affect the accuracy of detecting sheath blight severity in rice because of the clustered distribution of the disease in infected fields [86]. The estimation accuracy of juniper canopy cover and density depended on the sampling technique and the timing of data collection [47]. In contrast, when a CNN was used to map black-grass in winter wheat over several weeks in two successive years, its predictive ability did not differ with time [38]. Imprecise ground-truth measurements, related to the number of field staff and the frequency of data collection, can accentuate model uncertainty [41]. Additionally, measuring ground data by averaging may introduce systematic errors and outliers, as highlighted by [62,87].
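    The sampling designs compared above differ mainly in how field locations are chosen. The sketch below contrasts random sampling, regular grid sampling, and one hypothetical "directed" rule that picks cells at evenly spaced ranks of an auxiliary map (for example, a vegetation-index raster) so that the samples span its full range. The protocol used in the grapevine study is not described here; this directed rule is an assumed stand-in for illustration only, and the field grid is synthetic.

```python
import random


def random_sample(cells, n, seed=0):
    """Simple random sample of n distinct cells (cells must be a list)."""
    return random.Random(seed).sample(cells, n)


def grid_sample(rows, cols, step):
    """Regular grid: every `step`-th row and column."""
    return [(r, c) for r in range(0, rows, step) for c in range(0, cols, step)]


def directed_sample(aux, n):
    """Hypothetical directed design: cells at evenly spaced ranks of an auxiliary map,
    so the samples cover its lowest, highest and intermediate values."""
    ranked = sorted(aux, key=aux.get)  # cell coordinates ordered by auxiliary value
    if n == 1:
        return [ranked[len(ranked) // 2]]
    idx = [round(i * (len(ranked) - 1) / (n - 1)) for i in range(n)]
    return [ranked[i] for i in idx]


# Synthetic 10 x 10 field whose auxiliary value increases across the field.
aux = {(r, c): r * 10 + c for r in range(10) for c in range(10)}
print(grid_sample(10, 10, 5))
print([aux[cell] for cell in directed_sample(aux, 3)])
print(random_sample(list(aux), 5, seed=1))
```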

    Illumination conditions during image acquisition, observation angle, sensor sensitivity to light, UAV auxiliaries, quality of image georeferencing, image resolution, quality of image pre-processing [59], and final data preparation can obscure the patterns in the acquired data and thus weaken model learning. For example, the estimation of maize AGB at leaf scale is affected by observation angle, illumination conditions, canopy structure, and leaf morphology, which introduce systematic error [62]. Similarly, when a terrestrial laser scanner (TLS) and a UAV were compared for barley plant height prediction, the UAV images showed higher variation in plant height but a lower mean plant height because of the choice of imaging angle [88]. UAV auxiliaries such as a gimbal, GPS unit, and stabilization devices are important and can affect aerial data acquisition [35,57]. Furthermore, low-cost RGB sensors are sensitive to variation in lighting conditions, and data augmentation could not reduce the resulting error [41]. Sunshine sensors may be unreliable for atmospheric calibration, so further correction is needed [37,57]. Changes in weather conditions can also disturb crop canopies during image acquisition, which in turn affects point cloud creation during image processing [16,33,52]. Variations in sunlight over the course of a day increase the shaded areas in imagery, which weakens the model's learning ability [89].

    High-resolution imaging favors optimal phenotyping results [48,55]. However, high-resolution images can make stitching difficult because each image contains many details [35]. Appropriate spatial resolution settings depend on the crop and the trait. For wheat AGB estimation, spatial resolution correlates with plant height [74]. The distance between the UAV platform and the plant organ also matters. The maturity of rice panicles was visible at canopy level, whereas that of soybean pods was not, which hindered the model from learning features acquired at higher flight altitudes given the height differences among the tested crops [41]. Lowering the flight altitude is not an appropriate way to increase resolution, since the downdraft produced by the UAV makes leaves sway and thus affects image registration [86,90]. Flight speed can also compromise image quality by causing image blur, which eventually affects model accuracy [37]. Moreover, a suitable ground sampling distance (GSD) depends on the characteristics of the plant traits to be detected [35]. For instance, the larger canola and chickpea flowers could be detected from 30 m above ground level, whereas smaller flowers were hard to detect because of spectral mixing [91]. Although high spatial resolution is a prerequisite for high detection accuracy, high-resolution images (i.e., 150 m, 3 cm/pixel) were not enough to detect and separate weed species. Selecting a suitable image-enhancement method may therefore be more crucial than spatial resolution for classifying plant species [92]. For example, a super-resolution convolutional neural network (SRCNN) was used to generate super-resolution images from low-resolution and deformed images for tomato plant disease detection [93].
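    The trade-offs between altitude, resolution, and blur follow from two standard photogrammetric relations for a nadir-pointing frame camera over flat ground: GSD = (sensor width × flight altitude) / (focal length × image width in pixels), and motion blur in pixels = ground speed × exposure time / GSD. The sketch below encodes both; the camera parameters in the example are hypothetical values chosen for illustration, not taken from the reviewed studies.

```python
def gsd_m_per_px(sensor_width_mm, focal_length_mm, image_width_px, altitude_m):
    """Ground sampling distance (m/pixel), nadir view over flat terrain."""
    return (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)


def altitude_for_gsd(target_gsd_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Invert the GSD relation: flight altitude (m) that yields a target GSD."""
    return target_gsd_m * focal_length_mm * image_width_px / sensor_width_mm


def motion_blur_px(speed_m_s, exposure_s, gsd_m):
    """Ground distance travelled during one exposure, expressed in pixels."""
    return speed_m_s * exposure_s / gsd_m


# Hypothetical camera: 13.2 mm sensor width, 8.8 mm focal length, 5472 px image width.
g = gsd_m_per_px(13.2, 8.8, 5472, 30)
print(f"GSD at 30 m: {g * 100:.2f} cm/px")
print(f"Blur at 10 m/s, 1/500 s: {motion_blur_px(10, 1 / 500, g):.2f} px")
print(f"Altitude for 1 cm/px: {altitude_for_gsd(0.01, 13.2, 8.8, 5472):.1f} m")
```

    Because GSD scales linearly with altitude, halving the altitude halves the GSD but doubles the blur measured in pixels at the same speed and exposure, which is one reason lower, slower flights are often needed together.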

    Image registration, stitching, and alignment require complete georeferencing, radiometric calibration, and atmospheric correction; otherwise, various related distortions remain in the output orthophotos. For example, a lack of ground control points at late crop growth stages increases georeferencing error and degrades the resulting point cloud, which could complicate the structure from motion (SFM) processing needed for plant height prediction [52]. Similarly, it was hard to distinguish between soybean lines and various yellow-colored leaves where canopies overlapped, because of weak image stitching resulting from insufficient spatial resolution, weak georeferencing, and a dense cropping system [31,94].
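    One widely used approach to the radiometric calibration mentioned above is the empirical line method: reference panels of known reflectance are imaged during the flight, and a linear relation reflectance = gain × DN + offset is fitted between their raw digital numbers (DN) and their known reflectance, then applied to every pixel in that band. The sketch below fits the line by ordinary least squares. Panel values are hypothetical, and the method assumes illumination stays constant across the flight, which the sunshine-sensor and shading issues discussed earlier may violate.

```python
def fit_empirical_line(dn, reflectance):
    """Least-squares fit of reflectance = gain * DN + offset from calibration panels."""
    n = len(dn)
    mx, my = sum(dn) / n, sum(reflectance) / n
    sxx = sum((x - mx) ** 2 for x in dn)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dn, reflectance))
    gain = sxy / sxx
    offset = my - gain * mx
    return gain, offset


def to_reflectance(dn_values, gain, offset):
    """Apply the fitted line to raw pixel values of the same band."""
    return [gain * v + offset for v in dn_values]


# Hypothetical panels: mean DN of three targets and their known reflectance.
gain, offset = fit_empirical_line([1000, 5000, 9000], [0.05, 0.25, 0.45])
print(gain, offset, to_reflectance([3000, 7000], gain, offset))
```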

    Multi-band alignment can provide more descriptive information, but when few matching points are available, alignment errors become more likely in classification interventions. A study on disease detection in vine rows applied standard and dynamic band-alignment methods; the dynamic method outperformed the standard one, but alignment errors remained with both because of the lack of matching points [58]. Both matching methods also made image registration across different modalities difficult. Moreover, blurry and shadowy images can severely increase model uncertainty [95], since variations in light conditions and intensity may cause variations in image-derived features [94]. Segmentation accuracy depends strongly on light intensity, the resolution of training images, sensitivity to noise, and the availability of high-quality labeled training samples [80]. Complex image pre-processing methods also extend algorithm runtime, which hampers real-time phenotyping and delays the benefit to farmers.
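    Band alignment can be illustrated in its simplest form: search over integer pixel shifts and keep the one for which the moving band best matches the reference band. Normalized cross-correlation (NCC) is used as the score because different spectral bands have different gains and offsets, which NCC ignores. This brute-force, translation-only sketch is a deliberate simplification. It is not the standard or dynamic method of the cited study, which relies on matching points, and real multispectral pipelines typically also handle rotation, scale, and sub-pixel shifts. The images are synthetic.

```python
import random
from math import sqrt


def ncc(a, b):
    """Normalised cross-correlation of two equal-length sequences (1.0 = perfect linear match)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def best_shift(ref, mov, max_shift=3):
    """Integer (dy, dx) for which mov[y + dy][x + dx] best matches ref[y][x], scored on the overlap."""
    h, w = len(ref), len(ref[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            a, b = [], []
            for y in range(h):
                for x in range(w):
                    if 0 <= y + dy < h and 0 <= x + dx < w:  # overlapping pixels only
                        a.append(ref[y][x])
                        b.append(mov[y + dy][x + dx])
            score = ncc(a, b)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best, best_score


# Synthetic "second band": the reference shifted by (1, 2) with a different gain and offset.
rng = random.Random(0)
h = w = 12
ref = [[rng.random() for _ in range(w)] for _ in range(h)]
mov = [[2.0 * ref[(y - 1) % h][(x - 2) % w] + 5.0 for x in range(w)] for y in range(h)]
print(best_shift(ref, mov))
```

    With few distinctive structures in the overlap, several shifts can score similarly, which mirrors the text's point that a shortage of matching points leaves residual alignment error.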

    Data cleaning, balancing, sample size control, data augmentation, noise and outlier removal, and control of data variation are techniques used for data preparation. Well-prepared data may strengthen model performance, but the model's structure and its sensitivity to different data patterns should also be considered [96].
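    As a concrete instance of two of the cleaning steps listed above, the sketch below removes exact duplicates and then applies Tukey's fences, discarding values outside [Q1 - 1.5 × IQR, Q3 + 1.5 × IQR]. Quartiles come from Python's `statistics.quantiles`, whose default "exclusive" method is one of several quartile conventions, so results can differ slightly from other software. The measurements are hypothetical, and whether an extreme value is an error or a genuine observation still needs domain judgment.

```python
import statistics


def clean(values):
    """Drop exact duplicates (keeping first occurrences), then Tukey 1.5 x IQR outliers."""
    deduped = list(dict.fromkeys(values))
    q1, _, q3 = statistics.quantiles(deduped, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in deduped if low <= v <= high]


# Hypothetical plot-level plant heights (cm) with a duplicate entry and a likely typo.
print(clean([10, 11, 12, 12, 13, 14, 15, 100]))  # [10, 11, 12, 13, 14, 15]
```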

    An unequal distribution of classes in the imagery unbalances the dataset, and unbalanced data increases model uncertainty just as extreme data variation does. For instance, plots containing a higher percentage of the same class were removed to avoid the data imbalance that affected CNN model training [97]. Classes with more training samples may be identified more readily than classes with fewer samples, which biases the learning results [35,84]. In an unbalanced dataset, large shifts between the training and test sets may weaken the model's transferability [38]. Additionally, imbalance among the classes in the image and in sample size may lead to low segmentation accuracy, as found in a study of rapeseed leaf segmentation using SVM and RF [79]. In a study applying TasselNetV2+ to plant counting, substantial cultivar variation and changes in illumination hampered the model's ability to generalize [98]. Moreover, one drawback of data balancing is the need to reduce the training dataset's size when the classes' density distributions differ widely, and a smaller dataset can lead to insufficient feature learning. In a study mapping weeds at subfield scales, the dataset was balanced between black-grass and winter wheat by reducing the size of one class for training; this slightly reduced model accuracy and increased the misclassification rate to 22.4%. Data cleaning (e.g., removing errors and duplicates) was then applied to the unbalanced dataset, increasing the accuracy of the learned model by 4.6%, which supports the efficiency of data cleaning methods [38]. Overall, unbalanced data, large sample sizes, and large shifts between training and test data erode model generalizability. Data variation may force the model to learn features that generalize across conditions.

    By contrast, a study aimed at identifying early indicators of water stress found that classifiers trained on pooled images of all ornamental shrub species performed significantly worse than classifiers trained on images of a single species, despite the larger training set, because water-stress symptoms varied from one species to another [99]. Data variation may therefore enhance model generalization, but only after cleaning, and the data classes must remain balanced when the learned model is transferred from training to testing. The characteristics of each crop trait must also be considered.
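    The balancing strategy described for the black-grass study, shrinking the larger class, corresponds to random undersampling, and the sketch below makes its cost visible: with a hypothetical 90:10 split, only 20 of 100 samples survive. A common alternative that keeps all samples is to weight classes inversely to their frequency; the weights below use the n_samples / (n_classes × class_count) heuristic that scikit-learn applies for `class_weight="balanced"`. Class names and counts are hypothetical.

```python
import random
from collections import Counter


def undersample(samples, seed=0):
    """Randomly shrink every class to the size of the smallest one.
    samples: list of (features, label) pairs."""
    by_class = {}
    for s in samples:
        by_class.setdefault(s[1], []).append(s)
    n_min = min(len(v) for v in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], n_min))
    return balanced


def inverse_frequency_weights(labels):
    """Per-class weight n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Hypothetical pixels: 90 labelled winter wheat, 10 labelled black-grass.
samples = [((i,), "wheat") for i in range(90)] + [((i,), "black-grass") for i in range(10)]
print(len(undersample(samples)))  # 20
print({c: round(w, 3) for c, w in inverse_frequency_weights([s[1] for s in samples]).items()})
```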

    The benefit of large training datasets for modeling has been demonstrated repeatedly [33,37,39,40,43,45,73,100], so data augmentation methods are a resort when data acquisition is limited. After data augmentation, DeepLabv3+, improved RF, efficient dense modules of asymmetric convolution (EDA), and random forest (RF) all estimated maize leaf coverage better [65]. Nevertheless, augmenting the sample size can also introduce outliers and noise, which may reduce model accuracy [56]. For instance, synthetic data augmentation and image replication are prone to redundancy related to variation in light intensity, geometric error, blur, and shadow effects in the images. One study used a CNN with a dropout layer to tackle the redundancy issue; the image replicates decreased the model's performance because of its high dependency on color, and the combination of the dropout layer and data augmentation created excessive randomness that reduced the effectiveness of model training [41]. In contrast, another study found that both dropout and data augmentation had a positive effect on disease classification, giving the model good control over the training rate without overfitting and without resorting to algorithm parameterization [36]. Ultimately, data variation reflects the heterogeneity of the study field, which in turn depends on how the farm is managed.
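    A simple geometric augmentation generates the eight flip and rotation variants of an image tile. The sketch below does this on plain 2-D lists and then deduplicates the results. It illustrates the redundancy concern raised above: a tile with internal symmetry (in the extreme, a uniform patch of canopy or soil) yields exact duplicates rather than new information. This is a generic technique chosen for illustration, not the augmentation pipeline of any cited study.

```python
def rotate90(img):
    """Rotate a 2-D list clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def hflip(img):
    """Mirror a 2-D list left to right."""
    return [row[::-1] for row in img]


def dihedral_augment(img):
    """The 8 rotation/flip variants of an image (4 rotations, each with and without a flip)."""
    variants, current = [], img
    for _ in range(4):
        variants.append(current)
        variants.append(hflip(current))
        current = rotate90(current)
    return variants


def unique_variants(variants):
    """Drop exact duplicates among augmented images."""
    seen, out = set(), []
    for v in variants:
        key = tuple(map(tuple, v))
        if key not in seen:
            seen.add(key)
            out.append(v)
    return out


tile = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # asymmetric tile: 8 distinct variants
uniform = [[7, 7], [7, 7]]                # uniform tile: all variants identical
print(len(unique_variants(dihedral_augment(tile))), len(unique_variants(dihedral_augment(uniform))))  # 8 1
```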

    The accuracy of modeling results depends less on the number of features used than on how the features relate to crop variation, crop traits and growth stage [43,82], on feature selection [95], feature combination [101], the temporal variation of data acquisition [32], classifier type [102], the image resolution, spectral bands and bandwidth used to extract the features, and on the uneven importance of the features used to predict crop agronomic traits, which is addressed by the so-called feature ranking approach [55,103].
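    Feature ranking can be computed in several ways. A minimal, model-agnostic sketch is permutation importance: shuffle one feature at a time and measure how much the prediction error grows. The example below assumes a plain least-squares linear model in NumPy and synthetic data; the cited studies may instead use, for example, random forest impurity-based importance, and in practice the error should be measured on held-out data rather than on the training set used here for brevity.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept column; returns the coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def permutation_ranking(X, y, n_repeats=10, seed=0):
    """Rank features by how much shuffling each one increases the mean squared error."""
    rng = np.random.default_rng(seed)
    coef = fit_linear(X, y)
    base_mse = np.mean((predict_linear(coef, X) - y) ** 2)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j's link to y
            mse = np.mean((predict_linear(coef, X_perm) - y) ** 2)
            importance[j] += (mse - base_mse) / n_repeats
    return np.argsort(importance)[::-1], importance


# Synthetic example: feature 0 matters most, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
order, scores = permutation_ranking(X, y)
```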

    Feature combination is an important way to extract more spectral information. For instance, RGB bands, the excess green index (ExG) and the excess green minus excess red index (ExGR) were combined to capture rice lodging using EDA with asymmetric convolution (EDANet) [104]. Feature combination improved automated yellow rust disease detection and rice biomass estimation [77,105]. The combination of hue, saturation, the visible atmospherically resistant index (VARI), and ExG was more significant than individual features for detecting Italian ryegrass in wheat [54]. Feature selection also improved the prediction of fresh yield and dry matter yield of grass swards [19]. In contrast, combining spectral bands (SBs) and vegetation indices (VIs) seemed to lower the performance of the RF model compared to reduced error pruning tree (REPT) and K-nearest neighbours (KNN) models for leaf N content and plant height estimation in maize [19]. Variation in data dimensionality is also related to high variation in agronomic traits. For instance, a model with twelve variables predicted the lower end of canopy N weight values with very high accuracy but struggled to perform similarly for higher values [76]. Feature combination may have the same effect as data augmentation, increasing information redundancy, noise, and outliers, which increases modeling complexity and uncertainty. Thus, removing unrelated variables could mitigate the error caused by high data dimensionality [76]. In a study estimating maize AGB, datasets of variables highly correlated with maize AGB performed slightly better than the set of all 3D crop height (3D-CH) features [74]. On the other hand, feature selection adds to data processing complexity and can increase the uncertainty of results because of the shortcomings of highly intercorrelated features [75]. 
Furthermore, the best variable selection method may not suit all classifiers. For instance, after combining the blue band, VIs, and principal component analysis (PCA) components from all UAV features, SVM achieved lower accuracy than RF in that scenario [36].
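    As a concrete, hypothetical illustration of feature combination, the sketch below derives ExG, ExR and ExGR from an RGB image and stacks them with the raw bands into one per-pixel feature matrix. It assumes the commonly used chromatic-coordinate definitions ExG = 2g − r − b and ExR = 1.4r − g (with r, g, b being each band divided by R + G + B); the exact formulations in the cited studies may differ. Because all three indices are computed from the same three bands, stacking them with the raw bands adds partly redundant information, which is the kind of redundancy discussed above.

```python
import numpy as np


def color_indices(rgb):
    """Compute ExG, ExR and ExGR from an RGB image (height, width, 3) with values >= 0.

    Assumed definitions on chromatic coordinates r, g, b = R/(R+G+B), etc.:
    ExG = 2g - r - b, ExR = 1.4r - g, ExGR = ExG - ExR.
    """
    rgb = rgb.astype(float)
    total = rgb.sum(axis=2, keepdims=True)
    total[total == 0] = 1.0  # avoid division by zero on black pixels
    r, g, b = np.moveaxis(rgb / total, 2, 0)
    exg = 2 * g - r - b
    exr = 1.4 * r - g
    return exg, exr, exg - exr


def combine_features(rgb):
    """Stack raw bands, ExG and ExGR into one per-pixel feature matrix (n_pixels, 5)."""
    exg, _, exgr = color_indices(rgb)
    stacked = np.dstack([rgb.astype(float), exg, exgr])
    return stacked.reshape(-1, stacked.shape[2])


# One pure-green pixel and one gray pixel.
image = np.array([[[0, 200, 0], [100, 100, 100]]], dtype=np.uint8)
exg, exr, exgr = color_indices(image)
features = combine_features(image)  # one row per pixel: R, G, B, ExG, ExGR
```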

    The uneven importance of features and the dependencies among them are crucial to consider when selecting and combining features. For example, a model trained on PCA components performed better than one based on raw bands for banana disease classification [82]. VIs were also better than raw bands for nitrogen (N) and plant height estimation because they enhance characteristics related to biological variables such as chlorophyll content and biomass [40], and they were better than texture features [32,69]. In contrast, band differences performed better than the normalized difference vegetation index (NDVI) for evaluating late blight severity in potatoes; although NDVI relates to foliar coverage, it was not directly related to disease at early stages [106]. Similarly, NDVI performed poorly compared to other VIs because it saturates with respect to canopy nitrogen weight once the crop canopy becomes dense [76], as also found by [26]. In another study, the significance of VIs for modeling wheat AGB was low because they were extracted only from RGB images and tend to saturate at high biomass levels; moreover, the broad spectral range of the visible bands and inaccurate spectral response functions made it hard to convert digital numbers (DNs) to reflectance [75]. The low significance of RGB reflects the fact that RGB imagery is informative only for some crops, such as banana and maize, but not for legumes and tuber crops (i.e., potato, sweet potato, and beans), owing to their unclear aerial profile, diversity and intercropping [35]. Another study aimed at determining plant health status and canopy biomass found that VI significance depends on the presence of near-infrared (NIR) reflectance, which arises from leaf cell structure, rather than on leaf chlorophyll content [57]. 
Additionally, vegetation indices may follow different phenological curves across growth stages, which may make specific VIs suitable for phenotyping specific crop traits, given the dependence of VIs on crop variation [107]. For example, for wheat yield estimation, the normalized difference red-edge index (NDRE) performed well at the flowering stage, the canopy chlorophyll content index (CCCI) at the filling stage, and NDVI at the jointing and booting stages [108]. VIs are also sensitive to environmental conditions [36]. In addition, image resolution may affect the optimal extraction of specific features such as vegetation indices. A study compared different VIs derived from UAV, Sentinel-2 (S2), PlanetScope (PS), and WorldView-2 (WV-2) imagery for pixel-based banana classification in a mixed, complex landscape. The results showed that the enhanced vegetation index (EVI) and triangular greenness index (TGI) were significant for medium-resolution images, while the chlorophyll index green (CIG) and ratio vegetation index green (RVI-G) derived from high-resolution (PS, WV-2) and very high-resolution UAV sensors were more promising for detecting banana plants and their major diseases [36].
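    For reference, NDVI and NDRE are both normalized differences: NDVI = (NIR − Red) / (NIR + Red) and NDRE = (NIR − RedEdge) / (NIR + RedEdge). The NumPy sketch below computes them from reflectance arrays; the reflectance values in the usage example are invented for illustration and only show how a ratio of this form flattens as NIR grows while red reflectance stays low, which is one reason NDVI saturates over dense canopies.

```python
import numpy as np


def normalized_difference(a, b):
    """Generic (a - b) / (a + b), returning 0 where the denominator is 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(nir, red):
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    return normalized_difference(nir, red_edge)


# Invented reflectances: NIR rises from 0.40 to 0.60, red and red-edge held fixed.
red = np.array([0.04, 0.04])
red_edge = np.array([0.25, 0.25])
nir = np.array([0.40, 0.60])
ndvi_values = ndvi(nir, red)
ndre_values = ndre(nir, red_edge)
```

    In these made-up numbers, raising NIR from 0.40 to 0.60 moves NDVI only from about 0.82 to about 0.88, while NDRE moves from about 0.23 to about 0.41. Real red-edge reflectance also changes with the canopy, so this is a qualitative illustration only.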

    As with VIs, the importance of spatial features depends on multiple variables. For example, hyperspectral 3D features were more significant for estimating AGB than N in barley [63]. Moreover, 3D features may be significant when they complement other feature combinations or when the data acquisition date changes [19]. For instance, one study observed only limited significance of 3D-based plant height: adding this variable to the normalized green-red difference index (NGRDI), excess green-red index (ExGR), and vegetative index (VEG) increased the coefficient of determination (R²) by just 0.03 for fresh and dry maize AGB and changed the root mean squared error (RMSE) by only 0.02 kg m⁻² for fresh maize AGB prediction [109]. In another study predicting cover fractions of plant species, 3D features did not significantly improve model accuracy, possibly because the 3D information on canopy structure was redundant [92]. Similarly, 3D features contributed little to maize AGB estimation [74]. The canopy height model (CHM) is also of medium significance for model learning because it varies across growth stages [86]. Furthermore, PHCSM (i.e., plant height extracted from a crop surface model) was found to be insignificant because it depends heavily on image resolution [81]. ALSCHM (i.e., a crop surface model extracted from airborne laser scanning imagery) did not contribute when added to spectral bands for the early detection of invasive exotic trees, because of the short height of the surrounding weeds and the unmixed landscape [110]. Generally, the importance of 3D features depends strongly on the structure-from-motion (SfM) reconstruction of the 3D model, which is affected by the technical and field conditions discussed in Sections 3.2.1 and 3.2.5.
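    Plant height features such as the CHM are typically obtained by subtracting a digital terrain model (DTM) from the SfM-derived crop or digital surface model (DSM). The sketch below assumes both models are already co-registered NumPy grids in metres; the helper names and the use of a percentile to summarize each plot are illustrative assumptions rather than the procedure of any cited study.

```python
import numpy as np


def canopy_height_model(dsm, dtm):
    """CHM = DSM - DTM; negative heights (noise below ground) are set to zero."""
    return np.clip(np.asarray(dsm, float) - np.asarray(dtm, float), 0.0, None)


def plot_height(chm, mask, percentile=95):
    """Summarize plant height in one plot with a percentile of the CHM pixels in the mask."""
    return float(np.percentile(chm[mask], percentile))


# Tiny example grid (metres above sea level); the DTM is flat at 100 m.
dsm = np.array([[101.2, 101.5], [100.9, 100.0]])
dtm = np.full((2, 2), 100.0)
chm = canopy_height_model(dsm, dtm)
plot = np.array([[True, True], [True, False]])  # hypothetical plot mask
h = plot_height(chm, plot, percentile=50)
```

    A high percentile rather than the maximum is a common way to reduce the influence of isolated noisy points in the surface model, but the appropriate statistic depends on crop and image resolution.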

    The same multivariate dependency may apply to color spaces. For instance, a study compared the contributions of the RGB, hue-saturation-value (HSV) and L*a*b* color spaces (i.e., L* for perceptual lightness, and a* and b* for the four unique colors of human vision: red, green, blue, and yellow) to discriminating among rice, weed, and soil classes in an upland rice field using a simple linear iterative clustering-based random forest (SLIC-RF). The results showed that Rstd, Rmin and Gstd were the important features for the RGB SLIC-RF, in contrast to the blue band; Smax, Vmin, Hmed, Hmax and Sstd were important for the HSV SLIC-RF; and L*min, a*min, a*std, L*std and a*med were important for the L*a*b* SLIC-RF. Overall, HSV was the most significant color space for the study's aim [69]. In another study, RGB bands were slightly better than the RGB-derived L*a*b* color space for disease detection [111]. The hue-intensity-saturation (HIS) space extracted from RGB and multispectral images could partially distinguish the canopy changes caused by the disease.
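    Features such as Rstd, Rmin, Smax or Hmed are per-segment statistics of each color channel. The sketch below converts an RGB image to HSV with Python's standard `colorsys` module and computes the minimum, maximum, standard deviation and median per segment. It assumes the superpixel labels are already available (for example from the `slic` function in scikit-image's `segmentation` module); the function names and the small example image are hypothetical.

```python
import colorsys

import numpy as np


def rgb_to_hsv_image(rgb):
    """Convert an RGB image with values in [0, 1] to HSV (H, S, V each in [0, 1])."""
    flat = rgb.reshape(-1, 3)
    hsv = np.array([colorsys.rgb_to_hsv(*pixel) for pixel in flat])
    return hsv.reshape(rgb.shape)


def segment_statistics(channels, segments, names):
    """Per-segment min, max, std and median of each channel (e.g. Hmin, Smax, Vstd, Hmed).

    channels: array (height, width, n_channels); segments: integer labels (height, width).
    Returns {segment_id: {feature_name: value}}.
    """
    features = {}
    for seg_id in np.unique(segments):
        pixels = channels[segments == seg_id]  # (n_pixels, n_channels)
        stats = {}
        for k, name in enumerate(names):
            values = pixels[:, k]
            stats[f"{name}min"] = float(values.min())
            stats[f"{name}max"] = float(values.max())
            stats[f"{name}std"] = float(values.std())
            stats[f"{name}med"] = float(np.median(values))
        features[int(seg_id)] = stats
    return features


rgb = np.array([[[0.20, 0.60, 0.20], [0.25, 0.55, 0.20]],
                [[0.50, 0.40, 0.30], [0.55, 0.45, 0.35]]])
segments = np.array([[0, 0], [1, 1]])  # two hypothetical superpixels
hsv = rgb_to_hsv_image(rgb)
feats = segment_statistics(hsv, segments, names=["H", "S", "V"])
```

    The resulting dictionaries can be flattened into one feature vector per superpixel and passed to a classifier such as RF.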

    Similarly, texture features depend strongly on high image resolution [57], crop variation [95], crop trait and spectral band. A study investigated the potential of textural information to predict AGB and N fixation (NFix) in clover-grass and lucerne-grass. The texture features improved the model's performance for fresh and dry matter but not for NFix. For fresh and dry matter, RF with texture improved the relative root mean square error percentage (rRMSEP) more for clover-grass (CG) than for lucerne-grass (LG), while the PLS model resulted in a higher rRMSEP. For NFix, RF without texture exhibited a lower rRMSEP for CG than for LG, and PLS with texture achieved a lower rRMSEP for LG than for CG without texture. Overall, RF achieved the lowest rRMSEP for the whole dataset, both without and with texture. The red band was the best for generating texture for fresh and dry matter compared to the green, red-edge and NIR bands, whereas for NFix the red-edge (RE) and NIR bands were the best [37]. Furthermore, the window size used to extract texture features can affect their importance: increasing the window size generates coarser texture, and vice versa [95]. The choice of bands has a similar effect. A study used only individual bands (i.e., 490, 550, 680, 720, 800, and 900 nm) to generate texture features for segmenting plastic-mulched farmland, which resulted in coarse segmentation because of the limited information provided by separate bands [34]. Moreover, texture features were found to be the second most important (complementary) input variable for improving the SLIC-RF model when compared with and added to VIs and color spaces for weed/crop segmentation [69]. Similar findings were reported in [32] and [34]. In another study, texture features of canopy structure performed poorly for N estimation when regressed alone. 
However, when they were added to the VIs, model performance improved, because the VIs mitigated the effects of soil background and saturation [26].
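    Texture features of this kind are often derived from a gray-level co-occurrence matrix (GLCM) computed within a moving window. The NumPy sketch below builds a symmetric, normalized GLCM for one pixel offset and derives contrast and homogeneity; the 8 gray levels, the horizontal offset and the function names are assumptions, and libraries such as scikit-image offer more complete GLCM implementations. Changing the window passed to `glcm`, or the band it is taken from, changes the resulting values, which mirrors the window-size and band effects described above.

```python
import numpy as np


def glcm(window, levels=8, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset.

    window: 2D array of values in [0, 1] (e.g. one band's reflectance in a moving window).
    Assumes non-negative offsets dx and dy.
    """
    q = np.minimum((window * levels).astype(int), levels - 1)  # quantize to gray levels
    h, w = q.shape
    ref = q[0:h - dy, 0:w - dx]  # reference pixels
    nbr = q[dy:h, dx:w]          # their neighbours at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)
    m = m + m.T                  # count each pair in both directions
    return m / m.sum()


def texture_features(p):
    """Contrast and homogeneity of a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    homogeneity = float(np.sum(p / (1.0 + np.abs(i - j))))
    return {"contrast": contrast, "homogeneity": homogeneity}


flat = np.full((5, 5), 0.5)                         # uniform canopy patch
stripes = np.tile([0.05, 0.95], (5, 3))[:, :5]      # alternating dark/bright columns
flat_tex = texture_features(glcm(flat))
stripe_tex = texture_features(glcm(stripes))
```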

    As mentioned above, spectral range may determine feature importance, since bands in each range contribute differently to feature extraction. For instance, a study found that the visible range was better than the infrared range for detecting vine disease, but the fusion of the two ranges outperformed either alone, possibly because the visible image provides a better colorimetric description than the infrared image [58]. In contrast, the near-infrared band exhibited the largest absolute difference between the spectral reflectance of low-N and high-N classes, followed by the red-edge, green, red, and blue bands, similar to the findings in [47,73,81]. Moreover, hyperspectral bands can provide rich spectral information [34], but blue, green, red, red-edge and near-infrared bands might not be sufficient for N assessment in young leaves with less chlorophyll, since the spectral reflectance of young leaves could be similar to that of leaves with lower N concentration [73]. A similar finding is reported in [112]. The NIR range was more significant than RGB for radish wilt detection, and the long RGB processing pipeline diminishes the usefulness of RGB unless its low cost and availability are taken into account [113]. Thermal bands may be insufficient to distinguish disease classes unless combined with multispectral bands [68]. In a study estimating the vine crop water stress index (CWSI) from stem water potential (SWP) using thermal infrared (TIR) bands, more heavily shaded canopy levels showed lower temperatures, which weakened the correlation of TIR with CWSI and SWP and led to its combination with other spectral bands such as visible bands [114]. Multi-band fusion may outperform single bands because it can provide more spectral information and generate optimal features [115]. 
However, this method is affected by the radiation characteristics of each band and the uncertainty of remote sensing data, which requires studying the spatio-temporal matching of the data [116]. Furthermore, each band is sensitive to specific agronomic traits [117]. Additionally, the narrow 10 nm bandwidth of the MicaSense red-edge band may not capture the red-edge position, which reduced the model's performance in predicting corn canopy N in a study by [76].
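    The CWSI mentioned above is commonly computed from canopy temperature Tc and wet and dry reference temperatures as CWSI = (Tc − Twet) / (Tdry − Twet). The sketch below assumes the reference temperatures are known and illustrates, with invented numbers, how cooler shaded pixels pull the plot mean down and how a mask derived from a visible band (a hypothetical brightness threshold of 0.15) can exclude them, one simple form of combining TIR with visible bands.

```python
import numpy as np


def cwsi(canopy_temp, t_wet, t_dry):
    """Crop water stress index: 0 at the wet reference, 1 at the dry reference (clipped)."""
    tc = np.asarray(canopy_temp, dtype=float)
    index = (tc - t_wet) / (t_dry - t_wet)
    return np.clip(index, 0.0, 1.0)


def plot_cwsi(thermal, visible_brightness, t_wet, t_dry, shadow_threshold=0.15):
    """Average CWSI over sunlit pixels only; shaded pixels are masked using a visible band."""
    sunlit = visible_brightness >= shadow_threshold
    return float(cwsi(thermal[sunlit], t_wet, t_dry).mean())


# Invented data: two sunlit canopy pixels and one cooler, shaded pixel (degrees C).
thermal = np.array([30.0, 31.0, 26.0])
visible = np.array([0.40, 0.50, 0.05])
all_pixels = float(cwsi(thermal, t_wet=25.0, t_dry=35.0).mean())
sunlit_only = plot_cwsi(thermal, visible, t_wet=25.0, t_dry=35.0)
```

    Here the shaded pixel lowers the plot mean from 0.55 (sunlit pixels only) to 0.40, which would understate water stress.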

    Remote sensing technologies allow crop traits to be phenotyped by measuring spectral reflectance at different plant organ levels. Nevertheless, the accuracy of in-field data acquisition depends on the field conditions surrounding the crop, such as soil color variation [49], atmospheric conditions [27], plantation landscape [36,42], the presence of water [45], farm management aspects such as field location and area [34], farmland topography [47], and cropping systems [21,43,44,54].

    The robustness of a convolutional autoencoder (CAE)-CNN model was tested under three conditions: two different soybean trials, two different field locations and two different vegetative growth stages. The model performed well across the different soybean trials and vegetative growth stages but only at one of the field locations [97]. In another study, the winter wheat ears detected by a deep convolutional neural network (DCNN) and a fully convolutional network (FCN-8s) at the flowering stage looked very similar to leaves because of noisy imagery captured under field conditions; strong illumination and a cluttered background weakened color brightness, which hindered the DCNN and FCN-8s from distinguishing the classes [100]. Generally, soil texture, pH, weather, water availability and nutrients jointly influence crop growth, so omitting them from a study may lead to uncertain results. For example, a study of soil salinity inversion in winter wheat found only a limited relationship between VIs and soil salinity across different environmental conditions [116].
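    Robustness tests like the one above, where a model is trained on some trials, locations or stages and evaluated on another, correspond to leave-one-group-out validation. A minimal standard-library sketch follows; the function name and the field labels are hypothetical, and scikit-learn's `LeaveOneGroupOut` provides an equivalent splitter.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), holding out one group at a time.

    groups: one label per sample, e.g. the field location, trial or growth stage.
    """
    index_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        index_by_group[g].append(i)
    for held_out, test_idx in index_by_group.items():
        train_idx = sorted(i for g, idx in index_by_group.items() if g != held_out for i in idx)
        yield held_out, train_idx, test_idx


# Two hypothetical field locations: each is held out once as an unseen test site.
locations = ["field_A", "field_A", "field_B", "field_B", "field_B"]
splits = list(leave_one_group_out(locations))
```

    A model that scores well on random splits but poorly when a whole location is held out is showing exactly the location dependence reported above.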

    Fertilization management can also increase the uncertainty of results. One study estimated plant height and leaf N content of maize using RF algorithms under two topdressing N fertilization rates. The RF model showed low consistency for plant height prediction, as the fitted line was affected by many outliers. For leaf N content, however, the model separated the high and low N rates because of the effect of the two fertilization levels. N fertilization therefore shapes model performance differently for different agronomic traits [40]. Similarly, a study regressed VIs, the canopy height model (CHM), and RGB and NIR bands to estimate grass sward biomass under six N fertilization rates applied on four harvest dates per season; both the dates and rates of N application affected the correlations between VI features and height features [19]. In a study of a coffee crop, by contrast, the correlation between N and SPAD readings was negative. This could be due to the transport of nutrients by surface runoff and rainfall, which accumulated nutrients in the lower parts of the farm; in this study, land slope was identified as a primary factor influencing model performance [22]. Conversely, in another study, varying N fertilization rates and seedling densities improved the universality of the model [108].

    Furthermore, dense cropping causes canopy overlap, which may reduce class separability [22,48,78,118,119]. Different cultivars usually share common plant characteristics that exhibit similar reflectance, which may hinder the identification and classification of visual patterns [32]. For example, a U-Net CNN could separate the object boundaries between two different classes of sorghum, but it could not detect overlapping sorghum panicles [33]. In a study on rice panicle density detection, plots treated with N produced more panicles than untreated plots, and the denser canopy tended to reduce the algorithm's classification accuracy [80]. In another study, distinguishing legumes from crops such as maize was difficult because intercropping made their spectral profile unclear [35]. Similarly, the accuracy of tree and tree-gap counts was low even for trees cultivated in normal-spacing blocks, and crop density further lowered model performance in high-density spacing blocks [85]. However, tree detection and counting may be feasible at late growth stages, since some high-yielding trees tend to limit canopy expansion. Conversely, some crops extend their canopy density at the maturation stage, and differentiating them requires extensive knowledge of species characteristics [120].

    The challenge of distinguishing overlapping classes may be addressed through the choice of classifier. For example, a CNN followed by classification refinement using superpixels derived from a simple linear iterative clustering (SLIC) algorithm was used to distinguish citrus from other crop trees; the CNN-SLIC approach reduced the number of misclassified citrus trees on a farm with complex cropping [66]. Likewise, the performance of the DeepCount model did not degrade for complex canopies with a high density of wheat ears [61]. Moreover, YOLOv3 performed well when cotton plant density ranged between 0 and 14 plants per linear meter of row [121].
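    The superpixel refinement step can be illustrated as a majority vote: every pixel in a superpixel receives the class that the CNN predicted most often inside it. The NumPy sketch below assumes the per-pixel CNN labels and the SLIC superpixel map are already available; it is a simplified, hypothetical version and not necessarily the exact refinement used in [66].

```python
import numpy as np


def refine_with_superpixels(pixel_labels, segments):
    """Assign every pixel in a superpixel the most frequent classifier label inside it.

    pixel_labels: non-negative integer class map (height, width), e.g. CNN output.
    segments: superpixel ids (height, width), e.g. from a SLIC implementation.
    Ties go to the smallest class id (np.argmax returns the first maximum).
    """
    refined = np.empty_like(pixel_labels)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        counts = np.bincount(pixel_labels[mask])
        refined[mask] = np.argmax(counts)
    return refined


# Class 1 = citrus, class 0 = other; one noisy pixel inside the citrus superpixel.
labels = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [0, 0, 0]])
segments = np.array([[0, 0, 1],
                     [0, 0, 1],
                     [1, 1, 1]])
refined = refine_with_superpixels(labels, segments)
```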

    The presence of weeds in the background may have a similar impact to overlapping crops, making it hard to distinguish crops from the background [46,66,102]. For example, one study applied Otsu thresholding to ExG to create a dataset of images free of weed-crop backgrounds. Parts of plants that were less green were classified as soil, revealing a weakness of the Otsu-ExG method. A CNN was then applied to classify spinach and bean crops; inter-row and intra-row weeds were over-detected at the crop edges, where the CNN window could not cover the whole plant [102]. In a study that used a CNN to map crop trees, the textural and structural differences between the target objects and the surrounding vegetation determined the CNN's accuracy [92]. Weed height may also affect weed-crop classification [85]. In a study aimed at detecting and mapping early-stage weeds between and within crop rows from two flight altitudes, RF-OBIA generated a coarser delineation of plant objects from the higher-altitude image but could not detect weeds at an early stage, because the weeds were similar in size to young trees [46]. Therefore, image resolution relative to weed size may affect model performance.
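    Otsu's method chooses the threshold that maximizes the between-class variance of a histogram. The sketch below applies it to synthetic ExG values with NumPy; the bin count and the simulated soil and vegetation distributions are assumptions. Pixels at or above the threshold are treated as vegetation, so pale or less green plant parts that fall below it are labeled as soil, which is the weakness noted above.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu's method).

    The returned value is the upper edge of the last bin in the lower class, so
    values >= threshold form the upper class.
    """
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    n = hist.sum()
    w0 = np.cumsum(hist) / n            # share of pixels in the lower class
    w1 = (n - np.cumsum(hist)) / n      # share in the upper class
    m0 = np.cumsum(hist * centers) / n  # cumulative first moment
    mt = m0[-1]                         # overall mean
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros(len(centers))
    between[valid] = (mt * w0[valid] - m0[valid]) ** 2 / (w0[valid] * w1[valid])
    return edges[np.argmax(between) + 1]


# Simulated ExG values: 500 soil pixels near 0 and 300 vegetation pixels near 0.4.
rng = np.random.default_rng(0)
exg = np.concatenate([rng.normal(0.0, 0.02, 500), rng.normal(0.4, 0.05, 300)])
t = otsu_threshold(exg)
vegetation = exg >= t
```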

    Crop characteristics vary with the physical and biochemical traits of plant organs, which is why spectral reflectance changes among growth stages. In a study on detecting N stress, leaf color varied clearly across growing stages, which affected the segmentation results of the hue-based segmentation algorithm (HSeg) [79]. Likewise, NDVI, added to multispectral features for vegetation cover estimation, varied visibly between imaging dates from October to July for all tested crops: maize, beans, bananas, cassava, potatoes, and sweet potatoes [47]. In a study examining the correlation between yield and visual ratings of flower features, the correlation was weak for spring crops such as canola, peas, and chickpeas (r between 0.25 and 0.48) but strong for winter canola (r up to 0.84) [35]. In another study on wheat growth monitoring and yield estimation, the flowering stage was the best growth stage for building the yield estimation model [108]. A DCNN model also provided more accurate detection of yellow rust disease on datasets collected at a late stage of winter wheat [77]. In a study using the red-edge chlorophyll index (CIred-edge) from Sentinel-2 and UAV data to retrieve wheat leaf area index, the features were correlated at advanced stages but not at an early stage, because soil effects clearly altered the reflectance and VI values of the wheat canopy, which were highly heterogeneous across the field [122]. On the other hand, the correlation between cassava dry matter accumulation and VIs at late growth stages was negligible because of leaf senescence [82]. The same issue was observed in [50]. Accounting for the reflectance of senescent vegetation in VI calculations, as green leaf material declines in late stages, could enhance the correlation [16]. 
Another study compared plant heights derived from terrestrial laser scanning (TLS) and UAV data; their correlation was lower at later growth stages, possibly because of changes in canopy geometric properties [88]. In addition, a CNN model failed to differentiate wilt-diseased regions from soil regions when they were imaged with RGB bands at late stages, because of their similar texture and color [113].

    On the other hand, a single growth stage may not provide enough information for modeling [17,34]. For instance, in a study identifying changes in wheat leaf area index (LAI), the error of the model calibrated on a single stage (NRMSE = 17%) was roughly double that of the model calibrated on two stages (NRMSE = 8%) [122]; this was also confirmed by [121] for cotton plant counting. In contrast, the R² of plant N uptake for individual rice growth stages was above 0.60, suggesting that single growth stages may perform well in some cases [43]. Overall, selecting the optimal growth stage for crop trait phenotyping was the best solution for estimating plant density in maize [48].
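    The error measures quoted in these studies can be computed as below. NRMSE has several conventions (RMSE divided by the mean, the range or the standard deviation of the observations); this NumPy sketch assumes normalization by the mean, and the LAI values and predictions in the example are invented, chosen only so that one model's NRMSE is exactly double the other's.

```python
import numpy as np


def regression_metrics(observed, predicted):
    """Return RMSE, NRMSE (RMSE / mean of observations, in %) and R^2."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    residuals = obs - pred
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    nrmse = 100.0 * rmse / float(np.mean(obs))
    r2 = 1.0 - float(np.sum(residuals ** 2) / np.sum((obs - obs.mean()) ** 2))
    return {"rmse": rmse, "nrmse_percent": nrmse, "r2": r2}


# Invented LAI observations and predictions from two hypothetical calibrations.
observed_lai = [2.0, 3.0, 4.0, 5.0]
single_stage = regression_metrics(observed_lai, [2.5, 2.5, 4.5, 4.5])
two_stage = regression_metrics(observed_lai, [2.25, 2.75, 4.25, 4.75])
```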

    Land preparation, such as tillage or furrow irrigation, could also lead ML algorithms to overestimate results [46]. In a study examining the effect of tillage methods on estimating sugarcane aboveground fresh weight, more intensive tillage reduced the model's ability to estimate fresh weight, with RMSEs of 16.84 kg m⁻² and 17.43 kg m⁻² for the individual and intensive tillage methods, respectively [123]. Furthermore, high tillage frequency, distance of trees from roads, non-tilled areas and the absence of light limitation hinder the spread and reproduction of black locust, while trampling by people and vehicle traffic changed the spectral signature of vegetation close to the road, which might weaken estimation performance [124]. In contrast, another study predicting within-field variability in grain yield and protein content of winter wheat using UAV imagery with a linear regression model (LM), an SVM, an RF and three artificial neural networks (ANNs) found that lower yield and protein content occurred approximately 5 m from the field edges, while higher yield occurred in strips along the tractor's direction of travel [81]. Additionally, inner-row paths and vegetation led to biased cropland assessment, possibly due to foot compaction [125]. Likewise, yield loss was positively correlated with the severity of wheel compaction, as spectral reflectance might not provide complete information in the compacted parts of the field, reducing the model's ability to predict yield [115]. In another study predicting cover fractions of plant species with a CNN, predictive accuracy increased with the tile size (i.e., the CNN input image size) of the training samples, reflecting the greater spatial context captured by larger tiles [92]. 
Additionally, farmland with a higher degree of fragmentation and maize fields scattered in small patches made remote-sensing identification more difficult than concentrated fields did [17]. Moreover, mixed field conditions may include confounding factors and noisy data that weaken the resulting model, as found in [27].
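    Tile size, as discussed above, is simply the side length of the image patches fed to the CNN. The NumPy sketch below cuts an image into non-overlapping square tiles; the function name and sizes are hypothetical. It makes the trade-off visible: larger tiles carry more spatial context each, but the same image yields fewer training samples.

```python
import numpy as np


def extract_tiles(image, tile_size, stride=None):
    """Cut an image (height, width, bands) into square tiles for CNN input.

    Tiles that would run past the image border are skipped; stride defaults to
    tile_size (no overlap).
    """
    stride = stride or tile_size
    h, w = image.shape[:2]
    tiles = [
        image[r:r + tile_size, c:c + tile_size]
        for r in range(0, h - tile_size + 1, stride)
        for c in range(0, w - tile_size + 1, stride)
    ]
    if not tiles:
        return np.empty((0, tile_size, tile_size) + image.shape[2:], dtype=image.dtype)
    return np.stack(tiles)


image = np.zeros((100, 100, 3), dtype=np.float32)
small_tiles = extract_tiles(image, 20)  # 25 tiles, little context each
large_tiles = extract_tiles(image, 50)  # 4 tiles, more context each
```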

    Figure 2.  A scheme of the main factors affecting the efficiency of ML algorithms in UAV-based precision agriculture.

    The basis of UAV-based precision agriculture is the use of UAV spectral information and ML algorithms to observe, measure, and analyze crop traits and their interactions with the surrounding environment. UAVs have been shown to provide valuable spectral information, and ML algorithms are efficient methods for analyzing such data. However, UAVs may not provide exact measurements, yielding data that are either high-dimensional but of low quality or low-dimensional and of low quality. Consequently, ML models are likely to produce uncertain results.

    Generally, uncertainty in modeling results may stem from three main causal phases: (ⅰ) field conditions, (ⅱ) farm practices, and (ⅲ) data collection, preparation and analysis. In brief, soil salinity, organic and inorganic nutrient content, soil type, soil density, surface water and groundwater, solar radiation, wind, humidity, temperature, light, shadowing, cropping systems, planting density, weed presence, tillage timing, tillage machinery, fertilization rates and timing, and irrigation timing and rates directly affect the root and shoot systems of plants. The variation of these conditions from field to field controls the physicochemical and morphological traits of the plant, and the reflectance varies accordingly. Because the outcome of an experimental design is subject to field conditions and farm practices, different experiments with different control factors give different results, which can obscure the overall contribution of different studies. Since field conditions, farm management and farming practices are largely not controllable, the main focus should be on improving data collection, preparation and analysis, in other words, on improving data quality. Several innovations can be brought into the precision agriculture domain for this purpose, such as smart sensors, mechatronics, the internet of things (IoT) and AI for high-quality data collection, and data transformation methods for data preparation and analysis.

    Recently, sensors such as light detection and ranging (LiDAR), computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI) [126] have gained significant attention in agriculture. Smart sensors offer further benefits. They are devices that use embedded miniature electronics and wireless networks to monitor and collect information from various physical environments with high accuracy. Smart sensors can automate farm activities, examine the health status of plants, and monitor plant growth in hydroponics and aeroponics [127]. They also strengthen the early detection and diagnosis of biotic and abiotic stresses on plants, weed mapping, yield monitoring and mapping, variable-rate fertilization and spraying, and salinity mapping.

    Smart sensors can collect spectral information in the thermal, multispectral, hyperspectral, and visible ranges. Hyperspectral acquisition modes include (ⅰ) point scanning (whiskbroom), (ⅱ) line scanning (push-broom), (ⅲ) plane scanning (area scanning) and (ⅳ) single-shot acquisition. The whiskbroom mode scans one point at a time, which requires multi-axis scanning and is time-consuming, but it provides the high resolution needed for pest detection and classification at the asymptomatic stage. The push-broom mode is simple to operate and can increase the signal-to-noise ratio (SNR); setting the optimal exposure time is crucial to avoid band saturation with this mode. The plane mode scans the entire 2D area once per wavelength interval, which requires several image captures to build the spectral depth of the hyperspectral data cube; the target should therefore remain stable to obtain good spatial and spectral resolution, which is required for low-height and densely cultivated crops. The single-shot sensor collects the whole hyperspectral data cube within a single integration period but provides lower spatial resolution. Multispectral imaging provides more and narrower wavelength bands than standard RGB imaging, although bands sometimes overlap, as is the case in the visible spectrum [129]. The performance of thermal sensors depends on the sensor type, topology, and linearity of heat detection. The visible spectrum ranges from 400 to 700 nm; this range can give a detailed description of crop trait variation because plants have blue-light-dependent and non-blue-light-dependent physiological functions that control the synthesis and transformation of carbohydrates for plant growth. Each sensor category has advantages and drawbacks associated with its characteristics; Table 1 describes them in detail.
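    The difference between acquisition modes can be pictured as different ways of filling the same (rows, columns, bands) data cube. The NumPy sketch below simulates this with an already known scene and ignores optics, platform motion and calibration; all function names are hypothetical. Whiskbroom fills one pixel spectrum at a time, push-broom one across-track line with all bands, and plane scanning one full image per wavelength, whereas a single-shot sensor captures the whole cube at once and needs no assembly step.

```python
import numpy as np


def cube_from_whiskbroom(point_spectra, rows, cols):
    """Whiskbroom: one full spectrum per scanned point, in row-major scan order."""
    return np.asarray(point_spectra).reshape(rows, cols, -1)


def cube_from_pushbroom(line_frames):
    """Push-broom: each frame is one across-track line (cols, bands); frames follow the flight path."""
    return np.stack(line_frames, axis=0)


def cube_from_plane_scans(band_frames):
    """Plane scanning: each frame is a full 2D image (rows, cols) at one wavelength.

    The scene must not move between frames, otherwise the bands are misregistered.
    """
    return np.stack(band_frames, axis=-1)


rng = np.random.default_rng(0)
scene = rng.random((4, 5, 10))  # simulated reflectance: 4 rows, 5 columns, 10 bands

whisk = cube_from_whiskbroom([scene[r, c] for r in range(4) for c in range(5)], 4, 5)
push = cube_from_pushbroom([scene[r] for r in range(4)])
plane = cube_from_plane_scans([scene[:, :, b] for b in range(10)])
```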

    Table 1.  The description of smart sensor types.
    Sensor type Group Descriptions Reference
    Smart thermal sensors and probs IC (Integrated circuit) sensors

    −55 ℃ to +200 ℃ (range), of good/best accuracy, small size, easy to use, no calibration software is needed, topology of point-to-point, multidrop or daisy chain and low to moderate price [128]
    Thermistors Range of −100 ℃ to +500 ℃, calibration-dependent (low linearity) accuracy, small size, moderate complexity, point to point topology and low to moderate price [128]
    RTDs (Resistance temperature detectors) Range of −240 ℃ to 600 ℃, best accuracy, moderate size, complex to use best linearity (no calibration is needed), point to point topology but expensive [128]
    Thermocouples Range of −260 ℃ to +2, 300 ℃, better accuracy, large size, complex to use, better linearity, point to point topology but expensive [128]
    Multispectral
    sensors
    Efficiently capture nearly a dozen of relevant health indices, variants include (e.g., RGB, NDVI, NDRE)
    Actively scout fields with live-streaming video options
    Quickly integrate with DJI M200 and M210 drone series using Lock-and-Go gimbal technology
    Seamlessly flow data to field-agent web, mobile, and desktop platform; generate shapefiles, support prescription development, and telematics integration
    Minimize the effects of plant damage by quickly and accurately detecting plant stress
    Capture comprehensive plant data to help identify productive plants suitable for their environments and select desirable crop traits to improve outcomes
    The powerful multispectral data delivered by the 6X allows for you to pinpoint areas of low nutrient availability
    Quickly identify comprehensive field health, monitor the effects of applications throughout the season and determine the need for future applications with precision
    (https://www.poladrone.com/)
    (https://www.hyspex.com/)
    Hyperspectral sensors Designed to minimize optical distortions down to 10% of a pixel over the full spectral and spatial range
    The spatial and spectral resolution is optimized to be as similar as possible for all points in the FOV and all spectral bands
    Hyperspectral camera with an onboard computer and an integrated navigation system, all fitted into a self-contained module
    Support dual IMU which is especially important when operating with the gimbal to compensate for the dynamic lever arm
    Capture (sense) images
    Enhanced sensor operation, such as to enable high dynamic range acquisition
    Perform spatial-temporal analysis
    Support decision-making based on the outcome of that interpretation
    (https://www.poladrone.com/)
    (https://www.hyspex.com/)
    (Arena & Patanè, 2009)
    RGB smart sensors Resolution of 0.2 mm
    Maximum of 4000 tips per interval
    2 m cable
    Calibration accuracy ±1.0% at up to 20 mm/hour (1 in./hour)
    Operating temp range: 32 to 122 ℉ (0 to 50 ℃)
    https://www.instrumart.com/
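
    The multispectral rows above mention vegetation indices such as NDVI and NDRE. As a minimal sketch (not any vendor's processing pipeline), both are normalized differences of two reflectance bands: NDVI = (NIR - Red) / (NIR + Red) and NDRE = (NIR - RedEdge) / (NIR + RedEdge). The toy band grids, the handling of zero denominators and the stress threshold of 0.3 are illustrative assumptions, not values taken from the sensors listed.

```python
def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b) for two equally sized 2-D reflectance grids.

    Pixels where a + b == 0 are set to 0.0 (an assumption for this sketch)."""
    return [
        [(a - b) / (a + b) if (a + b) != 0 else 0.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    """Normalized difference red-edge index."""
    return normalized_difference(nir, red_edge)


def stressed_pixels(index_grid, threshold=0.3):
    """Hypothetical rule: flag (row, col) pixels whose index is below `threshold`."""
    return [
        (r, c)
        for r, row in enumerate(index_grid)
        for c, value in enumerate(row)
        if value < threshold
    ]


# Toy 2 x 2 reflectance grids (illustrative values only).
nir = [[0.50, 0.45], [0.30, 0.60]]
red = [[0.08, 0.10], [0.20, 0.05]]
print(stressed_pixels(ndvi(nir, red)))  # [(1, 0)]
```

    In practice the threshold would have to be calibrated per crop, sensor and growth stage; the fixed value here only shows where such a rule would plug in.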


    Smart sensors connected through wireless networks can monitor and collect information from various physical environments across different spectral ranges. They can deliver large volumes of data in real time with high accuracy, producing datasets with dense patterns. Such datasets allow ML algorithms to learn robust models and produce more reliable results.

    Robotic technologies have emerged to automate plant trait measurement and extend field imaging across successive growth stages. Robotics accelerates data acquisition with high repeatability. Robots also allow ground data to be collected during large-scale screening, which makes the samples more representative of the entire field population. More representative sampling and frequent data acquisition yield large datasets with rich visual and quantitative patterns, which make the learning task more effective [130]. Robotics technology should therefore be an investment focus in precision agriculture.

    Several integrated applications use the IoT and mechanization for remote control. For example, computer-aided design and manufacturing (CAD/CAM) has been applied to develop the 3D model of a robotic platform. Its control system incorporates an embedded system that hosts the motion and operation control software and is interfaced with the main processing computer. The latter runs the measurement devices, such as lidar, together with a positioning device, such as a real-time kinematic (RTK) receiver working with a global navigation satellite system (GNSS) [131]. Autonomous tractors with the proper rigging are also used for several agricultural applications: they can till, fertilize, plant, spray, weed, mow, haul and harvest. Such flexible, automated use of tractors increases productivity, improves safety and reduces costs for many agricultural operations. These tractors are fully equipped with vision and proximity sensors that support mechanized operations in the farm field. In addition, automated tilling and sowing machines are available for post-cultivation processes; they provide quick, low-cost control of tillage depth, sowing rate and density. Furthermore, a Zigbee microcontroller and a well-installed network of inexpensive sensors within the field can be used to monitor and control each plant's environment. Robots that use a single gray-level vision sensor can be used for spraying, fertilization, weed removal and balanced irrigation [132]. Table 2 shows more details about these technologies.

    Table 2. Summary of robotic technologies for use in precision agriculture.
    Robotics Description Reference
    Autonomous tractors These are self-driving tractors equipped with sensors and GPS technology to navigate through fields and perform various tasks, such as planting, spraying, and harvesting. https://www.bearflagrobotics.com/autonomous-farm-tractors/
    Drones Drones equipped with cameras and sensors can be used to collect data on crop health, soil moisture, and other environmental factors. This information can be used to optimize crop management. https://www.dji.com/camera-drones
    Robotic harvesters These machines can be used to pick and sort fruits and vegetables, reducing the need for manual labor and increasing efficiency. https://www.agrobot.com/e-series
    Autonomous weeders These machines can use machine learning algorithms and computer vision to identify and remove weeds, reducing the need for herbicides and manual labor. https://carbonrobotics.com/autonomous-weeder
    Robotic greenhouses These systems can be used to control temperature, humidity, and other environmental factors in indoor growing environments, increasing crop yields and reducing water and energy consumption. https://www.postscapes.com/smart-greenhouses/
    Autonomous sprayers These machines can be used to apply pesticides and other chemicals to crops, reducing the risk of human exposure and increasing efficiency. https://www.agromillora.com/olint/en/
    Soil monitoring robots These machines can be used to analyze soil composition and moisture levels, helping farmers make informed decisions about crop management. https://www.agritechfuture.com/
    Livestock monitoring robots These machines can be used to monitor the health and behaviour of livestock, helping farmers detect and prevent disease outbreaks and increase productivity. https://www.dilepix.com/en/
    Autonomous seeders These machines can be used to plant seeds with precision and accuracy, reducing waste and increasing crop yields. https://www.futurefarming.com/
    Fruit picking robots They provide the farmer with invaluable data, including real-time updates on harvesting progress, duration, quantity harvested, and cost. https://www.weforum.org/


    Combining the IoT with robots and sensors enables close monitoring of plant environments and the collection of dense, accurate data. Data of this quality support a fully integrated analysis, which covers all aspects of crop-trait modelling, gives the best achievable results and reduces uncertainty.

    Digitalizing data collection allows massive data transfers and real-time collection, and makes collection error rates predictable at a detailed level. Furthermore, the digital mechanization of farm practices reduces errors in sowing and fertilization, which saves crop and farm management inputs. Beyond applications that improve small-scale farmers' interaction with their farms, other important software facilitates the control, monitoring and examination of in-field devices and robots. Agrivi, Granular, Trimble, Farm-ERP, Farm-Logos, Ag-World, Agri-Webb and Conservis are commonly used to optimize, automate and schedule production activities, as well as for data storage and analysis. These are commercial farm-administration software packages.

    Moreover, linking smart sensors with wireless networks and IT software allows real-time, frequent monitoring of canopy development, including phenological stages such as germination, leaf development, flowering, fruiting, maturity and senescence, which gives detailed information about plant trait variation at each stage. It also allows the health status of the plant to be examined through morphological parameters such as height, leaf area index, ground cover fraction, storage content and biomass, which help predict the plant's yield potential. The volume and quality of such information allow researchers to obtain precise modelling results.

    Data cleaning and preparation are prerequisites for analysis. Many analysis methods and algorithms cannot model every kind of data. Data transformation is a powerful way to convert data of different formats into a format that powerful algorithms can model. For example, CNNs are designed for visual image recognition, with a strong ability to handle big data and reduce model overfitting. However, the performance of CNNs on tabular regression data has been examined less. Because regression is crucial in precision agriculture, this section reviews several algorithmic methods, currently under assessment, that transform different data formats into the CNN input format, so that domains other than agriculture can also benefit from them.

    CNNs are designed for visual image recognition and deal well with big, complex data. The idea behind DeepInsight is to first transform a non-image sample into an image and then feed it to a CNN architecture for prediction or classification. The method transforms non-image samples in the form of vectors into meaningful images, which lets the CNN exploit the full versatility it offers in image-data classification [133]. The method can be implemented in general-purpose programming languages such as Python and run in environments such as Colab.
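
    A minimal, simplified sketch of the DeepInsight idea, using plain Python lists: each feature is treated as a point whose coordinates are its values across the training samples, the feature points are projected to two dimensions, and the projection is discretized onto a pixel grid. The original method uses t-SNE or kernel PCA followed by a convex-hull rotation; this sketch swaps in plain PCA (computed by power iteration on the Gram matrix) and omits the rotation. The names `feature_pixel_map` and `to_image` and the toy data are hypothetical.

```python
import math


def _top_eigvec(m, iters=500):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    n = len(m)
    v = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n, 0.0
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return v, lam


def feature_pixel_map(X, size):
    """Give each feature (column of X) a (row, col) pixel on a size x size grid."""
    n, d = len(X), len(X[0])
    # Each feature becomes a point in R^n: its values across the n samples.
    points = [[X[s][f] for s in range(n)] for f in range(d)]
    # Centre the feature points, then build their d x d Gram matrix.
    centre = [sum(p[s] for p in points) / d for s in range(n)]
    points = [[p[s] - centre[s] for s in range(n)] for p in points]
    gram = [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]
    # The top two eigenvectors of the Gram matrix give the 2-D PCA layout.
    v1, lam1 = _top_eigvec(gram)
    deflated = [[gram[i][j] - lam1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    v2, _ = _top_eigvec(deflated)

    def to_index(values):
        # Min-max scale one coordinate axis onto pixel indices 0..size-1.
        lo, hi = min(values), max(values)
        if hi - lo < 1e-12:
            return [size // 2] * len(values)
        return [round((x - lo) / (hi - lo) * (size - 1)) for x in values]

    return list(zip(to_index(v2), to_index(v1)))


def to_image(sample, pixel_map, size):
    """Render one sample as a size x size image; features sharing a pixel are averaged."""
    total = [[0.0] * size for _ in range(size)]
    count = [[0] * size for _ in range(size)]
    for value, (r, c) in zip(sample, pixel_map):
        total[r][c] += value
        count[r][c] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(size)]
            for r in range(size)]


# Toy data: 4 samples x 6 features (illustrative values only).
X = [[0.10, 0.50, 0.20, 0.90, 0.30, 0.70],
     [0.20, 0.40, 0.10, 0.80, 0.35, 0.60],
     [0.90, 0.10, 0.80, 0.20, 0.60, 0.30],
     [0.85, 0.15, 0.75, 0.25, 0.55, 0.35]]
pixels = feature_pixel_map(X, size=4)   # learned once from the training set
image = to_image(X[0], pixels, size=4)  # one 4 x 4 image per sample, fed to a CNN
```

    The pixel layout is computed once and reused for every sample, so correlated features keep their neighbourhood across all images; the grid size trades image resolution against the number of features that collide on one pixel.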

    CNNs perform well in visual image recognition. However, most tabular data do not assume a spatial relationship between features and are therefore unsuitable for modelling with CNNs. To address this problem, the IGTD (image generator for tabular data) method transforms tabular data into images by assigning features to pixel positions so that similar features are close to each other in the image. The algorithm searches for an optimised assignment by minimizing the difference between the ranking of distances between features and the ranking of distances between their assigned pixels in the image [134]. This method can also be implemented in general-purpose languages such as Python and run in environments such as Colab.

    Convolutional neural networks (CNNs) represent a major breakthrough in image classification. However, there has been no comparable progress in applying CNNs, or neural networks of any kind, to the classification of tabular data. A novel method, tabular convolution (TAC), was developed and evaluated for classifying such data with CNNs by transforming tabular data into images and then classifying the images with CNNs. The transformation treats each row of tabular data (i.e., a vector of features) as an image filter (kernel) and applies the filter to a fixed base image; a CNN is then trained to classify the filtered images. TAC was further applied to the classification of gene expression data derived from blood samples of patients with bacterial or viral infections, and the results showed that an off-the-shelf ResNet can classify the gene expression data as accurately as the current non-CNN state-of-the-art classifiers [135].
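
    The TAC transformation step can be sketched as below, assuming each feature vector is zero-padded to the next square length and reshaped into a k x k kernel, and that the fixed base image is seeded uniform noise (the base image used in the original work may differ). As in CNN layers, the filter is applied as a "valid" cross-correlation. All function names are hypothetical, and the subsequent CNN training step is omitted.

```python
import math
import random


def make_base_image(height, width, seed=0):
    """Fixed base image shared by every row (assumption: seeded uniform noise)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(width)] for _ in range(height)]


def row_to_kernel(row):
    """Zero-pad a feature vector to k*k values and reshape it into a k x k kernel."""
    k = math.isqrt(len(row))
    if k * k < len(row):
        k += 1
    padded = list(row) + [0.0] * (k * k - len(row))
    return [padded[r * k:(r + 1) * k] for r in range(k)]


def apply_kernel(base, kernel):
    """'Valid' 2-D cross-correlation of the base image with the kernel."""
    k = len(kernel)
    out_h = len(base) - k + 1
    out_w = len(base[0]) - k + 1
    return [
        [
            sum(base[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def tac_transform(rows, base):
    """Turn each tabular row into one filtered image; a CNN would then classify these."""
    return [apply_kernel(base, row_to_kernel(row)) for row in rows]


# Usage: 3 rows of 5 features -> 3 x 3 kernels -> three 6 x 6 images.
base = make_base_image(8, 8)
images = tac_transform([[0.2, 1.5, 0.7, 3.1, 0.0],
                        [1.1, 0.4, 2.2, 0.9, 0.3],
                        [0.0, 0.8, 1.9, 1.2, 2.5]], base)
print(len(images), len(images[0]), len(images[0][0]))  # 3 6 6
```

    Because the base image is identical for every row, differences between the resulting images come only from the feature values, which is what lets a CNN learn to separate the classes.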

    Overall, given the power of CNNs in image pattern classification, data transformation methods can extract features from discrete and continuous tabular data and arrange them in a multidimensional image space for subsequent classification with a convolutional neural network. The transformation can help mitigate modeling error and lower the uncertainty of modeling results. These methods can be implemented in general-purpose languages such as Python and run in environments such as Colab.

    This study aimed to review the sources of uncertainty that affect the operational efficiency of ML algorithms in regression and classification tasks for crops of interest in UAV-based precision agriculture. (i) Field conditions, (ii) farm practices and (iii) data collection, preparation and analysis were found to be the main sources of uncertainty in modelling results. Field conditions are uncontrollable, but farm management can be partially improved to mitigate uncertainty. The main interest therefore lies in novelties that improve data collection, preparation and analysis so as to obtain high-quality data. UAV-mounted smart sensors, mechatronics, the IoT and AI can provide accurate, real-time, dense and high-quality data, which can substantially reduce the uncertainty of modelling results in precision agriculture. By identifying the sources of uncertainty that affect the operational efficiency of ML in regression and classification tasks in agriculture, researchers may be able to make better decisions about how to use ML in future precision agriculture studies and to choose the algorithms that work best in different study contexts.

    The future of ML in precision agriculture is promising owing to advances in technologies such as UAV-mounted smart sensors, mechatronics, the IoT and AI, which are expected to provide better and more precise data and thereby improve the efficiency and accuracy of ML algorithms. In the future, ML algorithms are expected to be tailored to specific crops, farm management practices and field conditions, enabling researchers to choose the best algorithms for different study contexts and to obtain better predictions and insights.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The experiments for this research were mainly funded by the Geran Putra-Inisiatif Putra Siswazah under Grant GP-IPS/2018/9670300.

    The authors declare no conflict of interest.


    Acknowledgments



    This study is not funded by any agency and is being conducted by the authors independently.


    [1] Daniel DTG, Alpert EA, Jaffe E (2022) The crowd crush at mount meron: Emergency medical services response to a silent mass casualty incident. Disaster Med Public Health Prep 1–3. https://doi.org/10.1017/dmp.2022.162
    [2] Alquthami AH, Pines JM (2014) A systematic review of noncommunicable health issues in mass gatherings. Prehosp Disaster Med 29: 167-175. https://doi.org/10.1017/S1049023X14000144
    [3] Memish ZA, Steffen R, White P, et al. (2019) Mass gatherings medicine: Public health issues arising from mass gathering religious and sporting events. Lancet 393: 2073-2084. https://doi.org/10.1016/S0140-6736(19)30501-X
    [4] World Health OrganizationPublic health for mass gatherings: Key considerations (2015). Available from: https://www.who.int/publications/i/item/public-health-for-mass-gatherings-key-considerations
    [5] de Almeida MM, von Schreeb J (2019) Human stampedes: An updated review of current literature. Prehosp Disaster Med 34: 82-88. https://doi.org/10.1017/S1049023X18001073
    [6] Hsieh YH, Ngai KM, Burkle FM, et al. (2009) Epidemiological characteristics of human stampedes. Disaster Med Public Health Prep 3: 217-223. https://doi.org/10.1097/DMP.0b013e3181c5b4ba
    [7] Ngai KM, Burkle FM, Hsu A, et al. (2009) Human stampedes: a systematic review of historical and peer-reviewed sources. Disaster Med Public Health Prep 3: 191-195. https://doi.org/10.1097/DMP.0b013e3181c5b494
    [8] Madzimbamuto F D (2003) A hospital response to a soccer stadium stampede in Zimbabwe. Emerg Med J 20: 556-559. https://doi.org/10.1136/emj.20.6.556
    [9] Burkle F M, Hsu E B (2011) Ram Janki Temple: understanding human stampedes. Lancet 377: 106-107. https://doi.org/10.1016/S0140-6736(10)60442-4
    [10] Illiyas FT, Mani SK, Pradeepkumar AP, et al. (2013) Human stampedes during religious festivals: A comparative review of mass gathering emergencies in India. Int J Disast Risk Re 5: 10-18. https://doi.org/10.1016/j.ijdrr.2013.09.003
    [11] Washington PostStampede at Indonesia soccer game kills 125, officials say (2022). Available from: https://www.washingtonpost.com/world/2022/10/01/indonesia-riot-arema-fc-liga-football/.
    [12] Doan AE, Bogen KW, Higgins E, et al. (2022) A content analysis of twitter backlash to Georgia's abortion ban. Sex Reprod Healthc 31: 100689. https://doi.org/10.1016/j.srhc.2021.100689
    [13] Sinnenberg L, Buttenheim A M, Padrez K, et al. (2017) Twitter as a tool for health research: A systematic review. Am J Public Health 107: e1-e8. https://doi.org/10.2105/AJPH.2016.303512
    [14] Amoudi G, Almansour A, Watters C, et al. (2015) Tweet for help: the role of social media in disaster events and the case of the 2015 Mina stampede. Digit Creat 33: 329-348. https://doi.org/10.1080/14626268.2022.2141262
    [15] Kryvasheyeu Y, Chen H, Moro E, et al. (2015) Performance of social network sensors during hurricane Sandy. PloS One 10: e0117288. https://doi.org/10.1371/journal.pone.0117288
    [16] Martín Y, Li Z, Cutter S L (2017) Leveraging Twitter to gauge evacuation compliance: Spatiotemporal analysis of hurricane Matthew. PloS One 12: e0181701. https://doi.org/10.1371/journal.pone.0181701
    [17] Shelton T, Poorthuis A, Graham M, et al. (2014) Mapping the data shadows of Hurricane Sandy: Uncovering the sociospatial dimensions of ‘big data’. Geoforum 52: 167-179. http://dx.doi.org/10.1016/j.geoforum.2014.01.006
    [18] Xu Z, Lachlan K, Ellis L, et al. (2020) Understanding public opinion in different disaster stages: a case study of Hurricane Irma. Internet Res 30: 695-709. https://doi.org/10.1108/intr-12-2018-0517
    [19] Zou L, Lam NSN, Shams S, et al. (2019) Social and geographical disparities in Twitter use during Hurricane Harvey. Int J Digit Earth 12: 1300-1318. https://doi.org/10.1080/17538947.2018.1545878
    [20] Duan J, Zhai W, Cheng C (2020) Crowd detection in mass gatherings based on social media data: A case study of the 2014 Shanghai New Year's Eve stampede. Int J Environ Res Public Health 17: 8640. https://doi.org/10.3390/ijerph17228640
    [21] Lanier HD, Diaz MI, Saleh SN, et al. (2022) Analyzing COVID-19 disinformation on Twitter using the hashtags #scamdemic and #plandemic: Retrospective study. PloS One 17: e0268409. https://doi.org/10.1371/journal.pone.0268409
    [22] Saleh SN, Lehmann C, McDonald S, et al. (2020) Understanding public perception of COVID-19 Social distancing on Twitter. Open Forum Infect D 7: S309. https://doi.org/10.1093/ofid/ofaa439.679
    [23] Saleh SN, Lehmann CU, McDonald SA, et al. (2021) Understanding public perception of coronavirus disease 2019 (COVID-19) social distancing on Twitter. Infect Cont Hosp Ep 42: 131-138. https://doi.org/10.1017/ice.2020.406
    [24] Yi J, Pan S, Chen Q (2020) Simulation of pedestrian evacuation in stampedes based on a cellular automaton model. Simul Model Pract Th 104: 102147. https://doi.org/10.1016/j.simpat.2020.102147
    [25] Ujah OI, Olaore P, Nnorom OC, et al. (2023) Examining ethno-racial attitudes of the public in Twitter discourses related to the United States Supreme Court Dobbs vs. Jackson Women's Health Organization ruling: A machine learning approach. Frontiers Glob Women Health 4: 1149441. https://doi.org/10.3389/fgwh.2023.1149441
    [26] Aranda AM, Sele K, Etchanchu H, et al. (2021) From big data to rich theory: Integrating critical discourse analysis with structural topic modeling. Eur Manag Rev 18: 197-214. https://doi.org/10.1111/emre.12452
    [27] O'Connor Brendan, Bamman David, A Smith Noah (2018) Computational text analysis for social science: Model assumptions and complexity. Carnegie Mellon Univ. J Contrib . https://doi.org/10.1184/R1/6473291.v1
    [28] Macanovic A (2022) Text mining for social science - The state and the future of computational text analysis in sociology. Soc Sci Res 108: 102784. https://doi.org/10.1016/j.ssresearch.2022.102784
    [29] Stracqualursi L, Agati P (2022) Tweet topics and sentiments relating to distance learning among Italian Twitter users. Sci Rep 12: 9163. https://doi.org/10.1038/s41598-022-12915-w
    [30] Kryvasheyeu Y, Chen H, Obradovich N, et al. (2016) Rapid assessment of disaster damage using social media activity. Sci Advances 2: e1500779. https://doi.org/10.1126/sciadv.1500779
    [31] Perlstein SG, Verboord M (2021) Lockdowns, lethality, and laissez-faire politics. Public discourses on political authorities in high-trust countries during the COVID-19 pandemic. PloS One 16: e0253175. https://doi.org/10.1371/journal.pone.0253175
    [32] Carosia AEO, Coelho GP, Silva AEA (2020) Analyzing the Brazilian Financial Market through Portuguese Sentiment Analysis in Social Media. Appl Artif Intell 34: 1-19. https://doi.org/10.1080/08839514.2019.1673037
    [33] Fino E, Hanna-Khalil B, Griffiths MD (2021) Exploring the public's perception of gambling addiction on Twitter during the COVID-19 pandemic: Topic modelling and sentiment analysis. J Addict Dis 39: 489-503. https://doi.org/10.1080/10550887.2021.1897064
    [34] Shahin S, Ng YMM (2022) Connective action or collective inertia? Emotion, cognition, and the limits of digitally networked resistance. Soc Movement Stud 21: 530-548. https://doi.org/10.1080/14742837.2021.1928485
    [35] Lindstedt NC (2019) Structural topic modeling for social scientists: A brief case study with social movement studies literature, 2005–2017. Soc Curr 6: 307-318. https://doi.org/10.1177/2329496519846505
    [36] Roberts ME, Stewart BM, Tingley D (2019) stm: An R Package for Structural Topic Models. J Stat Softw 91: 1-40. https://doi.org/10.18637/jss.v091.i02
    [37] Ramondt S, Kerkhof P, Merz EM (2022) Blood donation narratives on social media: A topic modeling study. Transfus Med Rev 36: 58-65. https://doi.org/10.1016/j.tmrv.2021.10.001
    [38] Schwartz B, Nafziger S, Milsten A, et al. (2015) Mass gathering medical care: Resource document for the national association of EMS physicians position statement. Prehosp Emerg Care 19: 559-568. https://doi.org/10.3109/10903127.2015.1051680
    [39] Slabbert AD, Ukpere WI (2010) A preliminary comparative study of rugby and football spectators' attitudes towards violence. Afr J Bus Manag 4: 459-466. https://doi.org/10.5897/AJBM.9000022
    [40] Khan AA, Sabbagh AY, Ranse J, et al. (2021) Mass gathering medicine in soccer leagues: A review and creation of the SALEM Tool. Int J Environ Res Public Health 18: 9973. https://doi.org/10.3390/ijerph18199973
    [41] Thackway S, Churches T, Fizzell J, et al. (2009) Should cities hosting mass gatherings invest in public health surveillance and planning? Reflections from a decade of mass gatherings in Sydney, Australia. BMC Public Health 9: 324. https://doi.org/10.1186/1471-2458-9-324
    [42] Koski A, Kouvonen A, Sumanen H (2020) Preparedness for Mass Gatherings: Factors to Consider According to the Rescue Authorities. Int J Environ Res Public Health 17: 1361. https://doi.org/10.3390/ijerph17041361
    [43] Hawkins JB, Brownstein JS, Tuli G, et al. (2016) Measuring patient-perceived quality of care in US hospitals using Twitter. BMJ Qual Saf 25: 404-413. https://doi.org/10.1136/bmjqs-2015-004309
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
