
CNN models already play an important role in crop and weed classification, with accuracies above 95% reported in the literature. However, manually choosing and fine-tuning deep learning models is laborious, yet remains indispensable in most traditional practice and research. Moreover, the classic objective functions are not fully compatible with agricultural tasks: the resulting models misclassify crops as weeds more often than comparable errors occur in other deep learning application domains. In this paper, we applied automated machine learning with a new objective function to crop and weed classification, achieving higher accuracy and a lower crop killing rate (the rate of identifying a crop as a weed). The experimental results show that our method outperforms state-of-the-art models such as ResNet and VGG19.
Citation: Xuetao Jiang, Binbin Yong, Soheila Garshasbi, Jun Shen, Meiyu Jiang, Qingguo Zhou. Crop and weed classification based on AutoML[J]. Applied Computing and Intelligence, 2021, 1(1): 46-60. doi: 10.3934/aci.2021003
Weeding has long been a major issue for farmers, especially those working on large farms. Industrialized modern agriculture applies chemical methods to control weeds; however, this has led to a steep rise in herbicide resistance and growing harm to the ecological environment. Therefore, more focused research is required on smart and intelligent technologies for precision weed management. Meanwhile, effective farming also demands precision crop cultivation to increase overall agricultural yields. Under both conditions, each kind of plant should be treated with a different strategy. As conducting this process manually is time-consuming and laborious, a great deal of effort has been put into autonomous farming, in which various computational intelligence models are designed to classify field plants.
Implementation of autonomous farming has been a research focus in modern agriculture, especially in the emerging Agriculture 4.0 [1]; however, plant classification remains a challenging research issue in this realm [2,3]. Many studies on weed and crop classification have been carried out over the past few decades to meet the needs of precision weed management. Generally, these studies fall into three categories: 3D point cloud classification, spectrum classification and image classification. 3D point cloud classification relies on intensive computing to determine the label of each entity and requires 3D LiDAR data [4] with bounding boxes. In spectrum classification, monochrome cameras with different lasers match spectral reflectance for classification [5]. In image classification [6], cameras capture images of the field. 3D features and spectral features can be useful for weed classification, but they require expensive instrumentation and computing devices, which are not affordable for every farmer.
Image-based crop and weed classification can be implemented with traditional algorithms or with convolutional neural networks (CNNs). In many image classification tasks, CNNs achieve higher accuracy than traditional algorithms such as KNN, SVM and MLP [7], and they also produce remarkable results in crop and weed classification. However, no published study has focused on applying automated machine learning (AutoML) [8] to crop and weed classification, nor on training models with both high accuracy and a low crop killing rate (CKR, the rate of identifying a crop as a weed).
This paper presents a method of crop and weed classification based on AutoML and ensemble modeling, aiming at better performance on an outdoor greenhouse data set without manual model selection. Compared to existing methods, it offers four improvements:
● Using AutoML to select the optimal model automatically;
● A new metric for evaluation and a new objective function for model training;
● A new algorithm to find a compatible model across different data sets;
● An ensemble strategy to reach high accuracy and low CKR.
The pipeline of this project is shown in Fig. 1. First, image data are collected and preprocessed by a robot in the greenhouse. Then, the processed images are uploaded to a data server and labeled manually. The labeled data are sent to a GPU server for model training. Finally, the trained models are gathered and deployed on the robot to perform detection tasks. The whole system is automatic except for data labeling.
As presented in Fig. 2, our method includes four steps: data acquisition, image pre-processing, model searching and training, and ensemble modeling. We build a new data set in the first two steps, data acquisition and image pre-processing. For model searching and training, we apply an AutoML-based algorithm and new objective functions. Finally, we use an ensemble model to make the final prediction.
In this work, our first step is to construct the data set. Since there is no farmland near our university campus, we built a wooden cuboid box from boards, screws and waterproof membrane, and used it to plant potatoes and two kinds of weeds, as shown in Fig. 2 S1. A four-wheeled robot was built to collect data; its hardware includes a Raspberry Pi, a USB camera, motors and a control chip. The robot traverses the field step by step while a slider moves the camera horizontally. Equipped with a kit of LED lights, the robot also performs satisfactorily on overcast days. It collects raw data from the field, which we pre-process in the next step.
We use Algorithm 1 to extract crop and weed images from field images, then label them manually to construct the data set. In Algorithm 1, the functions rgb2hsv, mask, morphologyOpen and findContours are based on OpenCV [9], and the results of the key procedures are shown in Fig. 2 S2.
Algorithm 1 Field image segmentation
Input: img, Tsize, Tratio
Output: Seg = {Seg1, Seg2, ..., Segm}
1: hsv_img = rgb2hsv(img)
2: masked_img = mask(hsv_img)
3: opened_img = morphologyOpen(masked_img)
4: contours = findContours(opened_img)
5: Seg = []
6: for contour ∈ contours do
7:     obj_img = boundingBoxCut(img, contour)
8:     if size(obj_img) / size(img) < Tsize then
9:         continue
10:    end if
11:    if ratio(obj_img) < Tratio then
12:        continue
13:    end if
14:    sharpen_img = sharpen(obj_img)
15:    out_img = bitwiseAnd(sharpen_img, contour)
16:    Seg.append(out_img)
17: end for
18: return Seg
As presented in Algorithm 1, the algorithm takes the original field image img, a size threshold Tsize, and an aspect-ratio threshold Tratio (e.g., for an image of shape (200, 100, 1) or (100, 200, 1), the ratio is 0.5). In lines 1 and 2, we convert RGB to HSV and mask pixels with Equation (1) below, which removes the image background as in S2.b of Fig. 2. In line 3, we apply morphological opening to remove noise from the mask as in S2.c of Fig. 2. In lines 8 to 13, we separate the crops and weeds in the images and skip small or banded objects as in S2.d of Fig. 2. In lines 14 and 15, we highlight the contour of the sharpened images as in S2.g of Fig. 2. Then we label these images manually to establish our data set.
$$ \mathrm{mask}(\mathrm{Pixel}(i,j)) = \mathrm{Pixel}(i,j), \quad \begin{cases} H(i,j) \in (45, 95] \\ S(i,j) \in (55, 255] \\ V(i,j) \in (55, 255] \end{cases} \tag{1} $$
For Equation (1), we cluster images according to the luminous intensity provided by the light sensor, then analyze the histogram of each cluster to determine the HSV ranges.
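To make Algorithm 1 concrete, a minimal OpenCV sketch is given below. It is an illustration rather than the production code: the default values of t_size and t_ratio, the opening kernel and the sharpening kernel are assumptions (the paper does not report them), while the HSV bounds follow Equation (1).

```python
import cv2
import numpy as np

def segment_field_image(img, t_size=0.001, t_ratio=0.3):
    """Sketch of Algorithm 1; t_size and t_ratio are illustrative defaults."""
    # Lines 1-2: convert BGR to HSV and keep pixels in the ranges of Equation (1)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (46, 56, 56), (95, 255, 255))
    # Line 3: morphological opening removes small noise from the mask
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    # Line 4: one contour per plant-like blob
    contours, _ = cv2.findContours(opened, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    segments = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)  # line 7: bounding-box cut
        obj = img[y:y + h, x:x + w]
        # Lines 8-13: skip objects that are too small or too elongated
        if (w * h) / (img.shape[0] * img.shape[1]) < t_size:
            continue
        if min(w, h) / max(w, h) < t_ratio:
            continue
        # Lines 14-15: sharpen, then keep only the masked (plant) pixels
        sharpen_kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
        sharpened = cv2.filter2D(obj, -1, sharpen_kernel)
        segments.append(cv2.bitwise_and(sharpened, sharpened,
                                        mask=opened[y:y + h, x:x + w]))
    return segments
```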
The flow chart of model searching and training is shown in Fig. 2 S3, and the key parts are marked with different colors, which will be detailed in the next three subsections.
The model search process employs an AutoML-based algorithm to generate several models on two data sets, and we apply new objective functions during model training. In the first subsection, we introduce the two data sets used in this paper and the sampling rule. We then elaborate the AutoML-based algorithm in the second subsection. In the model training subsection, we detail several objective functions and apply them to model training.
The proposed crop and weed classification methodology is evaluated on two data sets: DS.1, a new data set collected in a small field by an agricultural robot, with 2,068 images of potatoes and weeds; and DS.2, PlantVillage [10], an open-source data set of crops with 12,752 images. Although there are no weeds in DS.2, we can manually designate some crops as weeds, as this does not affect the performance of the models.
To simulate a non-chemical weeding environment, we applied no weed control during data collection, i.e., no plastic mulch or herbicide, which can lead to more weed than crop biomass. After image pre-processing and manual labeling, data set DS.1 has a weed-to-crop ratio of approximately 2:1, as indicated in Fig. 3. An imbalanced data set affects the predictions of models, as a model can be misled into classifying objects as weeds to get a higher score. Therefore, we use sampling to produce weed subsets from the original complete data set (CD). For each weed class in the data set, we calculate the sample rate k as in Equation (2), and concatenate the sampled weed part with the crop part to get a sampled data set (SD) with a weed-to-crop ratio of approximately 1:1. In this paper, both DS.1 and DS.2 are used in CD and SD form to test the feasibility and robustness of our methodology.
$$ k = \alpha \cdot \frac{Num_{weed\_i}}{Num_{weed}} + \beta \cdot \frac{1}{Cls_{weed}} \tag{2} $$
where $Num_{weed}$ is the total number of weed images, $Num_{weed\_i}$ is the number of images of the specific weed class, $Cls_{weed}$ is the number of weed classes, and $\alpha$ and $\beta$ are parameters set to 0.7 and 0.3.
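As an illustration, Equation (2) and the subsequent subsampling can be sketched as follows; the class names and counts are made up for the example (chosen so that the sampled weed total roughly matches the crop count, as in DS.1).

```python
def sample_rate(num_weed_i, num_weed, cls_weed, alpha=0.7, beta=0.3):
    # Equation (2): blend the class's share of all weed images
    # with a uniform share over the weed classes
    return alpha * num_weed_i / num_weed + beta / cls_weed

# Hypothetical DS.1-style counts: two weed classes totalling 1380 images
counts = {"weed1": 900, "weed2": 480}
rates = {w: sample_rate(n, sum(counts.values()), len(counts))
         for w, n in counts.items()}
# Each weed class is subsampled by its rate k and concatenated with the
# crop images to form the sampled data set (SD)
sampled = {w: int(rates[w] * n) for w, n in counts.items()}
print(rates)    # {'weed1': 0.606..., 'weed2': 0.393...}
print(sampled)  # {'weed1': 545, 'weed2': 188}
```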
We use AutoKeras [11] as the AutoML framework, automating the search instead of testing models manually. AutoKeras is an open-source package widely used for image and text classification. Deep models are feasible on a powerful laboratory GPU [12], but not on the single-board computer of a farm robot. Hence, we restricted the models searched by AutoML to 300,000 parameters.
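A minimal sketch of such a restricted search with AutoKeras (assuming a 1.x release, whose ImageClassifier accepts a max_model_size cap) might look like the following; the trial budget, epoch count and placeholder data are illustrative, not the paper's configuration.

```python
import numpy as np
import autokeras as ak

# Placeholder arrays standing in for the labeled plant images (illustrative)
x_train = np.random.rand(100, 64, 64, 3)
y_train = np.random.randint(0, 3, size=100)

clf = ak.ImageClassifier(
    max_trials=50,            # number of candidate architectures to try
    max_model_size=300_000,   # reject models above the parameter budget
    overwrite=True,
)
clf.fit(x_train, y_train, epochs=64)
best = clf.export_model()     # best architecture as a plain Keras model
```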
Given a model type, AutoML may yield totally different results on different data sets, which would cause inconsistency in the next step. We therefore propose Algorithm 2 to make the search procedure feasible across multiple data sets, which lets us obtain a similar model structure apart from the output layer. For models of the same type, there is only a slight structural divergence in the last two layers, since different data sets have unequal numbers of classes.
Algorithm 2 Optimal models on multiple data sets
Input: T = {T1, T2, ..., Tm}, DS = {DS.1, DS.2, ..., DS.n}
Output: M = {M1, M2, ..., Mm}
1: M = Ø
2: for t ∈ T do
3:     for ds ∈ DS do
4:         trials_t^ds = search(t, ds)
5:     end for
6:     Common = trials_t^DS.1 ∩ trials_t^DS.2 ∩ ... ∩ trials_t^DS.n
7:     if Common ≠ Ø then
8:         best_t = max(Common)
9:     else
10:        score_table = Ø
11:        for ds ∈ DS do
12:            score_table += evaluate(trials_t^ds, DS)
13:        end for
14:        best_t = max(score_table)
15:    end if
16:    M_t = best_t
17: end for
18: return M
As shown in Algorithm 2, the algorithm takes the AutoML model type list T and the data set list DS as input. It outputs a list M of optimal models for the given data sets and model types. In line 4, the search function searches for models given a model type t and the current data set ds; it returns the optimal CNN models derived from AutoKeras. In line 6, the set intersection operator ∩ extracts the common part of the trial lists.
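In Python, Algorithm 2 might be sketched as below; search and evaluate are assumed wrappers around the AutoML framework (they are not part of the AutoKeras API), and trials are assumed comparable by a score attribute.

```python
def optimal_models(model_types, datasets, search, evaluate):
    """Sketch of Algorithm 2 with hypothetical search/evaluate helpers."""
    best_models = {}
    for t in model_types:
        # Lines 3-5: search each data set independently
        trials = {ds: set(search(t, ds)) for ds in datasets}
        # Line 6: structures found on every data set
        common = set.intersection(*trials.values())
        if common:
            # Line 8: pick the best common structure
            best = max(common, key=lambda trial: trial.score)
        else:
            # Lines 10-14: score every candidate on all data sets, take the best
            scores = {trial: sum(evaluate(trial, ds) for ds in datasets)
                      for ds_trials in trials.values() for trial in ds_trials}
            best = max(scores, key=scores.get)
        best_models[t] = best
    return best_models
```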
The training process of a CNN is an optimization problem in which various functions measure the distance between the true value vector y and the predicted value vector ŷ. For classification tasks, Categorical Cross Entropy (CCE) is the simplest and most commonly used objective function, with the score form shown in Equation (6). For each element of the true value y and the output ŷ, CCE yields 1 if they are the same and 0 otherwise.
$$ y = [y_1, y_2, \cdots, y_n]^T, \quad \hat{y} = [\hat{y}_1, \hat{y}_2, \cdots, \hat{y}_n]^T \tag{3} $$

$$ \mathrm{equal}^*(a, b) = \begin{cases} 1 & a - b = 0 \\ 0 & a - b \neq 0 \end{cases} \tag{4} $$

$$ \mathrm{equal}([a_1, a_2, \cdots, a_n]^T, [b_1, b_2, \cdots, b_n]^T) = [\mathrm{equal}^*(a_1, b_1), \mathrm{equal}^*(a_2, b_2), \cdots, \mathrm{equal}^*(a_n, b_n)]^T \tag{5} $$

$$ \mathrm{CCE} = \mathrm{equal}(y, \hat{y}) \tag{6} $$
Although CCE is widely used, it has shortcomings in handling misclassification in farming tasks, because CCE leads the model to predict according to a uniform rule. In real farming tasks, the cost of misclassification depends on the practical situation: the cost of classifying a weed as a crop is low, since it is bearable to leave a few weeds; however, the cost of classifying a crop as a weed tends to be high, since it is disadvantageous to remove any part of the actual crop.
To lower the risk of killing crops, two new objective functions are used in our training procedure: no miss weed (NMW) in Equation (11) and dual metrics (DM) in Equation (12). In Equation (7), w stands for the weeds in the data set, c for the crops, and u for the unlabeled objects, which do not exist in the original set and were added for the new objective functions; p, q and r denote the numbers of weed, crop and unlabeled classes, respectively. In Equation (8), contain() returns a vector indicating whether the target vector tar contains elements of the template vector temp; e.g., if tar is [1, 0, 2, 1] (the model's answers) and temp is [0, 2] (valid answers), then contain() returns [False, True, True, False].
$$ w = [w_1, w_2, \cdots, w_p]^T, \quad c = [c_1, c_2, \cdots, c_q]^T, \quad u = [u_1, u_2, \cdots, u_r]^T \tag{7} $$

$$ \mathrm{contain}(temp, tar) = \bigvee_{temp_i \in temp} \left[ tar \wedge temp_i \cdot \mathbf{1} \right] \tag{8} $$
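The helper functions equal (Equations (4)-(5)) and contain (Equation (8)) have direct NumPy counterparts; this small sketch reproduces the tar/temp example above.

```python
import numpy as np

def equal(y, y_hat):
    # Equations (4)-(5): elementwise 1 where labels match, else 0
    return (np.asarray(y) == np.asarray(y_hat)).astype(int)

def contain(temp, tar):
    # Equation (8): True where an element of tar appears in temp
    return np.isin(tar, temp)

print(equal([1, 0, 2, 1], [1, 2, 2, 0]))  # -> [1 0 1 0]
print(contain([0, 2], [1, 0, 2, 1]))      # -> [False  True  True False]
```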
The objective function NMW is composed of CCE, the tolerance of homogeneity Tol_homo in Equation (9), and the tolerance of unknown Tol_unknown in Equation (10). Tol_homo yields 1 when the predicted value and the true value are in the same group, and Tol_unknown yields 1 when the prediction is unknown. These tolerance terms disclose some latent information to the models, granting tolerance when models predict with slight errors. We then apply a logical OR between CCE and the tolerances. In short, NMW returns 0 if and only if an object belonging to a crop is predicted as a weed.
$$ Tol_{homo} = \bigvee_{a \in [w, c]} \left[ \mathrm{contain}(a, y) \wedge \mathrm{contain}(a, \hat{y}) \right] \tag{9} $$

$$ Tol_{unknown} = \mathrm{contain}(u, \hat{y}) \tag{10} $$

$$ \mathrm{NMW} = Acc \vee (Tol_{homo} \wedge Tol_{unknown}) \tag{11} $$

$$ \mathrm{DM} = [Acc, \mathrm{NMW}] \tag{12} $$
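For illustration, a literal per-sample transcription of Equations (9)-(12) follows; the label-group parameters (weed_labels, crop_labels, unknown_labels) are assumptions introduced for this sketch, not the authors' training code.

```python
import numpy as np

def nmw_dm(y, y_hat, weed_labels, crop_labels, unknown_labels):
    """Per-sample sketch of Equations (9)-(12); label groupings are illustrative."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    acc = (y == y_hat)                              # Acc, cf. Equations (4)-(6)
    tol_homo = np.zeros(len(y), dtype=bool)         # Equation (9)
    for group in (weed_labels, crop_labels):
        tol_homo |= np.isin(y, group) & np.isin(y_hat, group)
    tol_unknown = np.isin(y_hat, unknown_labels)    # Equation (10)
    nmw = acc | (tol_homo & tol_unknown)            # Equation (11)
    return np.stack([acc, nmw]).astype(int)         # DM, Equation (12)
```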
DM is a vector of objective functions rather than a single objective function like CCE or NMW. During model training, if the model's optimizer accepts multiple objective functions, it optimizes them separately instead of optimizing their sum. After randomizing the weights, we trained the models with the two new objective functions, NMW and DM.
The ensemble modeling strategy is shown in Fig. 2 S4, where models A, B and C are trained with the same configuration; P1, P2 and P3 are the predictions of the models, in arbitrary order; in the Pred(m) block, cate is the category of the prediction, such as potato, tomato or apple; type is the group of the prediction: crop, weed or unknown; and Act(m) shows how the model suggests dealing with the object. Key points of the ensemble modeling are explained as follows.
1) If all models reach a consensus, i.e., they give the same prediction for a specific category, we take it as the final prediction.
2) If more than half of the models agree on a specific crop, e.g., [crop1, crop1, weed1] from three models, we mark the object as crop1, provided the number of objects already classified as crop1 is still below the known total of crop1.
3) Otherwise, we merely mark the object as unknown.
When handling consensus and disagreement, our method matches other ensemble methods, but we treat crop-like predictions (Case 2 above) differently from the rest, because treating a crop as a weed causes crop killing and reduces the field's yield. For Case 2, the total number of each crop is used in the ensemble strategy, since it is easy to obtain in the field.
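A sketch of this voting strategy is given below; the crop_totals and assigned_counts bookkeeping, and the label strings, are illustrative stand-ins for the field statistics described above.

```python
from collections import Counter

def ensemble_predict(predictions, crop_totals, assigned_counts):
    """Sketch of the ensemble strategy; names are illustrative."""
    votes = Counter(predictions)
    label, count = votes.most_common(1)[0]
    # Case 1: full consensus among the sub-models
    if count == len(predictions):
        return label
    # Case 2: a majority votes for a crop whose known total is not yet reached
    if (count > len(predictions) // 2 and label in crop_totals
            and assigned_counts.get(label, 0) < crop_totals[label]):
        return label
    # Case 3: otherwise mark the object as unknown for manual handling
    return "unknown"

print(ensemble_predict(["crop1", "crop1", "weed1"],
                       crop_totals={"crop1": 30},
                       assigned_counts={"crop1": 12}))  # -> crop1
```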
The performance metrics used in this paper are Accuracy and Recall_crop, defined in Equations (13) and (15); the crop killing rate (CKR) is defined in Equation (14).
$$ \mathrm{Accuracy} = \frac{1}{|w + c|} \cdot \left| \mathrm{equal}(y, \hat{y}) \right| \tag{13} $$

$$ \mathrm{CKR} = \frac{1}{|c|} \cdot \left| \sim \mathrm{equal}\big(y \cdot \mathrm{contain}(c, y),\; \hat{y} \cdot \mathrm{contain}(w, \hat{y})\big) \right| \tag{14} $$

$$ Recall_{crop} = 1 - \mathrm{CKR} \tag{15} $$
where y denotes the true values and ŷ the predicted values; the function equal, the function contain, the weed data w and the crop data c are defined in Equations (5), (7) and (8) above.
In crop and weed classification, Accuracy is the fraction of correctly predicted plants among all plants. For CKR, the numerator is the number of crops identified as weeds and the denominator is the number of all crops. Recall_crop is the fraction of detected crops among all crops, which equals 1 − CKR. Note that identifying a crop as another kind of crop counts as a valid detection in the context of Recall_crop.
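These metrics translate directly to NumPy. The sketch below assumes explicit crop and weed label sets (so an unknown prediction counts as neither) and reproduces the 7-weed/3-crop worked example discussed later in the evaluation.

```python
import numpy as np

def evaluate_metrics(y, y_hat, crop_labels, weed_labels):
    """Sketch of Equations (13)-(15); label sets are illustrative."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    accuracy = np.mean(y == y_hat)              # Equation (13)
    is_crop = np.isin(y, crop_labels)
    pred_weed = np.isin(y_hat, weed_labels)
    # Equation (14): only crops predicted as weeds count towards CKR
    ckr = np.sum(is_crop & pred_weed) / np.sum(is_crop)
    return accuracy, ckr, 1 - ckr               # Recall_crop, Equation (15)

# 7 weeds and 3 crops; one weed and one crop misclassified
y     = ["w"] * 7 + ["c"] * 3
y_hat = ["c"] + ["w"] * 6 + ["w"] + ["c"] * 2
print(evaluate_metrics(y, y_hat, crop_labels=["c"], weed_labels=["w"]))
# -> (0.8, 0.333..., 0.666...): 80% Accuracy, about 67% Recall_crop
```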
For brevity, we name the generated models after their salient characteristics and structures. The vanilla neural network, dubbed Vanilla, has a structure similar to the early AlexNet [13]. The convolutional neural network, dubbed Conv, has a structure similar to the well-known VGG [14]. The dilated convolutional neural network, dubbed Dilated, has a structure similar to Conv but with dilated convolution blocks [15,16].
To evaluate the three models generated by AutoML, five state-of-the-art CNN models are used as baselines: DenseNet201 [17], InceptionV3 [18], VGG19 [14], Xception [19] and ResNet152V2 [20]. These five models achieved top rankings in the ImageNet competition [21]. Their parameter counts range from twenty million to one hundred million, at least 100 times those of our generated models. All baseline models are marked with * in the last five rows of Table 1.
Table 1. Evaluation of generated and baseline (*) models on DS.1 and DS.2.

| Model | Accuracy (DS.1) | Recall_crop (DS.1) | Time (DS.1) | Accuracy (DS.2) | Recall_crop (DS.2) | Time (DS.2) |
|---|---|---|---|---|---|---|
| Vanilla | 98.06% | 93.36% | 12.4 h | 99.24% | 97.76% | 15.5 h |
| Dilated | 98.34% | 94.24% | 9.5 h | 99.53% | 98.64% | 11.3 h |
| Conv | 97.51% | 93.72% | 12.4 h | 99.06% | 96.88% | 14.2 h |
| *DenseNet201 [17] | 98.26% | 93.88% | 0.6 h | 98.95% | 96.93% | 0.5 h |
| *InceptionV3 [18] | 98.96% | 94.93% | 0.6 h | 99.51% | 97.49% | 0.6 h |
| *VGG19 [14] | 98.58% | 94.15% | 0.8 h | 99.20% | 96.59% | 0.8 h |
| *Xception [19] | 97.16% | 92.65% | 0.7 h | 99.49% | 97.77% | 0.6 h |
| *ResNet152v2 [20] | 99.52% | 98.44% | 1.0 h | 99.58% | 98.72% | 1.0 h |
Because crops are the goal of agricultural production, misidentifying crops as weeds can be costlier than other errors. Accuracy measures only the overall performance on crops and weeds, so we use Recall_crop to measure the effect on crop classification. For example, suppose a data set has two classes, CROP and WEED, containing 7 weeds and 3 crops. If one crop and one weed are identified incorrectly, we get 80% Accuracy and about 67% Recall_crop according to (13) and (15). The training time of the baseline models and the search time of the generated models are also considered. Details of the evaluation are shown in Table 1.
The above models achieve high Accuracy and Recall_crop in most cases. Regarding training time on an RTX 2080 Ti GPU, generating a model took more than 10 hours, while training a baseline model took less than 1 hour. Such time consumption is acceptable given that no human intervention is required.
However, the Recall_crop score is always lower than the Accuracy score, which may be caused by prediction bias on the imbalanced data sets. A low Recall_crop is an unpromising result for crop and weed classification. In addition, CCE determines only the specific type of target, implicitly assuming that all misclassification errors made by a model are equal. Thus, we use sampling to treat the imbalanced data set, and the new objective functions to fix the training problem with CCE.
According to the Accuracy and Recall_crop values in Table 1, two scatter diagrams are plotted in Fig. 5 to further analyze the performance of these models. ResNet achieves the best performance, hence we choose ResNet as the baseline model in the subsequent experiments. Limited by parameter size, the generated models perform similarly to most baseline models.
The training curves of the models and the corresponding final scores are shown in Fig. 6. For each data set, DS.1 and DS.2, model training is carried out on both the sampled data set and the complete data set. For each data usage, the models are trained with the three objective functions, i.e., CCE, NMW and DM. As CCE and NMW are combined in DM, DM has two separate plots, DM-CCE and DM-NMW.
In general, the curves become stable after about 60 epochs, reaching scores above 96% after 64 epochs. The curves of ResNet fluctuate more than the others, which might be due to its deeper and more specialized structure. The final NMW scores are slightly higher than the CCE scores, while the DM-CCE scores are slightly higher than the CCE scores. Based on the training curves and scores, the models are trained well enough to be applied to the next task. We therefore use the combined model to determine the appropriate configuration, i.e., the one with high Accuracy and low CKR.
The ensemble strategy of Section 2.3 is used to build the ensemble model from the three generated models. Since there are two data usages (CD and SD) and three objective functions (CCE, NMW and DM), we have six configurations on each data set, as shown in the first column of Table 2. Evaluations of the ensemble models and the corresponding sub-models on the test sets are shown in Table 2 and Fig. 6.
Table 2. Accuracy of the best sub-model and the ensemble model under each configuration.

| Experiments | The best (DS.1) | Ensemble (DS.1) | The best (DS.2) | Ensemble (DS.2) |
|---|---|---|---|---|
| CD-CCE | 99.58% | 99.27% | 99.53% | 99.02% |
| CD-NMW | 99.03% | 99.76% | 99.41% | 99.64% |
| CD-DM | 99.58% | 99.76% | 99.48% | 99.84% |
| SD-CCE | 99.24% | 99.52% | 99.11% | 99.92% |
| SD-NMW | 99.27% | 99.76% | 99.56% | 99.88% |
| SD-DM | 99.24% | 99.76% | 99.87% | 99.96% |
In the evaluation, the scores of the sub-models are no longer displayed separately, except for the highest and lowest among them. In Table 2, 'The best' indicates the highest Accuracy among the sub-models, which the ensemble model slightly exceeds in most cases.
Besides achieving high Accuracy, analyzing misclassifications to reduce CKR is the main goal of this project; a CKR of 0% means that no crop is misclassified as a weed, so no crop will be wrongly killed. To measure misclassification more precisely, we divide the errors into four categories (a code sketch follows the list):
1. Moderate errors: the model classifies an object as unknown; these occur only with the new objective functions.
2. Minor errors: the model's prediction is inconsistent with the label, but both are of the same type, e.g., the prediction is weed1 but the label is weed2;
3. Considerable errors: the model predicts a weed as a crop;
4. Dangerous errors: the model predicts a crop as a weed.
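This taxonomy maps directly to a simple check; in the sketch below, any prediction outside the crop and weed label sets is treated as unknown, and all names are illustrative.

```python
def error_category(true_label, pred_label, crop_labels, weed_labels):
    """Sketch of the four-way error taxonomy; label sets are illustrative."""
    if pred_label == true_label:
        return "correct"
    if pred_label not in crop_labels and pred_label not in weed_labels:
        return "moderate"       # classified as unknown
    if (true_label in crop_labels) == (pred_label in crop_labels):
        return "minor"          # e.g. weed1 predicted as weed2
    if true_label in weed_labels:
        return "considerable"   # a weed predicted as a crop
    return "dangerous"          # a crop predicted as a weed

print(error_category("crop1", "weed2", ["crop1"], ["weed1", "weed2"]))  # dangerous
```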
These errors are ranked by their practical consequences. Moderate errors can still be rectified manually; minor errors do little or no harm to the field; considerable errors leave a weed in the field; and dangerous errors kill a crop by mistake.
By dividing the misclassifications this way, we find that CKR depends only on the number of dangerous errors. For a specific crop A, the possible misclassifications are: classifying A as a weed (dangerous error), classifying A as another crop (minor error), and classifying A as an unknown object (moderate error). Hence, the total CKR decreases if the dangerous errors are reduced.
The percentages of the four errors are shown in Fig. 7. In this analysis, 'the worst' means the highest score for each of the four errors; in other words, the four scores may come from different sub-models. Accordingly, 'the best' means the lowest scores.
Among the sub-models, those trained with CCE reach the highest dangerous error rates, and those trained with DM reach the lowest. Regarding data usage, sampling did not achieve an obvious reduction in error rates. In most cases, the ensemble model reaches the lowest, even 0.00%, dangerous error rate, and the highest moderate error rate. In summary, our proposed ensemble strategy and objective functions reduce the CKR by lowering the rate of dangerous errors.
This work proposes a crop and weed classification methodology based on AutoML and ensemble modeling. The AutoML-based algorithm automatically chooses the CNN models across two data sets. Models with different data usages and different objective functions are used to build ensemble models. Overall, the ensemble model with objective function DM achieves the highest Accuracy and the lowest CKR. Thus, we hypothesize that applying this method can effectively move precision farming towards the desired outcomes.
Despite the contributions above, due to environmental constraints our method was only evaluated in a greenhouse, and its performance in other environments still needs to be tested. We hope to tackle this limitation in future experiments with more generic data sets.
This work was partially supported by National Key R & D Program of China under Grant No. 2020YFC0832500, Ministry of Education - China Mobile Research Foundation under Grant No. MCM20170206, The Fundamental Research Funds for the Central Universities under Grant No. lzujbky-2021-sp47, lzujbky-2020-sp02, lzujbky-2019-kb51 and lzujbky-2018-k12, National Natural Science Foundation of China under Grant No. 61402210, State Grid Corporation of China Science and Technology Project under Grant No. SGGSKY00WYJS2000062, Science and Technology Plan of Qinghai Province under Grant No.2020-GX-164, Google Research Awards and Google Faculty Award. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Jetson TX1 used for this research.
A request to access the data can be directed to the authors. The research performed in this work is the sole work of the named authors. The ideas presented in this article do not pose any risks to individuals or institutions. We declare that we do not have any conflicts of interest regarding the study.
[1] D. C. Rose, R. Wheeler, M. Winter, M. Lobley, C. A. Chivers, Agriculture 4.0: Making it work for people, production, and the planet, Land Use Policy, 100 (2021), 104933. doi: 10.1016/j.landusepol.2020.104933
[2] A. Wang, W. Zhang, X. Wei, A review on weed detection using ground-based machine vision and image processing techniques, Comput. Electron. Agr., 158 (2019), 226-240. doi: 10.1016/j.compag.2019.02.005
[3] J. Futoma, J. Morris, J. Lucas, A comparison of models for predicting early hospital readmissions, J. Biomed. Inform., 56 (2015), 229-238. doi: 10.1016/j.jbi.2015.05.016
[4] F. B. P. Malavazi, R. Guyonneau, J. B. Fasquel, S. Lagrange, F. Mercier, LiDAR-only based navigation algorithm for an autonomous agricultural robot, Comput. Electron. Agr., 154 (2018), 71-79. doi: 10.1016/j.compag.2018.08.034
[5] W. Strothmann, A. Ruckelshausen, J. Hertzberg, C. Scholz, F. Langsenkamp, Plant classification with in-field-labeling for crop/weed discrimination using spectral features and 3D surface features from a multi-wavelength laser line profile system, Comput. Electron. Agr., 134 (2017), 79-93. doi: 10.1016/j.compag.2017.01.003
[6] D. Hall, F. Dayoub, T. Perez, C. McCool, A rapidly deployable classification system using visual data for the application of precision weed management, Comput. Electron. Agr., 148 (2018), 107-120. doi: 10.1016/j.compag.2018.02.023
[7] D. I. Patrício, R. Rieder, Computer vision and artificial intelligence in precision agriculture for grain crops: A systematic review, Comput. Electron. Agr., 153 (2018), 69-81. doi: 10.1016/j.compag.2018.08.001
[8] C. Thornton, F. Hutter, H. H. Hoos, K. Leyton-Brown, Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms, 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2013), 847-855.
[9] G. Bradski, The OpenCV Library, Dr. Dobb's Journal of Software Tools, 2000.
[10] A. Ali, PlantVillage data set, 2019. Data retrieved from Kaggle. Available from: https://www.kaggle.com/abdallahalidev/plantvillage-dataset.
[11] H. Jin, Q. Song, X. Hu, Auto-Keras: An efficient neural architecture search system, 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (2019), 1946-1956.
[12] A. Kamilaris, F. X. Prenafeta-Boldú, Deep learning in agriculture: A survey, Comput. Electron. Agr., 147 (2018), 70-90. doi: 10.1016/j.compag.2018.02.016
[13] A. Krizhevsky, I. Sutskever, G. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 25 (2012), 1097-1105. doi: 10.1145/3065386
[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, preprint, arXiv: 1409.1556.
[15] F. Yu, V. Koltun, Multi-scale context aggregation by dilated convolutions, preprint, arXiv: 1511.07122.
[16] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 770-778.
[17] G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, Densely connected convolutional networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 4700-4708.
[18] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the Inception architecture for computer vision, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 2818-2826.
[19] F. Chollet, Xception: Deep learning with depthwise separable convolutions, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 1251-1258.
[20] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 770-778.
[21] J. Deng, W. Dong, R. Socher, L. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2009), 248-255.