1. Introduction
Heat supply is an important issue for the national economy and people's livelihood, and the accuracy of heat energy consumption statistics directly affects the income of heating companies [1]. To improve the accuracy of these statistics, smart heat meters have been widely deployed and play a very important role between users and heating companies.
Because the design and use of intelligent heat meters involve multiple functions such as heat conversion, metering, communication, etc., there are also many possible types of failures [2,3]. If the fault type of the intelligent heat meter cannot be identified quickly and accurately, it cannot be repaired in time, which will not only cause economic losses to the heating company, but also lead to conflicts between the heating company and users. Therefore, fast and accurate fault diagnosis of intelligent heat meters is of great significance in the field of thermal energy.
Extensive and in-depth research has been carried out, both at home and abroad, on the fault diagnosis of various intelligent instruments, including heat meters, and machine learning methods have been widely used. Koziy proposes diagnosing the faults of intelligent instruments by statistically classifying their common fault types, establishing a fault database, and comparing unknown faults with the fault types in the database [4]. Namaki uses a Bayesian model to predict the fault type of an intelligent instrument, but this method relies on expert experience, and the conditional probability of each node in the Bayesian model is difficult to calculate [5]. Lyashevska argues that the primary problem of fault diagnosis is data imbalance and accordingly proposes an intelligent classification method based on a CNN [6]. Yagi uses the random forest model to build a fault diagnosis method for intelligent instruments, obtaining high recognition accuracy, although only a few fault types can be identified [7]. To further improve the identification of intelligent instrument fault types, multi-method fusion approaches began to emerge [8,9,10]. Sonne fused the random forest model and the support vector machine algorithm, and the fused method greatly improved the accuracy of fault identification of intelligent instruments [11]. Yy adopted a fusion of the Bayesian model and the binary tree model to improve the accuracy of fault identification and expand the capacity of the fault type library [12]. Zhou analyzed and modeled fault diagnosis from the perspective of cognitive uncertainty and random uncertainty, and used a deep learning network based on probabilistic Bayesian estimation for fault diagnosis, which achieved good results [13]. Gompel diagnosed six types of faults in a photovoltaic system, extracted fault features by machine learning, and then fed them into a recurrent neural network for analysis and decision-making, achieving a high accuracy of fault identification [14].
The existing research on fault diagnosis shows that uncertainty and randomness are the main difficulties, and that faults with different causes produce fault data that differ greatly. For this reason, this paper first extracts the features of the heat meter fault data and then balances the fault data with an interpolation algorithm, so that the imbalance between fault types does not affect the subsequent fault diagnosis. Secondly, classifiers are established for the various possible faults, and the suitability of different classifier combinations is determined by a voting mechanism. Finally, the weights of the retained classifiers are set by the pigeon swarm algorithm to improve the rationality and accuracy of the fault diagnosis results. In this way, a multi-classifier fusion model based on the voting mechanism and the pigeon swarm algorithm is established to achieve more accurate and efficient fault diagnosis of heat meters.
2. Fault data analysis and balance processing of heat meter
2.1. Fault data analysis of heat meter
In the heating system, the intelligent heat meter transmits data back to the heat energy control system remotely. When the heat meter fails, the failure mainly shows up as erroneous or abnormal data. From the perspective of fault data, the common faults of heat meters fall into the following types: faults caused by electrical hardware, faults caused by mechanical hardware, faults caused by communication problems, faults caused by storage problems, faults caused by billing software, faults caused by metering software, and faults caused by other reasons.
According to the statistics in the industry, the percentage of these seven types of failures is shown in Table 1.
It can be seen from the data in Table 1 that, among the causes of heat meter failure, communication problems account for the highest proportion, reaching 33.6%. The faults caused by metering software and billing software rank second and fourth, accounting for 17% and 13.2% respectively; these are also the faults most likely to cause disputes between users and heating companies. The faults caused by other reasons account for 16.4%, which also shows the complexity of the failure causes of the heat meter.
2.2. Balance processing of fault data of heat meter
To diagnose heat meter faults accurately, the diagnosis model must be trained on heat meter fault data. However, because the seven types of faults occur with different probabilities, some fault types have far less data than the high-incidence types, which makes the data set imbalanced. Therefore, an interpolation algorithm is used to balance the data across the different fault types.
The principle of data balance algorithm based on interpolation is shown in Figure 1.
As shown in Figure 1, the interpolation-based data balancing algorithm proceeds as follows:
Firstly, the set of minority-class data is denoted X, and it contains n samples.
Secondly, a sample x_i (i∈[1, n]) is selected from the minority-class set as the root sample for synthesizing new samples.
Thirdly, according to the upsampling rate, which sets how many new samples are synthesized per root sample, an odd number k is selected (for example, k = 3), and the k neighbouring samples x_ij of x_i are used as auxiliary samples for synthesizing new samples, where x_ij∈X, j = 1, 2, ..., k.
Finally, a new sample is generated by interpolation between the root sample x_i and an auxiliary sample x_ij. The interpolation formula is as follows:
$$x_{new} = x_i + \gamma\,(x_{ij} - x_i) \tag{1}$$
Here, x_new represents the new sample, and γ represents a random number in the interval [0, 1].
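The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `k_nearest` and `interpolate_minority`, the Euclidean distance, the random choice of root and auxiliary samples, and the toy data are all hypothetical.

```python
import math
import random


def k_nearest(root, samples, k):
    """Return the k samples closest to root (Euclidean distance), excluding root itself."""
    others = [s for s in samples if s is not root]
    return sorted(others, key=lambda s: math.dist(root, s))[:k]


def interpolate_minority(samples, n_new, k=3, seed=0):
    """Synthesize n_new samples by interpolating between minority-class samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        root = rng.choice(samples)                       # root sample x_i
        aux = rng.choice(k_nearest(root, samples, k))    # auxiliary sample x_ij
        gamma = rng.random()                             # random number in [0, 1)
        # New sample lies on the segment between the root and the auxiliary sample.
        synthetic.append([a + gamma * (b - a) for a, b in zip(root, aux)])
    return synthetic


# Toy minority-class feature vectors (assumed values, for illustration only).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_samples = interpolate_minority(minority, n_new=6)
```

Because every synthetic sample lies on a line segment between two existing minority samples, the new data stay inside the region already occupied by that fault type.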
3. Fault diagnosis method of heat meter based on voting mechanism and pigeon swarm algorithm
3.1. Ensemble learning based on voting mechanism
The intelligent diagnosis of heat meter faults is generally approached from the perspective of machine learning. To improve the accuracy of fault diagnosis, ensemble learning is a better choice. The basic principle of ensemble learning is to fuse multiple classifiers to obtain an ensemble model that is better than any single classifier.
The ensemble learning framework designed in this paper is based on the voting mechanism. Multiple base classifiers are trained on the training set, and the test set results of the base classifiers are fused according to the voting mechanism to determine the final output. The voting-based ensemble learning framework is shown in Figure 2.
The voting mechanisms used in this paper fall into two categories: hard voting and soft voting. Taking a binary classification problem as an example, the prediction target is y∈{−1, 1}, and three models A, B, and C predict the sample (x_i, y_i). Hard voting votes on the labels predicted by the models: assuming that the prediction results of models A, B, and C are 1, 1, and 1, the final prediction result is 1. Soft voting is a weighted vote on the category probabilities predicted by each base model; the category with the highest weighted probability is the final result, as given in Eq (2):
$$E_k = \sum_{i=1}^{n} w_i P_k^{(i)}, \qquad y = \arg\max_{k} E_k \tag{2}$$
Here, P_k^(i) represents the probability, predicted by the ith base classifier, that the fault belongs to category k, k = 1, 2, ..., m, where m represents the total number of fault categories; w_i represents the weight of the ith base classifier, i = 1, 2, ..., n, where n represents the total number of base classifiers; E_k represents the probability that the predicted fault category is k after weighted voting; and y is the final result.
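The difference between the two voting mechanisms can be illustrated with a short sketch. The function names `hard_vote` and `soft_vote`, the probabilities and the weights are assumptions chosen for illustration; they are not values from the experiments.

```python
from collections import Counter


def hard_vote(labels):
    """Return the label predicted by the largest number of base classifiers."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities, weights):
    """Weighted soft vote.

    probabilities[i][k] is the probability that classifier i assigns to category k,
    weights[i] is the weight of classifier i.
    Returns the winning category index and the fused scores E_k.
    """
    n_categories = len(probabilities[0])
    fused = [
        sum(w * p[k] for w, p in zip(weights, probabilities))
        for k in range(n_categories)
    ]
    return max(range(n_categories), key=fused.__getitem__), fused


# Three classifiers, two categories (assumed values).
probs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
weights = [0.5, 0.25, 0.25]

label = hard_vote([1, 1, -1])            # majority of the predicted labels
winner, fused = soft_vote(probs, weights)
```

In this example two of the three classifiers individually prefer category 1, yet the weighted soft vote selects category 0 because the most heavily weighted classifier is very confident. This is why the choice of weights matters for the final decision.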
3.2. Combination of base classifiers based on diversity criteria
The integrated model for fault diagnosis of heat meters is the result of combining multiple base classifiers. An integrated model should meet two criteria: the recognition accuracy for the fault types should be high, and the combination of base classifiers should be diversified, that is, the base classifiers should differ from one another as much as possible.
Taking a binary classification problem as an example, this section analyzes how the ensemble effect differs when multiple base classifiers are combined. For the binary classification problem, the prediction target is y∈{−1, 1}, the true mapping is f, and the error rate of each classifier is ε. The error relationship of each classifier is then:
The T classifiers are combined according to the voting mechanism, and the error relationship of the combined classifier is as follows:
Here, h_i(x) represents the ith classifier, i = 1, 2, ..., T, and T represents the total number of classifiers.
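A form of these two relations that is consistent with the definitions above can be written as follows. It is a reconstruction under stated assumptions: the combined classifier H(x) is taken to be a simple majority vote, the base classifiers are assumed to err independently with the same error rate ε, and the upper bound is the standard Hoeffding bound for this setting. The equation numbers follow the references in the pseudocode of Section 3.4.

```latex
\begin{align}
P\bigl(h_i(x) \neq f(x)\bigr) &= \varepsilon \tag{3} \\
P\bigl(H(x) \neq f(x)\bigr)
  &= \sum_{k=0}^{\lfloor T/2 \rfloor} \binom{T}{k} (1-\varepsilon)^{k}\, \varepsilon^{T-k}
  \;\le\; \exp\!\Bigl(-\tfrac{1}{2}\, T\, (1-2\varepsilon)^{2}\Bigr),
  \qquad H(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{T} h_i(x)\Bigr) \tag{4}
\end{align}
```

In the sum, k counts the base classifiers that predict correctly, so the majority vote is wrong exactly when at most ⌊T/2⌋ of the T classifiers are correct.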
It can be seen from the above formula that, provided each base classifier is better than random guessing (ε < 0.5), the error of ensemble learning decreases as T increases. The effect of integrating different base classifiers is compared in Table 2.
From the comparison results in Table 2, we can see that base classifiers 1, 2 and 3 each have a prediction accuracy of 66.7% but perform differently on different test cases, so the accuracy of integrated model A reaches 100%. Base classifiers 4, 5 and 6 also each have a prediction accuracy of 66.7%, but the three perform identically and do not meet the diversity condition, which leads to the failure of integrated model B. Base classifiers 7, 8 and 9 perform poorly, and integrated model C is discarded directly.
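Both observations can be checked numerically with a small sketch. The first function evaluates the majority-vote error of T classifiers under the assumption that they err independently with the same error rate; the second applies a majority vote to per-case correctness patterns. The patterns below are illustrative assumptions that mirror the three situations described, not the entries of Table 2, and the function names are hypothetical.

```python
from math import comb


def ensemble_error(T, eps):
    """Majority-vote error of T independent classifiers, each with error rate eps.

    The vote is wrong when at most floor(T/2) classifiers are correct.
    """
    return sum(comb(T, k) * (1 - eps) ** k * eps ** (T - k) for k in range(T // 2 + 1))


def majority_accuracy(correct_flags):
    """correct_flags[c][s] is True when classifier c is right on test case s."""
    n_classifiers = len(correct_flags)
    n_cases = len(correct_flags[0])
    hits = sum(
        sum(flags[s] for flags in correct_flags) > n_classifiers / 2
        for s in range(n_cases)
    )
    return hits / n_cases


# Assumed correctness patterns on three test cases.
diverse = [[True, True, False], [False, True, True], [True, False, True]]
identical = [[True, True, False]] * 3
weak = [[True, False, False], [False, True, False], [False, False, True]]

errors = [ensemble_error(T, 0.3) for T in (1, 3, 5)]
accuracies = [majority_accuracy(p) for p in (diverse, identical, weak)]
```

With ε = 0.3 the error falls from 0.3 for a single classifier to 0.216 for three and about 0.163 for five. The diverse combination reaches an accuracy of 1.0, the identical combination stays at 2/3, and the weak combination drops to 0, which matches the qualitative behaviour described for integrated models A, B and C.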
3.3. Weight setting of base model based on pigeon swarm algorithm
The accuracy of the diagnosis results of the fault diagnosis model is closely related to the weight of each base classifier in decision-making. To obtain a reasonable weight allocation for the base classifiers, this paper uses the pigeon swarm algorithm to optimize the weights.
The pigeon swarm algorithm differs from the genetic algorithm and the ant colony algorithm in that it performs global optimization with relatively clear objectives. For the fault diagnosis model proposed in this paper, the possible fault causes of the heat meter and the classifiers determined in Section 3.2 are clear, so the pigeon swarm algorithm is more appropriate.
The pigeon swarm algorithm is inspired by the special navigation behavior of pigeons during homing.
The weights of the base classifiers are encoded as the position of a pigeon, and a set of such pigeons forms the pigeon swarm. During initialization, the position and velocity of each pigeon are set randomly. Once the algorithm starts, each pigeon obtains a fitness value from the optimization function. In each iteration, every pigeon searches for its own individual optimal value Pbest and, through sharing, for the global optimal value Gbest of the swarm. All pigeons constantly adjust their position and velocity according to the individual optimal value and the global optimal value until they reach or approach the optimal result.
In a pigeon swarm consisting of n pigeons, the position of a single pigeon (the weights of the base classifiers in this paper) w_i can be expressed as a D-dimensional vector, namely:
The flight speed of each pigeon can be expressed as:
The current individual optimal value of each pigeon can be expressed as:
The current global optimal value of the whole pigeon group can be expressed as:
In the iterative process of pigeon swarm algorithm, the velocity update formula of each pigeon can be expressed as:
Here, R represents the map parameter, rand is a random value within the range of 0–1, and gd represents the dth dimensional component of the global optimal value.
In the iterative process of pigeon swarm algorithm, the position update formula of each pigeon can be expressed as:
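The expressions introduced in this subsection can be written out as follows. This is a reconstruction under stated assumptions: the symbols v_i and p_id and the iteration index t are notation chosen here, the numbering follows the equation references in the pseudocode of Section 3.4, and the velocity update is assumed to be the standard map-and-compass operator of pigeon-inspired optimization, which matches the parameters R, rand and g_d described above.

```latex
\begin{align}
w_i &= (w_{i1}, w_{i2}, \ldots, w_{iD}) \tag{5} \\
v_i &= (v_{i1}, v_{i2}, \ldots, v_{iD}) \tag{6} \\
P_{best,i} &= (p_{i1}, p_{i2}, \ldots, p_{iD}) \tag{7} \\
G_{best} &= (g_{1}, g_{2}, \ldots, g_{D}) \tag{8} \\
v_{id}(t) &= v_{id}(t-1)\, e^{-Rt} + rand \cdot \bigl(g_d - w_{id}(t-1)\bigr) \tag{9} \\
w_{id}(t) &= w_{id}(t-1) + v_{id}(t) \tag{10}
\end{align}
```

Under this form, the factor e^{-Rt} shrinks the previous velocity as the iterations proceed, so the pigeons are drawn more and more strongly towards the global optimal position.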
3.4. Pseudocode of proposed algorithm
To realize the fault diagnosis of heat meter data, the proposed algorithm involves data equalization, classifier selection, classifier weight determination and other steps. The pseudocode of the entire algorithm is given as follows:
The Algorithm for Fault Diagnosis
1: Equalize the data by interpolation, Eq (1);
2: Select the classifiers based on the voting mechanism, Eq (4);
3: Generate a random pigeon swarm, Eq (5);
4: Compute the fitness of each pigeon;
5: Compute the local optimal positions, Eq (7);
6: Compute the global optimal position, Eq (8);
7: While Iterative_error > Threshold
8:     Update the velocity and position of each pigeon;
9:     Update the local optimal positions;
10:    Update the global optimal position;
11: Output the weight of each classifier;
12: Calculate the judgment result of the fault type;
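Steps 3 to 11 of the pseudocode can be sketched as follows. This is a minimal illustration under several assumptions rather than the implementation used in the experiments: only the map-and-compass style update towards the global optimal position is implemented, a fixed number of iterations replaces the error threshold, the fitness is the soft-voting accuracy on a small validation set, weights are clipped to stay positive and normalized at the end, and all names (`soft_vote_accuracy`, `pigeon_search`), parameter values and data are hypothetical.

```python
import math
import random


def soft_vote_accuracy(weights, probs, labels):
    """Fraction of samples whose weighted soft vote matches the true label.

    probs[s][i][k] is the probability classifier i assigns to category k on sample s.
    """
    correct = 0
    for sample_probs, label in zip(probs, labels):
        n_categories = len(sample_probs[0])
        fused = [
            sum(w * p[k] for w, p in zip(weights, sample_probs))
            for k in range(n_categories)
        ]
        correct += fused.index(max(fused)) == label
    return correct / len(labels)


def pigeon_search(fitness, dim, n_pigeons=20, n_iter=30, R=0.2, seed=0):
    """Search for classifier weights that maximize fitness."""
    rng = random.Random(seed)
    # Random initial positions (weights) and velocities.
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_pigeons)]
    vel = [[rng.random() for _ in range(dim)] for _ in range(n_pigeons)]
    gbest = list(max(pos, key=fitness))
    gbest_fit = fitness(gbest)
    for t in range(1, n_iter + 1):
        for i in range(n_pigeons):
            for d in range(dim):
                # Damp the old velocity and steer towards the global best position.
                vel[i][d] = vel[i][d] * math.exp(-R * t) + rng.random() * (gbest[d] - pos[i][d])
                # Keep each weight inside (0, 1].
                pos[i][d] = min(1.0, max(1e-6, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > gbest_fit:
                gbest, gbest_fit = list(pos[i]), fit
    total = sum(gbest)
    return [w / total for w in gbest], gbest_fit


# Hypothetical validation data: 4 samples, 3 classifiers, 2 fault categories.
val_probs = [
    [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]],
    [[0.2, 0.8], [0.6, 0.4], [0.3, 0.7]],
    [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]],
    [[0.1, 0.9], [0.3, 0.7], [0.6, 0.4]],
]
val_labels = [0, 1, 0, 1]

weights, best_fit = pigeon_search(
    lambda w: soft_vote_accuracy(w, val_probs, val_labels), dim=3
)
```

The returned weights sum to 1 and can be used directly in a weighted soft vote over the selected base classifiers.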
4. Experimental results and analysis
To verify the effectiveness of the heat meter fault diagnosis method proposed in this paper, the fault data set of heat meters in Heilongjiang Province is taken as the processing object, and 10,000 records are selected from it as sample data. Of these 10,000 samples, 66% (6,600 samples) are used as the training data set, and 34% (3,400 samples) are used as the test data set.
4.1. Selection experiment of base classifier
At present, in the field of fault diagnosis of intelligent instruments, many mature methods can be used as the base classifier, and the choice needs to be made through experiments. In this paper, a total of eight common fault diagnosis methods are selected as candidates for the base classifier, namely: the NB (Naive Bayes) algorithm, the SVM (Support Vector Machine) algorithm, the LR (Logistic Regression) algorithm, the DT (Decision Tree) algorithm, the BP (Back Propagation) algorithm, the RF (Random Forest) algorithm, the KNN (K-Nearest Neighbor) algorithm, and the GBDT (Gradient Boosting Decision Tree) algorithm. According to the classification criteria for the seven types of common heat meter faults given in this paper, the diagnostic accuracy of the above eight algorithms for each type of fault is shown in Table 3.
To observe the diagnosis results of the eight methods for the seven types of heat meter faults more intuitively, the data in Table 3 are plotted as curves, as shown in Figure 3.
From the results in Table 3 and Figure 3, it can be seen that the GBDT method has the best average diagnostic accuracy over the seven types of heat meter faults, reaching 68.46%, followed by the KNN method, whose average diagnostic accuracy reaches 67.95%. Therefore, these two methods are selected as the first and second of the three base classifiers. The DT, BP and RF methods are then each used as the third base classifier, forming three integrated models for heat meter fault diagnosis, namely:
Integrated model A for heat meter fault diagnosis: GBDT + KNN + DT;
Integrated model B for heat meter fault diagnosis: GBDT + KNN + BP;
Integrated model C for heat meter fault diagnosis: GBDT + KNN + RF.
The above three fault diagnosis models are evaluated further in the subsequent fault diagnosis experiments. It should be pointed out that if the KNN algorithm or the GBDT algorithm is used alone, the single classifier will clearly misjudge some types of heat meter faults. Integrating the three classifiers according to the method in this paper greatly reduces the possibility of misjudgment.
4.2. Evaluation index of fault diagnosis performance
The effect of fault diagnosis is closely related to fault classification. Generally, a confusion matrix is used to judge whether the classification is accurate. The composition of the confusion matrix is listed in Table 4.
In Table 4, TP represents the number of samples that are actually positive and are predicted to be positive; FP represents the number of samples that are actually negative but are predicted to be positive; FN represents the number of samples that are actually positive but are predicted to be negative; TN represents the number of samples that are actually negative and are predicted to be negative.
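The four entries can be tallied directly from the actual and predicted labels. The sketch below is illustrative only: the function name `confusion_counts` and the label sequences are assumptions, with 1 as the positive class and −1 as the negative class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification result."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # actually positive, predicted positive
            else:
                fp += 1   # actually negative, predicted positive
        else:
            if a == positive:
                fn += 1   # actually positive, predicted negative
            else:
                tn += 1   # actually negative, predicted negative
    return tp, fp, fn, tn


# Assumed labels for eight samples.
actual = [1, 1, 1, -1, -1, -1, -1, 1]
predicted = [1, 1, -1, -1, 1, -1, -1, 1]
counts = confusion_counts(actual, predicted)
```

For a multi-class problem such as the seven fault types, the same tally is repeated with each fault type in turn treated as the positive class.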
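According to the confusion matrix, three evaluation indicators can be built, namely Precision, Recall and F1 Score.
Precision is the proportion of samples predicted to be positive that are actually positive. The higher the value, the less often the model mistakes negative samples for positive ones. It can be expressed as:
$$Precision = \frac{1}{m}\sum_{i=1}^{m}\frac{TP_i}{TP_i + FP_i} \tag{11}$$
Recall is the proportion of actually positive samples that are predicted to be positive. The higher the value, the stronger the model's ability to recognize positive samples. It can be expressed as:
$$Recall = \frac{1}{m}\sum_{i=1}^{m}\frac{TP_i}{TP_i + FN_i} \tag{12}$$
F1 Score is the harmonic mean of Precision and Recall. The higher the value, the more robust the model is. It can be expressed as:
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{13}$$
Here, m represents the total number of fault types; i indicates the fault type; and TP_i, FP_i and FN_i are the confusion matrix counts obtained when fault type i is treated as the positive class.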
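The three indicators can be computed from a multi-class confusion matrix as sketched below. The sketch assumes that Precision and Recall are averaged over the m fault types and that F1 Score is the harmonic mean of the two averages; the function name `macro_metrics` and the two-class matrix are hypothetical.

```python
def macro_metrics(confusion):
    """Compute averaged Precision, Recall and F1 Score.

    confusion[a][p] is the number of samples of actual type a predicted as type p.
    """
    m = len(confusion)
    precisions, recalls = [], []
    for i in range(m):
        tp = confusion[i][i]
        fp = sum(confusion[a][i] for a in range(m)) - tp   # predicted as i, actually another type
        fn = sum(confusion[i]) - tp                        # actually i, predicted as another type
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision = sum(precisions) / m
    recall = sum(recalls) / m
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Assumed confusion matrix for two fault types.
precision, recall, f1 = macro_metrics([[8, 2], [1, 9]])
```

For this matrix the per-type recalls are 0.8 and 0.9, so the averaged Recall is 0.85, and the per-type precisions are 8/9 and 9/11.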
4.3. Experimental results of fault diagnosis for integrated model based on voting mechanism
In Section 4.1, three heat meter fault diagnosis models were obtained, each integrating three base classifier algorithms. The weight allocation results of the base classifiers under each integrated model are shown in Table 5.
The above three types of integrated models are used to further diagnose the seven common faults of the heat meter, and the accuracy is shown in Table 6.
In order to intuitively observe the diagnosis results of the three types of integrated models on the seven types of faults of the heat meter, the data in Table 6 are drawn in the form of curves, as shown in Figure 4.
It can be seen from the results in Table 6 and Figure 4 that integrated model A has the best average diagnostic accuracy over the seven types of heat meter faults, reaching 84.52%, followed by integrated model B, whose average diagnostic accuracy reaches 82.03%. Although the diagnostic accuracy of integrated model C for the seven types of faults is lower than that of the other two models, it is still significantly higher than that of the eight single methods in Table 3. These experimental results support the conclusion that an integrated model constructed from diversified base classifiers is effective for the fault diagnosis of heat meters.
According to the method given in Section 4.2, the Precision, Recall and F1 Score indicators of the three integrated models are further compared. The results are shown in Table 7.
It can be seen from the results in Table 7 that integrated model A shows the best results on the three indicators of Precision, Recall and F1 Score. Although the results of integrated models B and C are slightly worse than those of integrated model A, they are also relatively good.
4.4. Comparison with other methods
Based on integrated model A, the method in this paper is applied to the fault diagnosis of heat meters, and the probabilistic Bayesian network method (PBNM) and the recurrent neural network method (RNNM) are selected as reference methods. The fault diagnosis results of the three methods are shown in Table 8.
It can be seen from the comparison results in Table 8 that the accuracy of the proposed method for the seven types of common heat meter faults is significantly higher than that of PBNM and RNNM.
5. Conclusions
This paper focuses on the fault diagnosis of heat meters. First, the common faults of the heat meter are classified into those caused by electrical hardware, mechanical hardware, communication, storage, billing software, metering software, and other reasons. On this basis, an interpolation-based method is adopted to reduce the imbalance between the data of the various fault types. An integrated model of multi-classifier fusion is established based on the voting mechanism, and the weight of each classifier is optimized by the pigeon swarm algorithm to realize the fault diagnosis of the heat meter. In the experiments, according to the diagnostic results of eight classifier algorithms, three integrated models are constructed, namely: Model A, GBDT + KNN + DT; Model B, GBDT + KNN + BP; Model C, GBDT + KNN + RF. The experimental results show that the integrated models proposed in this paper greatly improve the accuracy of heat meter fault diagnosis compared with the single classifiers, and also achieve good results on the three indicators of Precision, Recall and F1 Score.
Conflict of interest
The authors declare there is no conflict of interest.