Atrial fibrillation (AF) is a common cardiovascular disease characterized by tachyarrhythmia, which can lead to increased rates of stroke, heart failure, and other cardiovascular diseases [1]. Therefore, early detection of AF episodes is crucial so that patients can receive treatment as early as possible.
According to the duration of episodes, AF can be divided into three grades: (1) paroxysmal AF (AFp), (2) persistent AF (AFf), and (3) permanent AF. AFp is characterized by short episodes, a high recurrence rate, and difficulty of capture on the electrocardiogram (ECG) [2]. As the patient's disease course lengthens, AFp gradually evolves into AFf or permanent AF, threatening the patient's life [3]. Therefore, detecting AFp events is very significant for patients to receive effective treatment in time. The ECG is a convenient, rapid and non-invasive detection method that is widely used in clinical practice [4,5]. An AF episode is generally associated with irregular RR intervals when atrioventricular conduction is not impaired [6]. In addition, the absence of distinct repeating P waves and the presence of f waves help to identify the AF rhythm [6]. However, for AFp detection, the conventional 12-lead ECG has great limitations, whereas dynamic ECG offers a higher detection rate and accuracy (Acc) [7].
At present, many automatic algorithms have been proposed for AFp detection based on dynamic ECG. Liu et al. [8] proposed a detection method for AFp based on a covariance descriptor and kernel sparse coding. Petrenas et al. [9] designed a detector consisting of RR interval irregularity characterization, P-wave absence, f-wave presence, and noise level, which are combined by fuzzy logic classification to detect brief AF episodes of AFp. Ganapathy et al. [10] proposed an automatic detection algorithm for AFp based on co-occurrence matrices computed from RR intervals. Xin et al. [11] applied multi-scale wavelet analysis to extract features of heart rate variability (HRV) and then completed AFp detection with an SVM classifier.
In addition to accurate detection of AFp, precise location of the starting and ending points of AF episodes is also crucial for effective AFp management, as it provides more diagnostic and treatment information for doctors to implement appropriate medical intervention [12]. In particular, the AF burden associated with stroke risk can be defined and measured better over a long-term AF monitoring period than from device-detected AF duration as measured by cardiac implantable electronic devices (CIEDs) [13]. Previous algorithms aim at the detection of AFp but fail to locate the starting and ending points of AF episodes within AFp records. Recently, the 4th China Physiological Signal Challenge 2021 (CPSC 2021) [14] provided two large public databases and an undisclosed test database for participants to implement algorithms for AFp event detection. We take this opportunity to propose an efficient two-step method that meets the challenge rules for locating the starting and ending points of AF episodes in AFp rhythms. In the first step, we divide all records of unequal length into (1) non-atrial fibrillation (non-AF, N) rhythm, (2) AFp rhythm and (3) AFf rhythm based on machine learning. In the second step, we locate the starting and ending points of AF episodes in the AFp rhythms identified in the first step based on a convolutional neural network (CNN).
Two-step detection allows us to select the most appropriate method for each step. In this way, traditional feature extraction is combined with deep learning to maximize the use of the information available in each step. In the first step, we directly extract artificial features from unequal-length ECG records based on RR intervals, which reflect ventricular activation, thereby avoiding the need for fixed-length records during sliding-window feature extraction. In the second step, a CNN judges whether a single target beat belongs to an AF rhythm or not, which is essentially a binary heartbeat classification problem. Besides, because a large number of heartbeat segments are available from the public CPSC 2021 databases, we train the CNN in multiple phases and choose different learning rates according to the training situation as well as the size of the training set, thereby avoiding non-convergence of the loss when the model is trained on a large dataset.
To summarize, the main contributions of this paper are listed as follows:
● We propose a two-step method to detect AF episodes, in which an SVM classifier performs rhythm classification and a 1-dimensional (1-d) CNN performs episode location.
● The method exploits both ventricular responses and atrial activities and is suitable for analyzing unequal-length ECG records.
● We propose a phased training scheme that adjusts the learning rate and the size of the training set to train the deep model on large data.
CPSC 2021 [14] provides two public databases, named dataset Ⅰ and dataset Ⅱ, and an unpublished test set, part of which comes from the same source as the two public databases. Both public databases contain a large number of variable-length ECG record fragments extracted from lead Ⅰ and lead Ⅱ of long-term dynamic ECG signals. Dataset Ⅰ includes 730 records collected from 54 subjects, and dataset Ⅱ includes 706 records collected from 51 subjects. There are three rhythm types, namely, non-AF rhythm, AFp rhythm, and AFf rhythm. Table 1 shows the distribution of rhythm types in the public databases. The duration of the records ranges from 0.14 to 411.11 minutes, with an average of 20.33 minutes. The original sampling rate is 200 Hz, and both non-AF and AF episodes last no fewer than 5 consecutive beats.
Table 1. Rhythm distribution in the two public databases.

| Rhythm distribution | N | AFp | AFf | Subject | Sum |
|---|---|---|---|---|---|
| Dataset Ⅰ | 481 | 96 | 153 | 0–53 | 730 |
| Dataset Ⅱ | 251 | 133 | 322 | 54–104 | 706 |
| Sum | 732 | 229 | 475 | – | 1436 |
We use records from lead Ⅱ in our experiment. For rhythm classification in the first step, we divide the records from dataset Ⅰ and dataset Ⅱ into a rhythm training set and a rhythm test set. The distribution of rhythms in these subsets is shown in Table 2; notably, the class distribution of the rhythm training set is relatively balanced. In addition, rhythms belonging to the same class come from different subjects in the rhythm training set and the rhythm test set.
Table 2. Rhythm distribution in the rhythm training set and the rhythm test set.

| Rhythm distribution | N | AFp | AFf | Subject | Sum |
|---|---|---|---|---|---|
| Training rhythms from Dataset Ⅰ | 0 | 77 | 0 | 3, 25, 31, 32, 39, 40, 48, | 77 |
| Training rhythms from Dataset Ⅱ | 251 | 133 | 322 | 54–96, 99, 100, 102–104 | 706 |
| Test rhythms from Dataset Ⅰ | 481 | 0 | 153 | 0–53 | 634 |
| Test rhythms from Dataset Ⅱ | 0 | 19 | 0 | 97, 98, 101 | 19 |
| Sum | 732 | 229 | 475 | – | 1,436 |
| The rhythm training set | 251 | 210 | 322 | – | 783 |
| The rhythm test set | 481 | 19 | 153 | – | 653 |
For heartbeat classification in the second step, we refer to heartbeats constituting AF episodes as AF beats and heartbeats constituting non-AF episodes as N beats. Finally, a beat training set, a beat validation set, and a beat test set are generated from the public CPSC 2021 databases. As the heartbeat distribution in Table 3 shows, the ratio of N beats to AF beats is about 1:1 in the training and validation sets. A portion of the beat training set and the whole beat training set are used in the phased training scheme, which is introduced in Section 4. To obtain a model with outstanding performance and convincing test results, the heartbeat segments in each subset come from different records.
Table 3. Heartbeat distribution in the beat training, validation and test sets.

| Heartbeat distribution | Non-AF beats | AF beats | Sum |
|---|---|---|---|
| A portion of the beat training data | 372,616 | 368,836 | 741,452 |
| The whole beat training set | 629,425 | 627,188 | 1,256,613 |
| The beat validation set | 32,257 | 31,134 | 63,391 |
| The beat test set | 104,381 | 15,615 | 119,996 |
| Sum | 1,138,679 | 1,042,773 | 2,181,452 |
For AFp event detection, an effective approach is first to determine whether a record is an AFp rhythm and then to locate the starting and ending points of its AF episodes. Therefore, we divide the process of AFp event detection into two steps, namely, rhythm classification and heartbeat classification, and in each step we choose the most appropriate algorithm and optimize it accordingly. For rhythm classification in the first step, all records are classified into three rhythm types, including non-AF rhythm, AFf rhythm, and AFp rhythm; a traditional machine learning method distinguishes the rhythm type of each record. For heartbeat classification in the second step, a deep learning-based method classifies the beats of AFp rhythms predicted in the first step into non-AF beats or AF beats. Then, a median filter post-processes the results of heartbeat classification, which effectively reduces false detection of AF beats. Finally, the starting and ending points of AF episodes are the locations where the beat type changes. Figure 1 shows the flowchart of AFp event detection.
Rhythm classification is based on features computed from RR intervals and on classifier selection. Each record is treated as a single rhythm: non-AF rhythm, AFp rhythm, or AFf rhythm. The process consists of three parts: (1) feature extraction, (2) feature normalization and (3) classification, as shown in Figure 2(a). For feature extraction, we extract representative features from each record. For feature normalization, each feature is processed by min-max normalization. For classification, a support vector machine (SVM) classifier is trained by grid search and 5-fold cross validation on the rhythm training set and is tested on the rhythm test set. To place our method in context, Table A1 (in the Appendix) summarizes existing, representative methods for rhythm classification alongside our method and our optimized method.
AF rhythm exhibits obviously irregular RR intervals, and analysis based on ventricular responses is more robust to noise [15]. Thus, the rhythm type is judged according to the characteristics of the RR intervals. We extract features in the time domain, frequency domain, and nonlinear domain, respectively (a feature-extraction sketch follows the list below). The features are extracted from variable-length records, but the number of features is fixed for each record, which meets the requirement of a classifier with a fixed input length. Hence, we do not need to crop records, nor do we need to rely on a fixed number of RR intervals.
● Time domain features: These include the average RR interval (avgRR), the percentage of successive RR-interval differences greater than 50 ms (pRR50), the number of successive RR-interval differences greater than 50 ms (RR50), the standard deviation of RR intervals (SDRR) and the root mean square of successive differences of RR intervals (RMSSD), etc.
● Frequency domain features: The RR intervals are first resampled to 7 Hz. A power spectrum is estimated using the Welch power spectral density estimate (Hanning window; 4096 points; 50% overlap). The calculated frequency domain features include the normalized low-frequency power (LFnorm), the normalized high-frequency power (HFnorm), and the ratio of low-frequency to high-frequency power (rLH), etc.
● Non-linear features: These are obtained from the analysis of the Lorenz plot. They include the standard deviation in the transverse direction (sd1), the standard deviation in the longitudinal direction (sd2), the cardiac sympathetic index (csi), the cardiac vagal index (cvi), etc.
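As a concrete illustration, the following Python sketch computes representative features from each of the three groups for one record, given its RR-interval sequence in seconds. It is not the authors' code: the function name, the LF/HF frequency bands, and the csi/cvi formulas are common-default assumptions, while the 7 Hz resampling and the Welch settings follow the description above.

```python
# Sketch: representative RR-interval features from the three groups described above.
# `rr` is a 1-d numpy array of RR intervals in seconds for one record.
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def extract_rr_features(rr, fs_resample=7.0):
    feats = {}
    drr = np.diff(rr)
    # Time domain features
    feats["avgRR"] = rr.mean()
    feats["SDRR"] = rr.std()
    feats["RMSSD"] = np.sqrt(np.mean(drr ** 2))
    feats["RR50"] = int(np.sum(np.abs(drr) > 0.05))         # successive differences > 50 ms
    feats["pRR50"] = 100.0 * feats["RR50"] / len(drr)
    # Frequency domain: resample the RR series to 7 Hz, then Welch PSD
    t = np.cumsum(rr)
    t_uniform = np.arange(t[0], t[-1], 1.0 / fs_resample)
    rr_uniform = interp1d(t, rr, kind="cubic")(t_uniform)
    f, pxx = welch(rr_uniform - rr_uniform.mean(), fs=fs_resample,
                   window="hann", nperseg=min(4096, len(rr_uniform)))  # 50% overlap by default
    lf = pxx[(f >= 0.04) & (f < 0.15)].sum()                # standard LF band (assumed)
    hf = pxx[(f >= 0.15) & (f < 0.40)].sum()                # standard HF band (assumed)
    feats["LFnorm"], feats["HFnorm"], feats["rLH"] = lf / (lf + hf), hf / (lf + hf), lf / hf
    # Non-linear features from the Lorenz (Poincare) plot
    sd1 = np.sqrt(0.5) * drr.std()
    sd2 = np.sqrt(max(2 * rr.var() - 0.5 * drr.var(), 1e-12))
    feats["sd1"], feats["sd2"] = sd1, sd2
    feats["csi"] = sd2 / sd1                                # cardiac sympathetic index (common definition)
    feats["cvi"] = np.log10(16.0 * sd1 * sd2)               # cardiac vagal index (common definition)
    return feats
```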
The convergence of a classifier slows down when its input features have very different scales, which can even degrade the classifier's performance [16]. To address this, we apply min-max normalization to the rhythm subsets. It is worth noting that the minimum and maximum values computed on the rhythm training set are also used to normalize the rhythm test data.
The SVM classifier [17], proposed by Boser et al., has strong generalization ability: it maps linearly inseparable low-dimensional data into a higher-dimensional space via a kernel function, with the mapping determined by the form of the kernel. Training can be summarized as a convex optimization problem with a global optimal solution [18,19]. In our experiment, we choose an SVM classifier with a radial basis function (RBF) kernel to automatically identify the rhythm class of each record. Optimal parameters are determined by 5-fold cross validation and grid search on the rhythm training data.
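The normalization and classifier-training steps above can be sketched as follows. The placeholder data and the grid of C and gamma values are illustrative assumptions; the pipeline structure (training-set min-max scaling, RBF-kernel SVM, grid search with 5-fold cross validation) follows the text.

```python
# Sketch: training-set min-max normalization plus an RBF-kernel SVM tuned by grid
# search with 5-fold cross validation. The feature matrices here are random placeholders.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(120, 16)), rng.integers(0, 3, 120)   # 0 = non-AF, 1 = AFp, 2 = AFf
X_test = rng.normal(size=(30, 16))

pipe = Pipeline([
    ("scale", MinMaxScaler()),          # min/max learned on the rhythm training set only
    ("svm", SVC(kernel="rbf")),
])
param_grid = {"svm__C": [0.1, 1, 10, 100], "svm__gamma": ["scale", 0.01, 0.1, 1]}  # illustrative grid
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)
rhythm_pred = search.best_estimator_.predict(X_test)        # scaler reuses the training min/max
```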
Heartbeat classification is the basis of detection for various arrhythmias [18,20]. A single beat is short enough for a 1-d CNN to automatically extract deep features from its morphological characteristics. Considering that the aim is to locate the starting and ending points of AF episodes by classifying the beats constituting AFp rhythms, we label beats as either AF beats or non-AF beats, rather than labeling beats according to the Association for the Advancement of Medical Instrumentation (AAMI) standard [21]. Therefore, when the model is tested on the beat test set, the changes of predicted heartbeat type between AF beats and non-AF beats indicate the starting and ending points of AF episodes. The process of heartbeat classification is composed of four parts: baseline removal, heartbeat segmentation, heartbeat normalization, and deep learning-based model building, as shown in Figure 2(b). Then, a median filter corrects misjudged heartbeats to a certain extent as a post-processor. The phased training scheme is also introduced in this section.
Other participants in CPSC 2021 have proposed further outstanding, as-yet-unpublished algorithms for AF episode location, with innovative and valuable ideas in preprocessing, model structure, and the overall location process.
Considering that ECG signals collected in clinical practice contain various kinds of noise, we only filter the baseline drift and adopt no other denoising approaches, in order to enhance the robustness of the proposed method to noise. Following the approach adopted by Chazal [18] and Jiang [20], a 200 ms median filter first removes the P waves and QRS complexes of the original signal, and then a 600 ms median filter removes the T waves, yielding the baseline drift signal. Finally, the ECG signal without baseline drift is obtained by subtracting the baseline drift signal from the original ECG record.
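A minimal sketch of this baseline-removal step is given below; rounding the 200 ms and 600 ms windows to odd sample counts at 200 Hz is our assumption, since medfilt requires odd kernel lengths.

```python
# Sketch: baseline-drift estimation with cascaded 200 ms and 600 ms median filters at 200 Hz.
import numpy as np
from scipy.signal import medfilt

def remove_baseline(ecg, fs=200):
    k1 = int(0.2 * fs) | 1               # ~200 ms window, forced odd (41 samples)
    k2 = int(0.6 * fs) | 1               # ~600 ms window, forced odd (121 samples)
    baseline = medfilt(medfilt(ecg, k1), k2)
    return ecg - baseline                # ECG without baseline drift

ecg = np.random.default_rng(0).normal(size=2000)   # placeholder signal
ecg_clean = remove_baseline(ecg)
```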
The original sampling rate of the ECG signals in the CPSC 2021 databases, 200 Hz, is used as the standard sampling rate. The R-labels provided by the CPSC 2021 databases are used as fiducial R-points. Each segment is a single heartbeat with 200 sampling points, consisting of 60 points before the R-point, the R-point itself, and 139 points after it; the proportion of sampling points before and after the R-point is thus close to 3:7. Each heartbeat segment contains a complete P wave, a QRS complex, and a T wave when these waves are present.
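A small sketch of this segmentation rule (60 samples before the R-point, the R-point itself, and 139 samples after it) follows; the boundary handling for beats near the record edges is our assumption.

```python
# Sketch: cut 200-sample segments around each fiducial R-point.
import numpy as np

def segment_beats(ecg, r_indices, pre=60, post=139):
    beats = []
    for r in r_indices:
        if r - pre >= 0 and r + post < len(ecg):        # skip beats too close to the record edges
            beats.append(ecg[r - pre : r + post + 1])   # length = pre + 1 + post = 200
    return np.asarray(beats)
```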
The amplitude gains of ECG signals collected by different ECG monitoring devices differ markedly. Moreover, normalizing the input data aids convergence and improves the generalization of a model [22]. Thus, it is necessary to normalize the data to a uniform scale. Using Eq (1), the amplitude of each heartbeat segment is mapped into the range 0–1.
$F_o = \dfrac{F - F_{\min}}{F_{\max} - F_{\min} + K}, \quad K = 10^{-6}$ | (1)
where F is a feature matrix; Fmin and Fmax are matrices assembled from the minimum and maximum vectors of the features, respectively; and K is set to 10^{-6} to prevent the divisor from being zero.
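A sketch of Eq (1) applied to a batch of heartbeat segments is shown below; whether the minimum and maximum are taken per beat or over the whole matrix is not fully specified above, so per-beat scaling is assumed here.

```python
# Sketch of Eq (1): per-beat min-max scaling of heartbeat segments into roughly 0-1.
import numpy as np

def normalize_beats(beats, k=1e-6):
    # beats: array of shape (n_beats, 200)
    f_min = beats.min(axis=1, keepdims=True)
    f_max = beats.max(axis=1, keepdims=True)
    return (beats - f_min) / (f_max - f_min + k)        # k keeps the divisor non-zero
```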
A CNN can automatically extract deep features from ECG signals and combines feature extraction and classification in an end-to-end fashion [23]. We build the 1-d model in Figure 3, whose deepening scheme follows that of the VGG13 network [24]. The whole model consists of an automatic deep feature extractor and a classifier. The feature extractor is composed of convolutional layers, maximum pooling layers, and dropout layers. The preprocessed heartbeat segments are fed directly into the feature extractor. Every two stacked convolutional layers have the same number of filters and are followed by a maximum pooling layer and a dropout layer to form a convolutional unit. Five such convolutional units are stacked in series to constitute the feature extractor. As the network deepens, the number of filters in the five units increases or stays unchanged, being 64, 128, 256, 512 and 512 in order. The kernel size and stride of the convolutional layers are 3 and 1, respectively. Maximum pooling layers with kernel size 2 and stride 2 halve the size of the feature maps. For the dropout layers, the probability of discarding neurons is set to 0.3 to reduce the risk of overfitting. The rectified linear unit (ReLU) activation function in the convolutional layers is used for nonlinear mapping.
The classifier is composed of a flattening layer, three fully connected layers, and dropout layers. First, the flattening layer expands the feature matrix into a feature vector. Then, fully connected layers and dropout layers are stacked alternately; the probability of discarding neurons in the classifier's dropout layers is set to 0.5. The numbers of neurons of the fully connected layers are 256, 128 and 1, respectively. The last fully connected layer serves as the output layer, which has a single neuron and uses the sigmoid activation function for binary classification; its output is the probability that the target beat is an AF beat. The feature extractor and classifier are connected for end-to-end training. Table 4 shows the hyperparameters of our model, and a code sketch of the architecture follows the table.
Table 4. Hyperparameters of the proposed 1-d CNN model (the total parameter count and FLOPs refer to the whole model).

| Layer | Output | Kernel | Stride | Padding | Total parameters | FLOPs |
|---|---|---|---|---|---|---|
| Input | (200, 1) | – | – | – | 3,956,289 | 0.148 G |
| Convolutional unit 1 | (100, 64) | 3 | 1 | same | | |
| Convolutional unit 2 | (50, 128) | 3 | 1 | same | | |
| Convolutional unit 3 | (25, 256) | 3 | 1 | same | | |
| Convolutional unit 4 | (12, 512) | 3 | 1 | same | | |
| Convolutional unit 5 | (6, 512) | 3 | 1 | same | | |
| Flatten | (3072, ) | – | – | – | | |
| FL 1 | (256, ) | – | – | – | | |
| Dropout 6 | (256, ) | 0.5 | – | – | | |
| FL 2 | (128, ) | – | – | – | | |
| Dropout 7 | (128, ) | 0.5 | – | – | | |
| Sigmoid | (1, ) | – | – | – | | |
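The following Keras sketch reproduces the architecture of Figure 3 and Table 4. The activations of the hidden fully connected layers and the optimizer settings are assumptions where the text does not state them; with the layer sizes above, the flattened vector has length 3072 and the parameter count agrees with the 3,956,289 reported in Table 4.

```python
# Keras sketch of the 1-d VGG-style beat classifier described above.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_beat_classifier(input_len=200):
    inputs = layers.Input(shape=(input_len, 1))
    x = inputs
    # Feature extractor: five convolutional units (two conv layers each, then pooling and dropout).
    for filters in (64, 128, 256, 512, 512):
        x = layers.Conv1D(filters, kernel_size=3, strides=1, padding="same", activation="relu")(x)
        x = layers.Conv1D(filters, kernel_size=3, strides=1, padding="same", activation="relu")(x)
        x = layers.MaxPooling1D(pool_size=2, strides=2)(x)
        x = layers.Dropout(0.3)(x)
    # Classifier head: flatten, two hidden fully connected layers with dropout, sigmoid output.
    x = layers.Flatten()(x)                                  # 6 x 512 = 3072 features
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)       # probability that the beat is an AF beat
    return models.Model(inputs, outputs)

model = build_beat_classifier()
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
model.summary()                                              # layer shapes can be checked against Table 4
```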
The beat training set is so large that model training takes a lot of time. To monitor the training process and avoid training failure, we propose a phased training scheme, with every 300 epochs forming a training phase. According to the state of model convergence, we change the amount of training data or the learning rate in each phase; Figure 4 shows the procedure. The model is trained over four phases, a total of 1200 epochs. We save the beat model with the highest F1 score on the beat validation set during the fourth phase.
A large training set benefits model performance but may make the training loss hard to converge and increase the possibility of training failure [25]. Therefore, we first train the model on a portion of the beat training data until convergence and then continue training it on the whole beat training set, in which abundant new data are added. The positive effect on model performance can be observed clearly when the beat training set is extended. In our scheme, the trainable parameters are randomly initialized in the first training phase; in the phase where training data are added, the previously optimized parameters serve as the initial parameters of the current phase. Well-chosen initial values of the trainable parameters reduce the possibility of the model falling into a local optimum [26].
In our experiments, the portion of the beat training data first gives the model the ability to distinguish the two classes. As shown in Figure 5(a), after more than 1000 epochs, the loss on the beat validation set trends upward and the Acc on the beat validation set trends markedly downward. Next, with the optimized parameters as initialization, the model is trained on the whole beat training set; that is, we expand the beat training set at the beginning of the fourth phase and fine-tune the model obtained at the end of the third phase on the whole beat training set. Figure 5(b) shows the loss and Acc curves acquired after data expansion. The large number of new samples lets the model learn new discriminative features quickly and improves its performance rapidly. After more than 1000 epochs, the validation loss is basically stable and the validation Acc stays above 90%. Finally, we obtain the best model at the 1043rd epoch, which has the highest F1 score on the beat validation set during the fourth phase.
A model converges with difficulty under a large learning rate, while a small learning rate tends to trap the model in a local minimum [27]. Thus, it is necessary to adjust the learning rate according to the training situation. We use the Adam optimizer to back-propagate the training loss on the beat training set, with each batch containing 256 heartbeat segments. The model is trained on a 3090 GPU under Ubuntu. We determine the learning rate at the beginning of each phase according to the training loss and validation loss, so that the model can be well trained on the large beat training set.
In the first phase, the learning rate of the Adam optimizer is 0.0001; at the beginning of the second phase we reduce it to one half of the original value and keep it constant until the beginning of the fourth phase. Observing the loss curve of the beat validation set, we find there is still room for convergence. Therefore, we restore the learning rate to 0.0001 at the beginning of the fourth phase to speed up the model's convergence.
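A compact sketch of the phased training loop is given below, reusing the `model` built in the earlier sketch and assuming that the arrays x_train_part, x_train_whole, x_val and their labels have been prepared. The reduced learning rate in phases 2–3 is taken as half of 0.0001 per the description above; recompiling keeps the learned weights but resets the optimizer state, which is a simplification of the authors' procedure.

```python
# Sketch of the four 300-epoch training phases; recompiling keeps the learned weights
# from the previous phase while changing the learning rate.
import tensorflow as tf

phases = [
    (1e-4, (x_train_part,  y_train_part)),    # phase 1: portion of the beat training set
    (5e-5, (x_train_part,  y_train_part)),    # phase 2: reduced learning rate
    (5e-5, (x_train_part,  y_train_part)),    # phase 3
    (1e-4, (x_train_whole, y_train_whole)),   # phase 4: whole beat training set, rate restored
]
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_beat_model.h5", monitor="val_accuracy", save_best_only=True)   # the paper tracks F1 instead

for lr, (x, y) in phases:
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x, y, batch_size=256, epochs=300,
              validation_data=(x_val, y_val), callbacks=[checkpoint])
```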
According to CPSC 2021, the number of continuous beats constituting an AF episode or a non-AF episode is no less than 5. Therefore, we use a median filter with a window spanning five beats to correct the types predicted by the heartbeat classification model. Under this postprocessing, the predicted result for the target beat is replaced by the median over the target and its four surrounding beats. This corrects outliers in the predicted results and reduces the misdiagnosis of AF episodes. Leaving aside the case in which the first or last beat of an AFp rhythm is itself an AF beat, the starting point of an AF episode is the R-point of an AF beat whose preceding beat is a non-AF beat, and the ending point is the R-point of an AF beat whose following beat is a non-AF beat.
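A sketch of this postprocessing and endpoint extraction follows; the zero-padding that closes episodes touching the record boundaries, and the 0-based indexing, are our simplifications.

```python
# Sketch: five-beat median filtering of the per-beat predictions, then extraction of
# AF episode starting/ending beat indexes from the label changes.
import numpy as np
from scipy.signal import medfilt

def locate_af_episodes(beat_labels):
    # beat_labels: 1-d array of predicted labels, 1 = AF beat, 0 = non-AF beat
    smoothed = medfilt(beat_labels.astype(float), kernel_size=5).astype(int)
    padded = np.concatenate(([0], smoothed, [0]))     # closes episodes touching the record ends
    diff = np.diff(padded)
    starts = np.where(diff == 1)[0]                   # first AF beat of each episode (0-based)
    ends = np.where(diff == -1)[0] - 1                # last AF beat of each episode (0-based)
    return list(zip(starts, ends))
```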
In the 1-d CNN training process, all rhythms are first segmented so that AF beats and non-AF beats can be collected, and then the 1-d CNN is trained to predict the type of each heartbeat. During the test process for a single record, the rhythm type is predicted in the first step. The output of the AF starting and ending points is [] (an empty list, indicating that no AF episode is present) for a non-AF rhythm, and [1, the length of the record] (where 1 is the index of the first heartbeat) for an AFf rhythm. In the second step, the heartbeats of a record predicted as AFp rhythm in the first step are classified beat by beat, and the indexes of the beats are retained. After postprocessing with the median filter, the locations of AF episodes are given by the indexes of the beats at which the predicted type changes between AF beat and non-AF beat.
Four indicators, namely Acc, sensitivity (Sen), positive predictive value (PPV) and F1 score, are used to evaluate the performance of the algorithms applied to rhythm classification and heartbeat classification. These indicators are expressed in terms of false positives (FP), true positives (TP), false negatives (FN) and true negatives (TN). TP is the number of positive samples that are predicted correctly, FP is the number of negative samples predicted as positive, FN is the number of positive samples predicted as negative, and TN is the number of negative samples that are predicted correctly. The four indicators are defined in Eqs (2)–(5), respectively.
$\mathrm{Acc} = \dfrac{TN + TP}{TN + TP + FN + FP}$ | (2)
$\mathrm{Sen} = \dfrac{TP}{TP + FN}$ | (3)
$\mathrm{PPV} = \dfrac{TP}{TP + FP}$ | (4)
$F_1 = \dfrac{(1 + \beta^2) \times \mathrm{Sen} \times \mathrm{PPV}}{\beta^2 \times \mathrm{Sen} + \mathrm{PPV}}$ | (5)
The overall Acc is the percentage of samples that are predicted accurately. Sen reflects the missed-diagnosis rate for a given class, and PPV reflects its misdiagnosis rate. The F1 score provides a balanced evaluation of an algorithm's performance on an unbalanced test set. The β in Eq (5) is usually set to 1, so the F1 score is the harmonic mean of Sen and PPV.
The indicators for AFp event detection are defined by CPSC 2021. Ur is the score for rhythm detection, and Ue is the score for locating the starting and ending points of AF episodes in AFp rhythms. The scoring criteria for Ur and Ue are shown in Figure 6. The score of each record is the sum of Ur and the weighted Ue, and the final score U is the mean score over all records in a test set, as described in Eq (6).
$U = \dfrac{1}{N}\sum_{i=1}^{N}\left(U_{r_i} + \dfrac{M_{a_i}}{\max\{M_{r_i}, M_{a_i}\}} \times U_{e_i}\right)$ | (6)
where N is the number of records in the test set, $M_{a_i}$ is the number of true AF episodes in the i-th record, and $M_{r_i}$ is the number of predicted AF episodes in the i-th record.
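For clarity, Eq (6) can be computed as in the following sketch; the guard against an empty maximum is our addition for records with no true or predicted episodes.

```python
# Sketch of Eq (6): the final score U from per-record rhythm scores (ur), endpoint
# scores (ue), and the true/predicted AF episode counts (ma, mr).
def challenge_score(ur, ue, ma, mr):
    total = 0.0
    for ur_i, ue_i, ma_i, mr_i in zip(ur, ue, ma, mr):
        weight = ma_i / max(mr_i, ma_i) if max(mr_i, ma_i) > 0 else 0.0   # guard against 0/0
        total += ur_i + weight * ue_i
    return total / len(ur)
```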
The rhythm classification and heartbeat classification algorithms are the foundation of starting and ending point detection for AF episodes. Figure 7 shows the confusion matrices of rhythm classification and heartbeat classification on their respective test sets, and Table 5 shows the calculated assessment metrics. Both algorithms have good classification performance, achieving overall Acc of 98.62% and 90.81%, respectively. However, there is still room for improvement in distinguishing AFf rhythms from AFp rhythms.
Table 5. Assessment metrics of rhythm classification and heartbeat classification on their respective test sets.

| Algorithm | Class | Sen | PPV | Acc | F1 | Overall Acc |
|---|---|---|---|---|---|---|
| Rhythm classification | N | 0.998 | 0.998 | 0.997 | 0.998 | 0.986 |
| Rhythm classification | AFp | 0.684 | 0.867 | 0.988 | 0.765 | 0.986 |
| Rhythm classification | AFf | 0.987 | 0.962 | 0.988 | 0.974 | 0.986 |
| Heartbeat classification | Non-AF beat | 0.906 | 0.913 | 0.908 | 0.909 | 0.908 |
| Heartbeat classification | AF beat | 0.910 | 0.903 | 0.908 | 0.907 | 0.908 |
We are honored to have participated in CPSC 2021 and tested our proposed two-step method on the unpublished test set. In the first step, the starting and ending points are empty for non-AF rhythms; for AFf rhythms, they are the positions of the R-points of the first and last beats, respectively. If a record is predicted as an AFp rhythm, the deep learning-based model classifies its beats in the second step, and the starting and ending points are located after postprocessing. Finally, our proposed two-step method achieves a final score U of 1.9310 on the unpublished test set and ranks eighth among all competing algorithms in CPSC 2021.
Compared with a one-step algorithm that relies on heartbeat classification alone to locate AF episodes, the two-step method has clear advantages. In the two-step method, records are classified into three rhythm types based on ventricular responses in the first step, and the SVM classifier achieves high accuracy in classifying records of variable length. For records predicted as non-AF rhythms, the AF starting and ending points are empty; for those predicted as AFf rhythms, the starting point is 1 and the ending point equals the length of the corresponding record. Records predicted as AFf or non-AF rhythms in the first step do not participate in the second step; the 1-d CNN analyzes only the AFp rhythms obtained in the first step, beat by beat.
If, instead, we used the heartbeat classification algorithm alone to locate AF episodes, the 1-d CNN would analyze all non-AF, AFf and AFp rhythms. When AF beats and non-AF beats are classified beat by beat, false positives and false negatives may occur in non-AF and AFf rhythms, respectively. Therefore, the location accuracy would be lower than that of the two-step method.
The classifier for rhythm classification still has room for optimization: some redundant features negatively affect its performance, and feature selection is a valid way to discard useless features. We therefore apply a genetic algorithm to select features and optimize the rhythm classifier, using an SVM classifier as the prediction model and 5-fold cross validation to evaluate candidate feature subsets (a sketch of this procedure is given after Table 6). Finally, we obtain eight features, which are listed in Table 6. The selected features represent the records effectively, and the optimized SVM classifier performs well on all rhythm types. Figure 8 shows the confusion matrix of the optimized SVM classifier on the rhythm test set.
Table 6. The eight features selected by the genetic algorithm.

| Feature group | Num | Feature description | Symbol |
|---|---|---|---|
| Time domain | 1 | Percentage of successive RR-interval differences greater than 20 ms. | pRR20 |
| Time domain | 2 | Median of absolute values of successive differences between RR intervals. | mavsd |
| Time domain | 3 | Ratio of the standard deviation to the mean of RR intervals. | rsdm |
| Time domain | 4 | Maximum heart rate. | maxHR |
| Time domain | 5 | Minimum heart rate. | minHR |
| Frequency domain | 6 | Low-frequency power. | LF |
| Frequency domain | 7 | Normalized high-frequency power. | HFnorm |
| Non-linear domain | 8 | Modified cardiac sympathetic index. | mdcsi |
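A compact sketch of the wrapper-style genetic algorithm described above follows, with an RBF-kernel SVM and 5-fold cross-validation accuracy as the fitness function. The population size, number of generations, mutation rate, selection and crossover operators are illustrative assumptions; X and y stand for the normalized rhythm training features and labels.

```python
# Sketch: genetic-algorithm feature selection with an SVM wrapper and 5-fold CV fitness.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def ga_select_features(X, y, n_pop=30, n_gen=40, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(n_pop, n_feat))            # binary masks: 1 = keep the feature

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        return cross_val_score(SVC(kernel="rbf"), X[:, mask.astype(bool)], y, cv=5).mean()

    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: n_pop // 2]]  # keep the fitter half
        children = []
        while len(children) < n_pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)                       # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_feat) < p_mut                   # bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    best = pop[np.argmax([fitness(ind) for ind in pop])]
    return np.flatnonzero(best)                                 # indexes of the selected features
```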
We also compare the performance of other common classifiers before and after feature selection, including the logistic regression (LR) classifier, naive Bayes (Bayes) classifier, K-nearest neighbour (KNN) classifier, decision tree (DT) classifier and random forest (RF) classifier. In Table 7, the decision tree achieves 100% Acc both before and after feature selection and is the best rhythm classifier. To facilitate comparison, we test the DT after feature selection combined with the deep model, as well as the original SVM classifier combined with the deep model, on all records in the two public databases; the score U improves from 1.6539 to 1.6632.
Table 7. Performance of common classifiers before and after feature selection.

| Classifier | Sen (before) | PPV (before) | Acc (before) | Sen (after) | PPV (after) | Acc (after) |
|---|---|---|---|---|---|---|
| LR | 0.985 | 0.845 | 0.974 | 0.989 | 0.871 | 0.982 |
| Bayes | 1 | 1 | 1 | 1 | 1 | 1 |
| KNN | 0.961 | 0.784 | 0.946 | 0.988 | 0.869 | 0.980 |
| SVM | 0.890 | 0.942 | 0.986 | 0.990 | 0.920 | 0.991 |
| DT | 1 | 1 | 1 | 1 | 1 | 1 |
| RF | 0.999 | 0.983 | 0.998 | 1 | 1 | 1 |
This paper proposes a two-step method for locating the starting and ending points of AF episodes as well as detecting variable-length AFp rhythms. To the best of our knowledge, this is the first time that AFp recognition and AF episode location are jointly considered for machine learning-based AFp event detection. In the first step, feature extraction and a classifier are combined to identify the rhythm type of a single record as non-AF rhythm, AFp rhythm, or AFf rhythm. In the second step, we use a CNN that takes a single heartbeat as input for automatic deep feature extraction and heartbeat classification. To adapt to training on a large beat training set, a phased training scheme is proposed that changes the size of the training set and the learning rate adaptively. In the end, our two-step method obtains a final score U of 1.9310 on the unpublished test set and ranks eighth in CPSC 2021.
We have also optimized the algorithm for rhythm classification: (1) the genetic algorithm is applied for feature selection, and eight useful features are retained; (2) after multiple classifiers are compared, the SVM classifier is replaced by a decision tree classifier. To make the optimized method comparable with the original method, we test on all the records in the two public CPSC 2021 databases; finally, the score U of the optimized method is improved by 0.0093. Our optimized two-step method is suitable for AFp analysis of long-term recordings and helpful for AF burden index measurement.
This work was supported in part by Shanghai Municipal Special Project of Industry Transformation and Upgrading under Grant GYQJ-2020-1-31, Medical Scientific Research Key Project of Jiangsu Commission of Health under Grant ZDB2020025 and Medical Scientific Research Instructional Project of Jiangsu Commission of Health under Grant Z2020075.
The authors declare that there is no conflict of interest.
Table A1. Comparison of existing representative methods for rhythm classification with our method and our optimized method.

| Existing method | Liu [8] | Petrenas [9] | Ganapathy [10] | Xin [11] | Our method | Our optimized method |
|---|---|---|---|---|---|---|
| Feature | RR intervals | RR intervals, f wave presence, P wave absence, and noise level | RR intervals | HRV signal | RR intervals | RR intervals |
| Lead | Single lead | Two leads | Single lead | | | |
| FT | Covariance representation and kernel mapping | Fuzzy logic | Dynamic symbol assignment and covariance representation | Wavelet analysis and scale entropy calculation | Time domain, frequency domain, and non-linear domain representation | Time domain, frequency domain, non-linear domain representation, and feature selection |
| Classifier | Kernel sparse coding and dictionary learning | Sliding window and threshold | KNN, SVM, RF, ROF, EL | SVM | SVM | DT |
| Length | 33 RR intervals | Finite length (about 2 min) | Finite length (1–5 min) | Fixed length (5 min) | Variable length (≥ 4 RR intervals) | Variable length (≥ 4 RR intervals) |
| Class | AFp rhythm | Occult AFp rhythm | AFp and non-AF rhythm | Non-AF, AFp and other rhythm | AFp, AFf and non-AF rhythm | AFp, AFf and non-AF rhythm |

Note: FT represents feature transformation; HRV represents heart rate variability; ROF represents rotation forest; and EL represents ensemble learning.