Citation: Qingxue Zhang, Dian Zhou, Xuan Zeng. Machine Learning-Empowered Biometric Methods for Biomedicine Applications[J]. AIMS Medical Science, 2017, 4(3): 274-290. doi: 10.3934/medsci.2017.3.274
Leveraging continuous advances in electronics, communication and computing, the health industry is being rapidly reshaped, enabling increasingly promising smart/mobile health solutions [1,2,3,4,5,6,7,8]. However, along with the dramatic growth in assistive devices, connectivity needs and big data, security and privacy concerns are also rising. It is necessary to address questions such as how to provide confidential biomedicine applications and how to effectively protect the sensitive data of patients. Biometric human identification is attracting considerable attention as a way to address these security and privacy concerns. As an emerging technology, biometrics are more robust than traditional methods, such as token-based (identity card) and knowledge-based (username/password) ones, which may be stolen or lost. A biometric is usually unique to an individual and highly difficult to duplicate [9]. There are several categories of biometric modalities, such as behavioral and physiological ones. Examples of the former include gait, signature, face, etc. We take special interest in the latter in this paper. Physiological signals can be easily collected by wearable computers that are part of the body sensor network. The acquired signals can then be processed either by personal digital devices or by cloud computing servers for user identification, toward confidential smart digital health.
Among different physiological signals, the heart electrocardiogram (ECG) [10] and the brain electroencephalogram (EEG) [11] are two key modalities. The former reflects the electrical behavior of the heart, which is modulated by both sympathetic and parasympathetic nerves. To measure the ECG signal, electrodes are used to detect the tiny electrical changes on the human body. These changes are generated by the electrophysiological activity of the heart muscle during the depolarization and repolarization phases of each heartbeat. The ECG is therefore hard to duplicate and safer than traditional identification methods. In daily applications, the ECG signal can be easily collected by wearable computers and then sent to cellphones or other personal digital platforms. It can be collected at any time, since the living human body continuously generates cardiac electrical signals that propagate to all body parts.
The EEG signal is another well-known modality that can be used in many applications, such as seizure detection, sleep quality monitoring and emotion tracking. Here, we focus on its application in biometric human identification. The EEG signal is also unique to individuals and is arguably more difficult to duplicate, considering that the underlying signal generation mechanism is much more complicated. It is usually collected by electrodes placed on the head and held in place by a dedicated EEG cap. There are also some easy-to-wear EEG caps that collect data from only a few locations at the sacrifice of accuracy. The EEG signal is usually very weak and has a low signal-to-noise ratio; therefore, many EEG electrodes are still used in practical application scenarios to obtain redundant information that improves performance. Of course, this may lower the wearability. But because the EEG signal is highly difficult to duplicate, it is still attracting more and more attention in human identification applications.
However, a thorough comparison between ECG-based and EEG-based biometric human identification has not received enough attention. Such a study is necessary because ECG and EEG have different advantages and disadvantages in many of the aspects mentioned above, including sensor placement, signal quality, wearability of the signal acquisition device, difficulty of duplication, and so on. A detailed comparative analysis of these two modalities can effectively advance the practical application of an appropriate signal in pervasive assistive personal devices, for confidential biomedicine purposes.
Another insufficiently studied aspect is the signal processing and machine learning algorithms used to perform the feature extraction and user classification tasks. Many different machine learning classifiers have been used to learn the extracted features and identify users on fresh, unseen data. However, which algorithm is more effective has not been thoroughly compared and analyzed. Because machine learners are usually built on diverse mechanisms, such a comparison is highly necessary. For example, the neural network is inspired by brain structures and introduces a large number of neurons which are expected to generate neuron spikes reflecting the data patterns. The random forest, instead, is based on a totally different principle: it builds a forest of simple decision trees and applies an ensemble strategy to get an averaged classification result.
Focusing on the above insufficiently studied aspects, in this paper we investigate different advanced machine learning approaches on two physiological modalities, i.e., heart ECG and brain EEG, to gain insights into how to select an effective machine learner and an appropriate signal for human identification. The machine learning methods taken into consideration include Neural Network, K-nearest Neighbor, Bagging, Random Forest and AdaBoost [12]. Moreover, we extract multi-domain features for the ECG signal, and the gamma-band spectral power ratio feature for the EEG signal. These features are fed to the different machine learners to build the model on the training data, or to test the model on the unseen testing data. The identification rate is reported for the different modalities and machine learners. A discussion of the effectiveness of the features and classifiers is also given to point out future research directions.
The rest of this article is organized as follows: section 2 details the methods in terms of datasets, feature extraction and classifiers; section 3 presents the experimental results and discussion, along with a comparison with state-of-the-art works; section 4 concludes the study.
To evaluate two physiological modalities, we have used two widely-applied public datasets, the Fantasia ECG dataset [13,14] and the UCI (University of California, Irvine) EEG dataset [15].
The Fantasia ECG dataset was acquired for the study of aging trends and thus contains diverse ECG morphologies. This dataset has been used in many studies focusing on different problems. Here, we choose 20 recordings from 20 subjects for human identification.
The UCI EEG dataset was collected using the 64-electrode configuration. Each subject was exposed to the picture stimulation, and thus their EEG signals reflected this external visual input. EEG data corresponding to 20 nonalcoholic subjects are also used to evaluate the proposed algorithm.
There are two pre-processing steps: filtering and signal segmentation. For the ECG signal, a Butterworth band-pass filter (2 to 50 Hz) is applied to remove the low-frequency baseline wander and the high-frequency power line interference. The ECG heartbeats are then segmented at the middle point between two adjacent heartbeat R peaks. Because the ECG signal is usually acquired continuously, the heartbeats must first be segmented before features can be extracted from them. However, a heartbeat detection algorithm may introduce false heartbeats when the ECG signal is corrupted by motion artifacts and other noise. Since this study focuses on the effectiveness of the ECG heartbeat for user identification, we directly use the ground-truth R peak locations to perform the segmentation. The selected heartbeats therefore do not include false heartbeats, which is important for a fair comparison with the EEG signal.
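The midpoint-based segmentation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `segment_heartbeats`, the toy signal and the R peak positions are hypothetical, and the first and last beats are dropped because one of their boundaries is undefined, a choice the paper does not specify.

```python
def segment_heartbeats(signal, r_peaks):
    """Split a 1-D ECG signal into heartbeats bounded by R-R midpoints.

    Each interior R peak yields one segment running from the midpoint with the
    previous R peak to the midpoint with the next R peak. The first and last
    peaks are skipped because one of their boundaries is undefined.
    """
    segments = []
    for prev_r, cur_r, next_r in zip(r_peaks, r_peaks[1:], r_peaks[2:]):
        start = (prev_r + cur_r) // 2
        end = (cur_r + next_r) // 2
        segments.append(signal[start:end])
    return segments


# Toy example: 40 samples with hypothetical annotated R peak indices.
toy_signal = list(range(40))
toy_r_peaks = [5, 15, 27, 35]
beats = segment_heartbeats(toy_signal, toy_r_peaks)
```

With ground-truth R peak annotations, as used in this study, `r_peaks` would simply be the annotated sample indices of one recording.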
For the EEG signal, a band-pass filter (30 to 50 Hz) is applied to the raw signal to obtain the gamma-band signal, which corresponds to the visual stimulation-related signal fluctuation. In the UCI EEG dataset, each stimulation is stored in a single file, so there is no need to segment the data stream. We therefore treat each data file as one instance, analogous to a heartbeat.
Based on the ECG heartbeats or the EEG stimulation-related responses, we now can extract features to represent the signal characteristics that can be fed to the machine learners.
For ECG feature extraction, we have considered eleven features from multi-domains. An example of ECG signal with characteristic points is given in Figure 1 and the extracted features are listed in Table 1. Some of these features have been used in other areas, such as EMG signal processing and body activity recognition [16,17]. But we extract them here for biometric human identification purpose.
No. | Name |
1 | Skewness |
2 | Kurtosis |
3 | Auto-regression coefficient 1 |
4 | Auto-regression coefficient 2 |
5 | Auto-regression coefficient 3 |
6 | Auto-regression coefficient 4 |
7 | Median frequency |
8 | Mean frequency |
9 | Root mean square |
10 | Modified median frequency |
11 | Modified mean frequency |
(1) Skewness [16]
Skewness is used to measure the symmetry of the distribution of the samples around the R peak region. The definition is $\mathrm{SKNS}[X] = \frac{\mu_3}{\sigma^3} = \frac{E[(X-\mu)^3]}{\left(E[(X-\mu)^2]\right)^{3/2}}$, where $\mu_3$ is the third central moment and $\sigma$ is the standard deviation of signal $X$.
(2) Kurtosis [16]
Kurtosis is used to measure the distribution of samples around the R peak region. The definition is $\mathrm{Kurt}[X] = \frac{\mu_4}{\sigma^4} = \frac{E[(X-\mu)^4]}{\left(E[(X-\mu)^2]\right)^{2}}$, where $\mu_4$ is the fourth central moment and $\sigma$ is the standard deviation of signal $X$.
(3) Auto-regression coefficient 1
Coefficient 1 of the normalized fourth-order auto-regression. (Coefficient 0 is not used since it is a constant.)
(4) Auto-regression coefficient 2
Coefficient 2 of the normalized fourth-order auto-regression.
(5) Auto-regression coefficient 3
Coefficient 3 of the normalized fourth-order auto-regression.
(6) Auto-regression coefficient 4
Coefficient 4 of the normalized fourth-order auto-regression.
(7) Median frequency
The median frequency of the power spectrum of the R peak region.
(8) Mean frequency
The mean frequency of the power spectrum of the R peak region.
(9) Root mean square
Root mean square value calculated based on the R peak region.
(10) Modified median frequency [17]
The modified median frequency is the frequency which equally divides the amplitude of the spectrum of the R peak region.
(11) Modified mean frequency [17]
The modified mean frequency is the frequency which averages amplitude of the spectrum of the R peak region.
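As a rough illustration of how several of the listed features could be computed for one R peak region, the sketch below covers skewness, kurtosis, root mean square, and the (modified) median and mean frequencies. It is not the authors' MATLAB code. The assumptions are: a naive one-sided DFT stands in for the spectrum estimate, the plain versions weight frequencies by power while the modified versions weight them by amplitude, the auto-regression coefficients are omitted, and all function names and the toy sine segment are hypothetical.

```python
import cmath
import math


def central_moment(x, order):
    mu = sum(x) / len(x)
    return sum((v - mu) ** order for v in x) / len(x)


def skewness(x):
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def kurtosis(x):
    return central_moment(x, 4) / central_moment(x, 2) ** 2


def root_mean_square(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def one_sided_spectrum(x, fs):
    """Return (frequencies, amplitudes) of a naive DFT for bins 0..N/2."""
    n = len(x)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
        freqs.append(k * fs / n)
        amps.append(abs(coeff))
    return freqs, amps


def mean_frequency(freqs, weights):
    """Weighted average frequency (weights: power or amplitude per bin)."""
    return sum(f * w for f, w in zip(freqs, weights)) / sum(weights)


def median_frequency(freqs, weights):
    """Lowest frequency bin at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2
    running = 0.0
    for f, w in zip(freqs, weights):
        running += w
        if running >= half:
            return f
    return freqs[-1]


# Toy segment: a 5 Hz sine sampled at 100 Hz for one second (hypothetical values).
fs = 100
segment = [math.sin(2 * math.pi * 5 * i / fs) for i in range(100)]
freqs, amps = one_sided_spectrum(segment, fs)
power = [a * a for a in amps]

features = {
    "skewness": skewness(segment),
    "kurtosis": kurtosis(segment),
    "rms": root_mean_square(segment),
    "median_frequency": median_frequency(freqs, power),
    "mean_frequency": mean_frequency(freqs, power),
    "modified_median_frequency": median_frequency(freqs, amps),
    "modified_mean_frequency": mean_frequency(freqs, amps),
}
```

For a pure sine the spectral features all collapse onto the sine frequency, which makes the toy case easy to check by hand; real heartbeats spread energy over many bins, so the power-weighted and amplitude-weighted versions would differ.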
For EEG feature extraction, it is difficult to apply methods similar to those used in ECG feature extraction, because the EEG signal is highly noisy (as will be shown in the results section) and it is hard to distinguish meaningful morphologies. Therefore, we have introduced the gamma-band spectral power ratio feature [18], defined in (1)-(5):
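$$SS_{c,t} = S_{c,t} \cdot S_{c,t}, \quad \forall c \in [1, 64],\ \forall t \in [1, 256] \tag{1}$$
$$SSS_{c} = \sum_{t=1}^{T} SS_{c,t}, \quad \forall c,\ T = 256 \tag{2}$$
$$FS_{c,t} = F_{c,t} \cdot F_{c,t}, \quad \forall c,\ \forall t \tag{3}$$
$$FSS_{c} = \sum_{t=1}^{T} FS_{c,t}, \quad \forall c \tag{4}$$
$$GR_{c} = FSS_{c} / SSS_{c}, \quad \forall c \tag{5}$$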
where,
$S_{c,t}$: the voltage value of the $t$-th sample in the $c$-th channel, from the unfiltered raw EEG data,
$F_{c,t}$: the voltage value of the $t$-th sample in the $c$-th channel, from the band-pass filtered EEG data,
$SS_{c,t}$: the energy of the $t$-th sample in the $c$-th channel, from the unfiltered raw EEG data,
$FS_{c,t}$: the energy of the $t$-th sample in the $c$-th channel, from the filtered EEG data,
$SSS_{c}$: the energy of the $c$-th channel, from the unfiltered raw EEG data,
$FSS_{c}$: the energy of the $c$-th channel, from the filtered EEG data,
$GR_{c}$: gamma band energy ratio over the raw data, for the $c$-th channel.
Therefore, we can get GRc which is channel-wise gamma band energy ratio over the raw data. This feature vector (64 values) is expected to give the signal strength in the gamma band which is correlated with the visual stimulation [18].
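A minimal sketch of the channel-wise ratio in (1)-(5) is given below. It assumes the raw and gamma-band filtered signals are already available as lists of channels; the function name `gamma_band_ratio` and the toy values are illustrative only and do not come from the dataset.

```python
def gamma_band_ratio(raw, filtered):
    """Channel-wise energy ratio GR_c = FSS_c / SSS_c.

    raw, filtered: lists of channels, each a list of voltage samples
    (64 channels x 256 samples in the paper; any matching shape works here).
    """
    ratios = []
    for raw_ch, filt_ch in zip(raw, filtered):
        sss = sum(s * s for s in raw_ch)    # SSS_c: raw energy of the channel
        fss = sum(f * f for f in filt_ch)   # FSS_c: gamma-band energy of the channel
        ratios.append(fss / sss)
    return ratios


# Toy data: 2 channels x 4 samples (hypothetical values, not from the dataset).
raw = [[1.0, -2.0, 2.0, -1.0], [3.0, 0.0, -4.0, 0.0]]
filtered = [[0.5, -1.0, 1.0, -0.5], [0.0, 0.0, -2.0, 0.0]]
gr = gamma_band_ratio(raw, filtered)
```

Applied to one 64-channel instance, the returned list would be the 64-value feature vector described above.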
MATLAB is used to extract the features: it performs the data segmentation, filtering, feature calculation and .csv file generation. The .csv files are later used in RStudio [19] to perform the classification.
After extracting features for each instance, we now can train the machine learning classifiers to learn the model parameters based on the training data. Considering we have extracted 11 features for the ECG signal and 64 features (64 channels) for the EEG signal, we thus choose more ECG instances for algorithm evaluation for fairness purpose. 300 heartbeats are chosen for each subject and in total, there are 6000 instances for 20 subjects.
For EEG signal, there are in total 1137 instances for 20 subjects and each subject has around 60 measurements (each measurement includes 64 time series corresponding to 64 EEG signal channels).
We use 10-fold cross validation to test the generalization ability of the trained models [20], and the averaged performance (accuracy) is reported. Because we extract the features directly from the raw databases, there is no null or redundant data; such data may exist in databases that are distributed as pre-extracted features. Five different classifiers are evaluated in this study, namely Neural Network, K-nearest Neighbor, Bagging, Random Forest and AdaBoost, and the identification rate on the two datasets is given.
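The 10-fold procedure can be sketched as below. This is a plain shuffled split written for illustration; whether the original R workflow shuffled or stratified the folds is not stated, so both the shuffling and the helper names (`k_fold_indices`, `cross_validated_accuracy`, `evaluate_fold`) are assumptions.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(n_instances, evaluate_fold, k=10):
    """Average the per-fold accuracies returned by evaluate_fold(train, test)."""
    scores = [evaluate_fold(tr, te) for tr, te in k_fold_indices(n_instances, k)]
    return sum(scores) / len(scores)


# 6000 matches the number of ECG instances reported above.
splits = list(k_fold_indices(6000, k=10))
```

Here `evaluate_fold` would train a classifier on the training indices and return its accuracy on the held-out indices; the reported figure is the mean over the ten folds.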
The neural network [12] is inspired by the brain structures where layer-wise neurons are organized in a way that can represent high-level abstraction. To accelerate the training process, we have used a four-layer neural network, with one input layer, two hidden layers and one output layer. For ECG, the numbers of nodes in the four layers are 11 (11 features), 50, 50 and 20 (20 subjects), respectively. For EEG, they are 64, 100, 50 and 20, respectively, where more nodes are used in the second layer to learn patterns from the 64 features more effectively.
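To give a feel for the size of the two networks, the short sketch below counts their trainable parameters. It assumes fully connected layers with one bias per node, which the text does not state explicitly; `dense_parameter_count` is a hypothetical helper.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


ecg_params = dense_parameter_count([11, 50, 50, 20])   # ECG network layout
eeg_params = dense_parameter_count([64, 100, 50, 20])  # EEG network layout
```

Under these assumptions the ECG network has 4170 parameters and the EEG network 12570, so the EEG model is about three times larger while being trained on far fewer instances.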
The K-nearest neighbor (KNN) classifier [12] is a non-parametric method, which analyzes the distribution of the instances to build the model. KNN does not make any assumption on the data distribution which is its advantage since practical data does not obey typical distribution assumptions. So KNN usually generates nonlinear boundaries for different groups. K is selected as 20 here since we have 20 subjects in each dataset.
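A minimal sketch of the KNN decision rule follows. Euclidean distance and simple majority voting are assumptions, since the distance metric is not specified here, and the two-subject toy data is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=20):
    """Return the majority label among the k training points nearest to query."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy data: two hypothetical subjects with 2-D feature vectors.
train_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
train_y = ["subject_A", "subject_A", "subject_A", "subject_B", "subject_B", "subject_B"]
prediction = knn_predict(train_x, train_y, (4.9, 5.2), k=3)
```

The toy call uses k=3 only because the example has six training points; the study itself sets K to 20.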
The bagging classifier (Bootstrap Aggregating) [12] can effectively decrease the variance of the prediction, because it can generate more data based on the original dataset to train the classifier. It is actually a special case of model averaging. Here we build the bagging classifier based on the decision trees and get the smoothed results.
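The bootstrap-and-vote idea behind bagging can be sketched as follows. The study builds its bagging classifier on decision trees; to keep the example short, a nearest-class-mean rule is used here as a stand-in base learner, and all names and data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_bagging(data, fit_base, n_models=25, seed=0):
    """Train n_models base learners, each on its own bootstrap replicate."""
    rng = random.Random(seed)
    return [fit_base(bootstrap_sample(data, rng)) for _ in range(n_models)]


def predict_bagging(models, x):
    """Majority vote over the base learners' predictions."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


def fit_nearest_mean(sample):
    """Stand-in base learner: assign x to the class with the closest mean."""
    groups = {}
    for x, y in sample:
        groups.setdefault(y, []).append(x)
    means = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


# Toy 1-D data for two hypothetical subjects.
data = [(v, "A") for v in (0.0, 0.2, 0.4, 0.6)] + [(v, "B") for v in (3.0, 3.2, 3.4, 3.6)]
models = fit_bagging(data, fit_nearest_mean)
```

Each bootstrap replicate differs slightly from the original data, so the base learners disagree on borderline cases, and voting across them is what reduces the variance of the final prediction.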
The random forest [12] is based on a totally different assumption, which includes forests of simple decision trees and applies an ensemble strategy to get an averaged classification result. Random forest has been used in many areas and demonstrated effective classification ability. Here we choose 100 trees to build the random forest.
The AdaBoost classifier (Adaptive Boosting) [12] is designed to produce a weighted sum of the results from weak learners. It is adaptive in that it adjusts the selection of training samples based on the training performance of previous iterations. In other words, it learns to select the samples that are more useful for improving the prediction performance of the model.
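For reference, the adaptive re-weighting can be written out in the standard binary form of AdaBoost, with labels $y_i \in \{-1, +1\}$, weak learner $h_t$ at round $t$, and sample distribution $D^{(t)}$. This is the textbook two-class formulation; the 20-subject problem studied here requires a multi-class variant, and which variant was used is not stated, so the equations below only illustrate the mechanism.

```latex
\begin{align}
\epsilon_t &= \sum_{i=1}^{m} D^{(t)}_i \, \mathbb{1}\left[ h_t(x_i) \neq y_i \right]
  && \text{weighted error of the weak learner} \\
w_t &= \frac{1}{2} \ln\left( \frac{1}{\epsilon_t} - 1 \right)
  && \text{weight of the weak learner} \\
D^{(t+1)}_i &= \frac{D^{(t)}_i \exp\left( -w_t \, y_i \, h_t(x_i) \right)}
  {\sum_{j=1}^{m} D^{(t)}_j \exp\left( -w_t \, y_j \, h_t(x_j) \right)}
  && \text{misclassified samples gain weight} \\
h(x) &= \operatorname{sign}\left( \sum_{t=1}^{T} w_t \, h_t(x) \right)
  && \text{final weighted vote}
\end{align}
```

Samples that the current weak learner misclassifies have $y_i h_t(x_i) = -1$, so their weight grows by a factor $e^{w_t}$ before normalization, which is how later rounds concentrate on the harder samples.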
In this section, we give detailed experimental results of pre-processing, feature extraction and classification.
The ECG data after band-pass filtering is shown in Figure 2. Clear ECG morphology can be observed, thanks to a relatively high signal-to-noise ratio. It should therefore be possible to effectively extract multi-domain features from ECG heartbeats with good signal quality.
The EEG signal is much noisier than the ECG signal. To clearly demonstrate the signal quality, both raw and band-pass filtered signals are given, as shown in Figures 3 and 4, respectively. In Figure 3, the EEG data for one instance (64 × 256 points) is plotted, and it is hard to identify potential patterns. After filtering, Figure 4 still does not show observable signal behaviors. That is to say, even with 64 channels, it is still highly challenging to analyze the behavior of the EEG signal. This is the major reason we choose the gamma band spectral power ratio as the feature.
We have visualized the extracted features only for the EEG signal because we are interested in whether the gamma band spectral feature can reflect some meaningful behaviors of the EEG signal. An example is given in Figure 5, where 64 features are plotted for one EEG instance and there are several spikes corresponding to some specific EEG channels.
If we further plot all instances for one subject, as shown in Figure 6, we can find relatively consistent trend for these instances, with similar feature spikes. That means the gamma band spectral feature does reflect underlying brain behaviors related to visual stimulation. But at the same time, it is also worth noting that these instances are also very noisy and have a large variance. This indicates that EEG features may not be as effective as ECG features, which will be further introduced later.
Another thing worth noting is that the spikes in Figure 6 correspond to several EEG channels, which are connected to EEG electrodes placed on some specific locations of the head. This also gives some hints to remove some non-significant electrodes to enhance the wearability and power efficiency of the algorithm. But in this study, we keep all these features considering that EEG is highly noisy and we want to leverage as much information as possible to fully explore the potential of the EEG-based human identification method.
The performance of the five advanced classifiers is summarized in Table 2. For the ECG dataset, the random forest shows the best accuracy, followed by AdaBoost and the neural network. For the EEG dataset, no averaged accuracy exceeds 90%, because the signals are weaker and much noisier than the ECG signals. The random forest again shows the best accuracy, 86.0%, followed by the neural network and the KNN.
Data | Classifier | Fold 1 (%) | Fold 2 (%) | Fold 3 (%) | Fold 4 (%) | Fold 5 (%) | Fold 6 (%) | Fold 7 (%) | Fold 8 (%) | Fold 9 (%) | Fold 10 (%) | Averaged (%) |
ECG | NN | 94.2 | 90.8 | 90.2 | 93.8 | 91.3 | 94.0 | 91.2 | 92.7 | 92.3 | 92.3 | 92.3 |
ECG | KNN | 86.3 | 85.5 | 84.3 | 87.0 | 86.0 | 86.5 | 87.3 | 85.3 | 85.5 | 86.0 | 86.0 |
ECG | Bagging | 93.0 | 91.0 | 90.2 | 91.7 | 89.3 | 93.0 | 93.3 | 92.7 | 92.7 | 91.7 | 91.9 |
ECG | RF | 97.7 | 98.0 | 97.5 | 97.8 | 97.2 | 98.3 | 99.2 | 97.7 | 98.5 | 97.8 | 98.0 |
ECG | Adaboost | 97.0 | 96.8 | 96.7 | 97.5 | 95.7 | 98.0 | 97.3 | 97.5 | 96.3 | 96.3 | 96.9 |
EEG | NN | 85.1 | 77.2 | 86.8 | 86.8 | 88.6 | 80.7 | 83.3 | 89.5 | 84.2 | 88.3 | 85.1 |
EEG | KNN | 77.2 | 68.4 | 81.6 | 75.4 | 78.9 | 75.4 | 73.7 | 81.6 | 83.3 | 84.7 | 78.0 |
EEG | Bagging | 72.8 | 63.2 | 64.9 | 67.5 | 65.8 | 69.3 | 70.2 | 66.7 | 63.2 | 64.0 | 66.7 |
EEG | RF | 86.8 | 78.9 | 89.5 | 81.6 | 91.2 | 83.3 | 87.7 | 87.7 | 87.7 | 85.6 | 86.0 |
EEG | Adaboost | 78.1 | 71.1 | 71.9 | 69.3 | 68.4 | 71.9 | 79.8 | 79.8 | 75.4 | 73.0 | 73.9 |
NN: neural network; KNN: K-nearest neighbor; RF: random forest. |
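As a quick arithmetic check, the averaged column can be reproduced from the per-fold values. The sketch below does this for the two random forest rows of Table 2 and also reports the lowest and highest fold; the helper name `summarize` is hypothetical.

```python
# Per-fold random forest accuracies (%) copied from Table 2.
rf_folds = {
    "ECG": [97.7, 98.0, 97.5, 97.8, 97.2, 98.3, 99.2, 97.7, 98.5, 97.8],
    "EEG": [86.8, 78.9, 89.5, 81.6, 91.2, 83.3, 87.7, 87.7, 87.7, 85.6],
}


def summarize(folds):
    """Return (mean, lowest fold, highest fold), each rounded to one decimal."""
    mean = sum(folds) / len(folds)
    return round(mean, 1), round(min(folds), 1), round(max(folds), 1)


summary = {name: summarize(folds) for name, folds in rf_folds.items()}
```

The means match the reported 98.0% and 86.0%. The fold-to-fold spread is about 2 points for ECG (97.2 to 99.2) but about 12 points for EEG (78.9 to 91.2), which is consistent with the noisier EEG features discussed above.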
Therefore, we can get two important insights here. Firstly, ECG signal is much more appropriate for human identification, leveraging a high signal-to-noise ratio, and EEG signal even with 64 channels cannot provide an identification rate higher than 90%. Secondly, the random forest classifier is the most powerful one compared with other four methods, leveraging the ensemble ability of many decision trees. The neural network can also provide relatively good performance because the brain-inspired neuron connections can learn the underlying patterns from data effectively. Overall, the ECG-based biometric identification method using the random forest machine learner is the best combination according to our study.
To further illustrate the effectiveness of the proposed ECG & RF-based method with the extracted multi-domain features, we also compare our method with four state-of-the-art works, as shown in Table 3. The EEG-based method is not listed in Table 3 because its accuracy is much lower than that of the ECG-based method, and we only suggest the ECG-based method for practical application scenarios. Our method achieves a much higher identification rate than the other works, demonstrating that the extracted multi-domain features and the random forest machine learner are very powerful in human subject identification. The other works did not compare different modalities or different machine learning classifiers. In contrast, we compared the EEG and ECG modalities and showed that ECG is superior to EEG thanks to its better signal quality, and we also compared different classifiers and demonstrated that the random forest is more effective than the other methods.
In future, there are several directions to further enhance the study. Firstly, extracting more features and selecting critical ones may further enhance the accuracy and power efficiency; secondly, more datasets can also be introduced to evaluate the identification rate when more human subjects need to be distinguished; thirdly, the ensemble of different classifiers may also contribute to the performance enhancement, since different classifiers usually leverage different assumptions and their combination may help to cover more mechanisms behind the data.
Besides, EEG signal also shows different characteristics during different sleep stages [24,25,26,27,28], which may result in different user identification rates and can be further studied.
Currently available ECG and EEG datasets were not acquired at the same time. Although we tried to balance the selection of these two datasets to make the comparison fairer, we believe that collecting ECG and EEG from the same subjects at the same time would make the comparison more robust. Meanwhile, it is also necessary to consider more datasets in future work. Moreover, we will further evaluate more classifier configurations to explore their impact; currently, we selected the parameters for these classifiers empirically.
In this paper, we have proposed the ECG & RF-based biometric human identification approach toward practical confidential biomedicine applications. Firstly, we have considered both ECG and EEG modalities and investigated their appropriateness for human identification by evaluating their signal quality, feature effectiveness and identification rate. Experimental results show that ECG is more robust than EEG, leveraging a high signal-to-noise ratio and effectively extracted features. Secondly, we thoroughly compared five advanced machine learners and determined that the random forest classifier is much more powerful than the other methods, benefitting from its ensemble learning strategy. Specifically, for the ECG & RF-based approach, a user identification rate as high as 98.0% is achieved, which is much higher than state-of-the-art works. This study is expected to show that a properly selected biometric empowered by an effective machine learner has great potential to enable confidential biomedicine applications in the era of smart digital health.
This research is supported partly by the Recruitment Program of Global Experts (the Thousand Talents Plan) and partly by the National Natural Science Foundation of China (NSFC), Nos. 61574044, 61376040 and 61574046.
All authors declare no conflicts of interest in this paper.
[1] | Lymberis A (2003) Smart wearable systems for personalised health management: current R&D and future challenges. In: Engineering in Medicine and Biology Society. Proceedings of the 25th Annual International Conference of the IEEE. IEEE 4: 3716-3719. |
[2] | Solanas A, Patsakis C, Conti M, et al. (2014) Smart health: a context-aware health paradigm within smart cities. IEEE Communications Magazine 52: 74-81. |
[3] | Birjandtalab J, Zhang Q, Jafari R (2015) A case study on minimum energy operation for dynamic time warping signal processing in wearable computers. In: Pervasive Computing and Communication Workshops (PerCom Workshops), 2015 IEEE International Conference: 415-420. |
[4] | Eriksson J, Vaka P, Shen F, et al. PerMoby'15: The Fourth IEEE International Workshop on the Impact of Human Mobility in Pervasive Systems and Applications, 2015-Program. |
[5] | Zhang Q, Zhou D, Zeng X (2017) Highly wearable cuff-less blood pressure and heart rate monitoring with single-arm electrocardiogram and photoplethysmogram signals. Biomed Eng Online 16: 23. doi: 10.1186/s12938-017-0317-z |
[6] | Zhang Q, Zahed C, Nathan V, et al. (2015) An ECG dataset representing real-world signal characteristics for wearable computers. In: Biomedical Circuits and Systems Conference (BioCAS), 2015 IEEE. IEEE: 1-4. |
[7] | Zhang Q, Zhou D, Zeng X (2016) A Novel Framework for Motion-tolerant Instantaneous Heart Rate Estimation By Phase-domain Multi-view Dynamic Time Warping. IEEE Transactions on Biomedical Engineering. |
[8] | Zhang Q, Zhou D, Zeng X (2016) A novel machine learning-enabled framework for instantaneous heart rate monitoring from motion-artifact-corrupted electrocardiogram signals. Physiol Meas 37: 1945-1967. doi: 10.1088/0967-3334/37/11/1945 |
[9] | Unar JA, Seng WC, Abbasi A (2014) A review of biometric technology along with trends and prospects. Pattern Recogn 47: 2673-2688. |
[10] | Yao J, Wan Y (2008) A wavelet method for biometric identification using wearable ECG sensors. In: Medical Devices and Biosensors, 2008. ISSS-MDBS 2008. 5th International Summer School and Symposium on. IEEE: 297-300. |
[11] | Paranjape RB, Mahovsky J, Benedicenti L, et al. (2001) The electroencephalogram as a biometric. In: Electrical and Computer Engineering, 2001. Canadian Conference on. IEEE 2: 1363-1366. doi: 10.1109/CCECE.2001.933649 |
[12] | Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: From theory to algorithms. Cambridge university press. |
[13] | Iyengar N, Peng CK, Morin R, et al. (1996) Age-related alterations in the fractal scaling of cardiac interbeat interval dynamics. Am J Physiol-Reg I 71: R1078-R1084. |
[14] | Goldberger AL, Amaral L, Glass L, et al. (2000) Physiobank, physiotoolkit, and physionet. Circulation. Discovery 101: e215-e220. doi: 10.1161/01.CIR.101.23.e215 |
[15] | Begleiter H. UCI EEG Database. Available from: https://archive.ics.uci.edu/ml/datasets/EEG+Database. |
[16] | Bai J, Ng S (2005) Tests for skewness, kurtosis, and normality for time series data, J Bus Econ Stat 23: 49-60. |
[17] | Phinyomark A, Limsakul C, Phukpattaranont P (2009) A novel feature extraction for robust EMG pattern recognition. arXiv preprint arXiv:0912.3973, 2009. |
[18] | Palaniappan R (2004) Method of identifying individuals using VEP signals and neural network. IEE P-SCI Meas Tech 151: 16-20. doi: 10.1049/ip-smt:20040003 |
[19] | F. f. O. A. Statistics. RStudio, Available from: https://www.rstudio.com/. |
[20] | Machine learning: An artificial intelligence approach. Springer Science & Business Media, 2013. |
[21] | Tang X, Shu L (2014) Classification of electrocardiogram signals with RS and quantum neural networks. IJMUE 9:363-372. |
[22] | Lourenço A, Silva H, Fred A (2011) Unveiling the biometric potential of finger-based ECG signals. Comput Intell Neurosci 2011: 5. |
[23] | Ting CM, Salleh SH (2010) ECG based personal identification using extended kalman filter. In: Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th International Conference on. IEEE: 774-777. |
[24] | Aboalayon KAI, Faezipour M, Almuhammadi WS, et al. (2016) Sleep Stage Classification Using EEG Signal Analysis: A Comprehensive Survey and New Investigation. Entropy 18: 272. doi: 10.3390/e18090272 |
[25] | Hassan AR, Bhuiyan MIH (2016) A decision support system for automatic sleep staging from EEG signals using tunable q-factor wavelet transform and spectral features. J Neurosci Methods 271: 107-118. doi: 10.1016/j.jneumeth.2016.07.012 |
[26] | Hassan AR, Subasi A (2016) Automatic identification of epileptic seizures from EEG signals using linear programming boosting. Comput Meth Prog Bio 136: 65-77. doi: 10.1016/j.cmpb.2016.08.013 |
[27] | Hassan AR, Haque MA (2017) An expert system for automated identification of obstructive sleep apnea from single-lead ECG using random under sampling boosting. Neurocomputing 235: 122-130. doi: 10.1016/j.neucom.2016.12.062 |
[28] | Hassan AR (2016) Computer-aided obstructive sleep apnea detection using normal inverse Gaussian parameters and adaptive boosting. Biomed Signal Proces 29: 22-30. |
No. | Name |
1 | Skewness |
2 | Kurtosis |
3 | Auto-regression coefficient 1 |
4 | Auto-regression coefficient 2 |
5 | Auto-regression coefficient 3 |
6 | Auto-regression coefficient 4 |
7 | Median frequency |
8 | Mean frequency |
9 | Root mean square |
10 | Modified median frequency |
11 | Modified mean frequency |
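The features listed above can be illustrated with a minimal sketch in standard-library Python. Everything here is an assumption for illustration rather than the authors' implementation: the function names are hypothetical, the auto-regression coefficients are estimated with the Yule-Walker equations (the table does not say which estimator was used), and the spectrum comes from a naive DFT. The two "modified" frequency features are left out because the table does not define them.

```python
import cmath
import math


def central_moment(x, order):
    """Population central moment of the given order."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def skewness(x):
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def kurtosis(x):
    # Non-excess form: a Gaussian signal gives a value near 3.
    return central_moment(x, 4) / central_moment(x, 2) ** 2


def root_mean_square(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def ar_coefficients(x, order=4):
    """Yule-Walker AR estimate via Levinson-Durbin (assumed estimator).

    Returns a[0..order-1] for the model x[t] ~ sum_j a[j] * x[t-1-j].
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    # Biased autocorrelation estimates for lags 0..order.
    r = [sum(xc[t] * xc[t + lag] for t in range(n - lag)) / n
         for lag in range(order + 1)]
    a, err = [], r[0]
    for k in range(order):
        acc = r[k + 1] - sum(a[j] * r[k - j] for j in range(k))
        refl = acc / err
        a = [a[j] - refl * a[k - 1 - j] for j in range(k)] + [refl]
        err *= 1.0 - refl ** 2
    return a


def power_spectrum(x, fs):
    """One-sided power spectrum from a naive O(n^2) DFT; fs in Hz."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def mean_frequency(freqs, power):
    # Power-weighted average frequency.
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


def median_frequency(freqs, power):
    # First frequency at which cumulative power reaches half the total.
    half, running = sum(power) / 2.0, 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]
```

For a signal segment `x` sampled at `fs` Hz, calling `power_spectrum(x, fs)` once and passing its output to `mean_frequency` and `median_frequency` avoids recomputing the transform. For real recordings an FFT routine would replace the naive DFT.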
Data | Classifier | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Fold 6 | Fold 7 | Fold 8 | Fold 9 | Fold 10 | Averaged |
ECG | NN | 94.2 | 90.8 | 90.2 | 93.8 | 91.3 | 94.0 | 91.2 | 92.7 | 92.3 | 92.3 | 92.3 |
ECG | KNN | 86.3 | 85.5 | 84.3 | 87.0 | 86.0 | 86.5 | 87.3 | 85.3 | 85.5 | 86.0 | 86.0 |
ECG | Bagging | 93.0 | 91.0 | 90.2 | 91.7 | 89.3 | 93.0 | 93.3 | 92.7 | 92.7 | 91.7 | 91.9 |
ECG | RF | 97.7 | 98.0 | 97.5 | 97.8 | 97.2 | 98.3 | 99.2 | 97.7 | 98.5 | 97.8 | 98.0 |
ECG | AdaBoost | 97.0 | 96.8 | 96.7 | 97.5 | 95.7 | 98.0 | 97.3 | 97.5 | 96.3 | 96.3 | 96.9 |
EEG | NN | 85.1 | 77.2 | 86.8 | 86.8 | 88.6 | 80.7 | 83.3 | 89.5 | 84.2 | 88.3 | 85.1 |
EEG | KNN | 77.2 | 68.4 | 81.6 | 75.4 | 78.9 | 75.4 | 73.7 | 81.6 | 83.3 | 84.7 | 78.0 |
EEG | Bagging | 72.8 | 63.2 | 64.9 | 67.5 | 65.8 | 69.3 | 70.2 | 66.7 | 63.2 | 64.0 | 66.7 |
EEG | RF | 86.8 | 78.9 | 89.5 | 81.6 | 91.2 | 83.3 | 87.7 | 87.7 | 87.7 | 85.6 | 86.0 |
EEG | AdaBoost | 78.1 | 71.1 | 71.9 | 69.3 | 68.4 | 71.9 | 79.8 | 79.8 | 75.4 | 73.0 | 73.9 |
Values are 10-fold cross-validation accuracy (%). NN: neural network; KNN: K-nearest neighbor; RF: random forest. |
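As a quick arithmetic check, the "Averaged" column can be recomputed from the per-fold values. The sketch below does this for the ECG rows only, using the numbers copied from the table above. The variable and function names are hypothetical, and the per-classifier standard deviation is an extra statistic added here for illustration; the table does not report it.

```python
from statistics import mean, pstdev

# Per-fold ECG accuracies (%) copied from the table above.
ecg_folds = {
    "NN": [94.2, 90.8, 90.2, 93.8, 91.3, 94.0, 91.2, 92.7, 92.3, 92.3],
    "KNN": [86.3, 85.5, 84.3, 87.0, 86.0, 86.5, 87.3, 85.3, 85.5, 86.0],
    "Bagging": [93.0, 91.0, 90.2, 91.7, 89.3, 93.0, 93.3, 92.7, 92.7, 91.7],
    "RF": [97.7, 98.0, 97.5, 97.8, 97.2, 98.3, 99.2, 97.7, 98.5, 97.8],
    "AdaBoost": [97.0, 96.8, 96.7, 97.5, 95.7, 98.0, 97.3, 97.5, 96.3, 96.3],
}

# "Averaged" column as printed in the table, for comparison.
reported_average = {
    "NN": 92.3, "KNN": 86.0, "Bagging": 91.9, "RF": 98.0, "AdaBoost": 96.9,
}


def summarize(folds):
    """Map each classifier to (mean accuracy, population std. dev.)."""
    return {name: (mean(values), pstdev(values))
            for name, values in folds.items()}


summary = summarize(ecg_folds)

# Classifier with the highest mean accuracy across folds.
best = max(summary, key=lambda name: summary[name][0])

for name, (avg, spread) in summary.items():
    print(f"{name:8s} mean={avg:.2f} std={spread:.2f} "
          f"reported={reported_average[name]:.1f}")
print("best:", best)
```

The recomputed means can differ from the printed averages in the last digit, since the per-fold values are themselves rounded to one decimal place.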
Methods | Signal | Datasets | ID Rate |
Proposed | ECG | 20 subjects | 98.0% |
Yao et al. [10] | ECG | 20 subjects | 91.5% |
Tan et al. [21] | ECG | 10 subjects | 91.7% |
Lourenco et al. [22] | ECG | 16 subjects | 94.3% |
Ting et al. [23] | ECG | 13 subjects | 87.5% |
ID: identification. |