Department of Physics, College of Science, Salahaddin University-Erbil, Erbil, Iraq
ABSTRACT
Intelligent, automated systems for diagnosing heart disease play a key role in treatment and hence in reducing the risk of heart failure or sudden death. A computer-aided diagnosis (CAD) system can give doctors a clue about the category of a patient's electrocardiogram (ECG) signal, which might be normal sinus rhythm (NSR), abnormal arrhythmia (ARR), or congestive heart failure (CHF). In this work, novel Slantlet transform (SLT) statistical features are extracted and selected for 900 ECG segments taken from the MIT-BIH ARR Database, drawn equally from the three classes above, for heart disease classification through ECG signals. Owing to the superiority of the SLT in time localization compared with the traditional wavelet transform, 12 out of 14 statistical features passed the ANOVA test at a P-value threshold of $10^{-3}$. The relevant features are then provided to three well-known classifiers (support vector machine [SVM], K-nearest neighbor [KNN], and Naive Bayes [NB]). The performance tests show that an average classification AUC of 99.254% can be achieved using SLT-based features along with the SVM classifier. SVM is a set of related supervised machine learning algorithms used for regression and classification with high generalization ability. It performs classification on two-group problems by determining the best hyperplane that separates the positive from the negative training samples.
Index Terms: Electrocardiogram, Slantlet, Abnormal arrhythmia, Congestive heart failure, Normal sinus rhythm
An electrocardiogram (ECG) is a non-invasive diagnostic method that detects variations in the electrical activity of the heart over time by graphically recording the heart's rhythm and electrical activity [1]. Hence, it is vital to obtain and track ECG signals for early detection of diseases such as arrhythmia (ARR) and congestive heart failure (CHF) [2]. Automatic ECG signal classification of such conditions is therefore a field worth studying. The four main stages of a standard computer-aided diagnosis system are preprocessing of signals, extraction of specific features, selection of significant features, and classification [3]. For classification problems, significant feature vectors remain the main and most appropriate means of signal representation. Many researchers from various fields who are involved in data modeling and classification are working to solve feature extraction problems [4]. The discrete wavelet transform (DWT) is especially useful in signal and image processing tasks such as denoising, compression, and estimation [5]. However, in terms of time localization, it is unable to produce an ideal discrete-time basis [6], [7]. The Slantlet transform (SLT) has commonly been suggested as a better alternative to the classical DWT in terms of time localization [6]. Thus, in this article, the SLT is used to extract statistical features from ECG signals. The standard ECG signal is depicted in Fig. 1.
Fig. 1. Electrocardiogram signal parameters [1].
Electrocardiography is the recording of the electrical activity of the heart. The waveform is used to assess the rate and regularity of heartbeats, as well as the existence of any heart damage and the effects of medications or devices used to control the heart, such as a pacemaker. ECG signals are weak (in the mV range) and have a broad frequency range (0.05–100 Hz), with the bulk of the useful information found in the 0.5–45 Hz range [8], [9]. The P wave, which corresponds to atrial depolarization, is one of the numerous waveforms and features of the ECG. P waves have a typical amplitude of 0.1–0.2 mV and a normal duration of 60–80 ms. The QRS complex represents ventricular depolarization, with a typical amplitude of about 1 mV and a duration of 0.06–0.12 s [10], [11]. The aim of this paper is to extract features from ECG signals using SLT-based statistical features. This task uses an ECG dataset to classify people into three groups: those with cardiac ARR, those with congestive heart failure (CHF), and those with normal sinus rhythm (NSR).
Our approach consists of preprocessing, feature extraction, feature selection based on the ANOVA test, and then training three different classifiers: support vector machine (SVM), Naive Bayes (NB), and K-nearest neighbor (KNN). Chashmi and Amirani proposed a system that classifies five heartbeat categories for ECG ARR diagnosis, based on a combination of DWT and higher order statistics feature extraction and entropy-based feature selection, along with an SVM classifier [12]. Nahak et al. proposed a method for analyzing and classifying three types of ECG (namely ARR, CHF, and NSR). Feature representations of the ECG signal's heart rate variability (HRV) were derived from wavelet-based functions as well as auto-regressive coefficients. After feature fusion, the highest accuracy of 93.33% for three-class classification was obtained with SVM [13]. Daqrouq et al. proposed the use of wavelet energy to characterize ECG signals and ARR. The percentage energy (PE) of terminal wavelet packet transform (WPT) sub-signals was used in the analysis to derive wavelet-based features for CHF [14]. Singh et al. suggested a model for cardiac ARR diagnosis. Three filter-based feature selection methods were applied to the cardiac ARR dataset using three separate machine learning methods, and the best features were picked. Feature selection is a crucial preprocessing phase in identifying effective factors in the diagnosis of ARR patients. As a consequence, the underlying health causes of heart-related deaths may be established. The output of the feature selection methods was evaluated using SVM and random forest. The random forest classifier obtained the highest accuracy of 85.58% [15]. Mütevelli et al. proposed a method for extracting features from ECG signals based on the frequency-domain DWT method. To extract DWT features, wavelet packet analysis was used.
The benefit of wavelet packet analysis is that it decomposes all approximations and details at all levels to achieve a complete sub-band decomposition. Each signal was subjected to a 4-level wavelet packet decomposition, yielding 16 sub-bands. However, since the approximation coefficients reflect the key characteristics of each heart signal, the approximation coefficients from the low-frequency band were preferred, and eight of these sub-bands were used. Statistical features were then extracted from the different wavelet sub-bands [8]. Wang et al. investigated the effect of using a dual fully connected neural network model for accurate heartbeat classification. The classes were normal beats (N), supraventricular ectopic beats (S), ventricular ectopic beats (V), fusion beats (F), and unknown beats (Q). The tests show that the proposed approach is effective at detecting ARRs [16]. Çınar and Tuncer proposed a high-precision deep learning system for the classification of ECG signals with normal sinus rhythm (NSR), abnormal ARR, and CHF. The proposed architecture was designed using a hybrid AlexNet-SVM. There are 162 ECG signals in total, with 96 ARR, 30 CHF, and 36 NSR signals available. To assess the classification efficiency of deep learning techniques, the ARR, CHF, and NSR signals were first classified with the SVM and KNN algorithms, and afterward the signals were classified in their raw form using the long short-term memory (LSTM) algorithm. Spectrograms of the signals were then obtained, and the hybrid AlexNet-SVM algorithm was applied to the resulting images. The performance results revealed that their proposed deep learning architecture outperformed traditional machine learning classifiers [2]. Wady et al. pointed out the improvement gained by replacing the traditional DWT for feature extraction with the SLT calculated from a neutrosophic set for brain tumor images. A new composite NS-SLT model was thus proposed as a source of statistical texture features, which were used efficiently for binary classification of brain tumor malignancy [7].
The main stages of the proposed system are depicted in Fig. 2. The first stage is preprocessing, followed by feature extraction, feature normalization, feature selection, ECG signal classification, and finally performance evaluation.
Fig. 2. Proposed electrocardiogram classifier system.
Three ECG signal classes, ARR, NSR, and CHF, are investigated in this article. A total of 162 ECG signals were taken from the MIT-BIH ARR Database on PhysioNet: 96 are ARR signals, 30 are CHF signals, and 36 are NSR signals. Fig. 3 provides an illustration of ECG signals for ARR, CHF, and NSR. The database is preprocessed before being presented to the classifiers. It originally consists of 162 records with 65536 samples per record, so, to increase the number of data items and to reduce the processing time, each record is chunked into small segments of 512 samples each.
To ensure a fair comparison, 30 records (each with 10 sub-records of 512 samples) were extracted from each ECG signal class (ARR, CHF, and NSR). Each category therefore has 300 sub-records, giving 900 sub-records in total, which were provided to the training stage (750 sub-records) and the testing stage (250 sub-records). The next subsections describe the details of each block depicted in Fig. 2. Fig. 3 displays sample signals for each category (ARR, NSR, and CHF), selected randomly from the database.
Fig. 3. Electrocardiogram sample from each class (Arrhythmia, normal sinus rhythm, and congestive heart failure).
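The segmentation step described above can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function name `segment_record`, the choice of non-overlapping windows, and taking the first 10 windows of each record are assumptions, since the text does not say which sub-records were kept. The record used here is synthetic, not real ECG data.

```python
def segment_record(record, segment_length=512, max_segments=10):
    """Split one record into non-overlapping fixed-length segments.

    Assumption: the first `max_segments` windows are kept. Pass
    max_segments=None to keep every complete window.
    """
    segments = []
    for start in range(0, len(record) - segment_length + 1, segment_length):
        segments.append(record[start:start + segment_length])
        if max_segments is not None and len(segments) == max_segments:
            break
    return segments


# Synthetic stand-in for one 65536-sample record (not real ECG data).
record = [float(i % 100) for i in range(65536)]

segments = segment_record(record)
all_segments = segment_record(record, max_segments=None)
print(len(segments), len(segments[0]), len(all_segments))
```

With 30 records per class and 10 segments per record, this yields the 300 sub-records per class described in the text.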
Classification techniques usually start with the extraction of relevant features [4]. In this work, statistical features based on SLT coefficients, which describe the distribution of signal energy in the time and frequency domains, are investigated. Like the DWT implementation, the SLT filter bank has a parallel structure. In the DWT, several of the parallel branches are implemented as a product of simple filters; the Slantlet filter bank uses a similar parallel structure, but its component filter branches do not have a product form, which gives the SLT an advantage. The SLT creates a filter bank in which each filter's length is a power of two. For a mathematical view of the SLT, consider a simplified representation of Fig. 4 for $l$ scales. The filters used at scale $i$ to analyze the signal are $g_i(n)$, $f_i(n)$, and $h_i(n)$, each with a support of $2^{i+1}$. This results in a periodic output for the analysis filter bank and reduces the support by $2^i - 2$ samples relative to the iterated DWT filters, a reduction that approaches one third as $i$ increases. For $l$ scales, the SLT filter bank uses $l$ pairs of channels, that is, $2l$ channels in total. The low-pass filter $h_l(n)$ is paired with its adjacent filter $f_l(n)$, and each is followed by downsampling by $2^l$. Each of the remaining $l-1$ channel pairs consists of $g_i(n)$ and its time-reversed version, each followed by downsampling by $2^{i+1}$, for $i = 1, 2, \ldots, l-1$. The filters $g_i(n)$, $f_i(n)$, and $h_i(n)$ are piecewise linear; their explicit expressions are given in [7], [17].
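The support reduction discussed above can be checked numerically. The short sketch below assumes the comparison is against the iterated 4-tap D2 filter bank of Fig. 4, whose scale-$i$ filter has length $3 \cdot 2^i - 2$; the function names are illustrative only.

```python
def d2_support(i):
    """Length of the scale-i filter of an iterated 4-tap (D2) filter bank."""
    return 3 * 2**i - 2


def slantlet_support(i):
    """Length of the scale-i Slantlet filter."""
    return 2 ** (i + 1)


for i in range(1, 9):
    saved = d2_support(i) - slantlet_support(i)  # equals 2**i - 2
    ratio = saved / d2_support(i)                # tends to 1/3
    print(i, d2_support(i), slantlet_support(i), saved, round(ratio, 3))
```

At scale 2 the saving is only 2 samples out of 10, while at scale 8 it is 254 out of 766, already close to one third.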
Correct classification of ECG signals requires a feature vector that contains features in both the time domain and the frequency domain.
The amplitude traces of more than 550 SLT coefficients from each ECG category are illustrated in Fig. 5.
Fig. 4. The two-scale iterated D2 filter bank (left) and the two-scale Slantlet transform filter bank (right) [17].
Fig. 5. Slantlet transform coefficients for classes (Arrhythmia, congestive heart failure, and normal sinus rhythm).
The quantized SLT coefficients are used to extract statistical-spectral features of each ECG signal, such as the mean, standard deviation, variance, entropy, maximum, minimum, kurtosis, moment, median, skewness, and root mean square error values. This technique reduces the length of the feature vector used as input to a classifier.
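As an illustration of this step, the sketch below computes several of the listed statistics from a vector of transform coefficients using only the Python standard library. It is not the authors' implementation: the function names, the 16-bin histogram used for the entropy, the use of population (rather than sample) moments, and the plain root mean square are all assumptions, and the full set of 14 features in Table 1 is not reproduced here.

```python
import math
import statistics
from collections import Counter


def shannon_entropy(values, bins=16):
    """Shannon entropy (bits) of a histogram of the values (assumed binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def statistical_features(coeffs):
    """Return a dict of summary statistics for one coefficient vector."""
    n = len(coeffs)
    mean = statistics.fmean(coeffs)
    var = statistics.pvariance(coeffs, mu=mean)  # population variance
    std = math.sqrt(var)
    m3 = sum((c - mean) ** 3 for c in coeffs) / n  # third central moment
    m4 = sum((c - mean) ** 4 for c in coeffs) / n  # fourth central moment
    return {
        "mean": mean,
        "std": std,
        "variance": var,
        "entropy": shannon_entropy(coeffs),
        "max": max(coeffs),
        "min": min(coeffs),
        "median": statistics.median(coeffs),
        "skewness": m3 / std**3 if std > 0 else 0.0,
        "kurtosis": m4 / var**2 if var > 0 else 0.0,
        "rms": math.sqrt(sum(c * c for c in coeffs) / n),
    }


# Toy coefficient vector (hypothetical values, not SLT output).
features = statistical_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(features)
```

Whatever the length of the coefficient vector, the classifier receives only this fixed, short list of numbers, which is the dimensionality reduction the text refers to.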
The original database for each class is a vector of 300 ECG signal sub-records, so the initial dimension is {y1, y2, …, y300}. By calculating statistical features from the SLT coefficients, a feature vector of 14 dimensions {S1, S2, …, S14} is then generated for each element yn of each of the three classes [Table 1]. Each feature vector corresponds to a single point in the feature space. Points of the same class should lie close together, and points of different classes should lie far apart. Normalization is applied to the input feature sets before the feature selection stage. Feature selection is considered the final stage of the feature extraction and processing procedure [18]. It aims to improve the performance of a classifier and achieve a minimum classification error. The analysis of variance (ANOVA) methodology is used in this study to reduce the dimension of the data based on feature importance and variance while preserving as much information as possible. ANOVA is a useful technique for deciding whether two or more sets of data differ statistically [7]. The ANOVA test with a P-value threshold of $10^{-3}$ selects 12 of the 14 input feature dimensions; the two remaining features are discarded because their P-values exceed $10^{-3}$. The two omitted features are the SLT-based mean and the SLT-based first moment. Fig. 6 shows the distribution of a sample relevant feature (SLT entropy) from the 12-dimensional relevant feature space, whereas Fig. 7 shows one of the irrelevant features (SLT mean). The difference between relevant and irrelevant features is easy to see from the figures.
TABLE 1: Statistical features extracted from SLT coefficients.
Fig. 6. Distribution of sample of relevant feature.
Fig. 7. Distribution of sample of irrelevant feature.
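To make the selection criterion concrete, the sketch below computes the one-way ANOVA F-statistic of a single feature across the three classes. It is a hedged illustration, not the authors' code: the function name and the toy values are hypothetical. Converting F into a P-value requires the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides; that step is omitted here to keep the example dependency-free.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of feature values.

    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own class mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical feature values for three classes (toy numbers).
separated = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [21.0, 22.0, 23.0]]
overlapping = [[1.0, 2.0, 3.0], [2.0, 3.0, 1.0], [3.0, 1.0, 2.0]]

f_relevant = anova_f(separated)      # class means differ strongly
f_irrelevant = anova_f(overlapping)  # class means are identical
print(f_relevant, f_irrelevant)
```

A feature whose class means are well separated relative to the within-class spread gets a large F (and a small P-value) and is kept; a feature with nearly identical class distributions gets an F near zero and is discarded.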
The SLT features are used to improve the performance of the SVM, KNN (with K=3) [13], and NB [19] classifiers. Each classifier is tested with a different number of cross-validation folds (5-fold or 10-fold) to measure the performance of the proposed feature space.
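The evaluation protocol can be sketched with a small, self-contained K-fold loop around a 3-nearest-neighbor classifier. This is an illustrative sketch under assumptions: the interleaved fold assignment, the Euclidean distance, the function names, and the toy two-dimensional data are not taken from the paper.

```python
import math
from collections import Counter


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs using interleaved fold assignment."""
    for fold in range(n_folds):
        test = [i for i in range(n_samples) if i % n_folds == fold]
        train = [i for i in range(n_samples) if i % n_folds != fold]
        yield train, test


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(x, y, n_folds=5, k=3):
    """Fraction of samples classified correctly over all held-out folds."""
    correct = 0
    for train, test in k_fold_indices(len(x), n_folds):
        tx = [x[i] for i in train]
        ty = [y[i] for i in train]
        correct += sum(knn_predict(tx, ty, x[i], k) == y[i] for i in test)
    return correct / len(x)


# Toy data: three well-separated clusters standing in for the three classes.
x = [(c * 10 + j * 0.1, c * 10 - j * 0.1) for j in range(10) for c in range(3)]
y = [c for j in range(10) for c in range(3)]

acc_5 = cross_val_accuracy(x, y, n_folds=5)
acc_10 = cross_val_accuracy(x, y, n_folds=10)
print(acc_5, acc_10)
```

Every sample is used for testing exactly once, and the classifier never sees a test sample during the fold in which that sample is scored.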
The proposed ternary ECG classification system is assessed using a variety of classification performance metrics. These metrics are derived from the confusion matrix that describes the classes. The confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known. Accuracy is the proportion of correctly classified predictions (i.e., true positives and true negatives) over the total number of cases examined:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Where TP = true positive, TN = true negative, FP = false positive, and FN = false negative. Precision is defined as the proportion of positive class predictions that actually belong to the positive class. It is a measure of how accurately targets are extracted when compared to the ground truth:
$$\text{Precision} = \frac{TP}{TP + FP}$$
The recall (also known as sensitivity or true positive rate, TPR) is the proportion of actual positive cases that are correctly predicted as positive. Recall is a measure of how well the extracted targets represent the ground truth:
$$\text{Recall} = \frac{TP}{TP + FN}$$
The F-score (also known as the F-measure) is a metric for evaluating performance on problems with binary or multi-class labels. The macro F-score is the harmonic mean of the macro-precision and the macro-recall:
$$F_{\text{macro}} = \frac{2 \cdot \text{Precision}_{\text{macro}} \cdot \text{Recall}_{\text{macro}}}{\text{Precision}_{\text{macro}} + \text{Recall}_{\text{macro}}}$$
High macro F-scores indicate that the system performs well across all classes, while low macro F-scores indicate that classes are poorly predicted [20].
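The sketch below shows how these metrics can be computed from a three-class confusion matrix, treating each class in turn as the positive class. The function name and the matrix values are hypothetical and are not the results reported in this paper; the row/column convention (rows are true classes, columns are predicted classes) is also an assumption.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, F-score.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls = [], []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, actually other
        fn = sum(cm[c]) - tp                       # actually c, predicted other
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    macro_p = sum(precisions) / n
    macro_r = sum(recalls) / n
    denom = macro_p + macro_r
    macro_f = 2 * macro_p * macro_r / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "macro_precision": macro_p,
        "macro_recall": macro_r,
        "macro_f": macro_f,
    }


# Hypothetical confusion matrix for three classes (10 samples each).
cm = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
metrics = macro_metrics(cm)
print(metrics)
```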
The performance was calculated using the proposed statistical features derived from the SLT coefficients of 900 ECG segments in order to identify the class of each ECG signal automatically. NB, SVM, and KNN were used as classification methods. Fig. 8 shows the confusion matrix and the accuracy for the SVM classifier. The SVM classifier with SLT features clearly outperforms the other two classifiers, KNN and NB, attaining a recognition accuracy of 99.2593%.
Fig. 8. Confusion matrix and accuracy for support vector Machine-NN.
Fig. 9 depicts the confusion matrix and the accuracy for the SLT transform based features with KNN classifier.
Fig. 9. Confusion matrix and accuracy for k-nearest neighbor-NN
There is clearly a large gap between the performance of this classifier, 87.037%, and that of the SVM classifier with SLT features. Fig. 10 gives the confusion matrix of the SLT features combined with the NB classifier, which yields a very poor classification accuracy of 78.5185%; NB is therefore not recommended for use with the proposed feature extraction scheme.
Fig. 10. Confusion matrix and accuracy for Naive Bayes-NN.
Fig. 11 gives the accuracy (AUC) of the classifiers combined with the proposed feature extraction scheme under two cross-validation settings (5-fold and 10-fold). It is also clear that increasing the cross-validation from 5 to 10 folds has a slight negative impact on the accuracy results for the three classifiers.
Fig. 11. Area under the curve of three classes two cross-validations.
The same conclusion is reached by examining the other performance measures: precision in Fig. 12, recall (sensitivity) in Fig. 13, and F1-score in Fig. 14. Thus, increasing the number of folds slightly degraded the classification performance of all the classifiers considered.
Fig. 12. Precision of three classes two cross-validations.
Fig. 13. Recall of three classes two cross-validations.
Fig. 14. F1-score of three classes two cross-validations.
As a final step in evaluating the proposed system, it is compared with some state-of-the-art schemes in Table 2. Singh et al. [15] proposed a model for cardiac ARR diagnosis in which three separate machine learning approaches were used together with filter-based feature selection on a cardiac ARR dataset. According to their experimental study, the highest accuracy of 85.58% was obtained with the random forest classifier using the gain ratio feature selection approach with a subset of 30 features. Hussain et al. [21] proposed a classification system based on SNN, KNN, and decision tree classifiers that achieved accuracy up to 97%. Nahak et al. [13] used wavelet transform features fused with an auto-regressive model and attained accuracy up to 93.3%.
TABLE 2: Literature comparison.
The main goal of this article was to create an intelligent system, combining feature extraction and machine learning, that could automatically distinguish three different types of ECG signals: ARR, CHF, and normal sinus rhythm (NSR). The experiments were carried out on 90 ECG signal recordings (900 sub-records or segments) collected from a publicly accessible database. Fusion with wavelet and AR features improved the performance. Three classifiers were investigated in this study (SVM, KNN, and NB), and the highest accuracy of 99.256% was obtained with the SVM classifier. The simulation results strongly support SLT-based statistical feature extraction and show that increasing the cross-validation folds from 5 to 10 has a negative impact on the performance results across the different metrics. Furthermore, the NB classifier gave very poor results compared with the other two classifiers. Finally, the proposed technique outperforms the accuracy attained by similar works in the literature.
The authors would like to thank Salahaddin University–Erbil for supporting this article.
[1]. G. Sannino and G. De Pietro. “A deep learning approach for ECG-based heartbeat classification for arrhythmia detection”. Future Generation Computer Systems, vol. 86, pp. 446-455, 2018.
[2]. A. Çınar and S. A. Tuncer. “Classification of normal sinus rhythm, abnormal arrhythmia and congestive heart failure ECG signals using LSTM and hybrid CNN-SVM deep neural networks”. Computer Methods in Biomechanics and Biomedical Engineering, vol. 1, no. 1, pp. 1-12, 2020.
[3]. V. Jahmunah, S. L. Oh, J. K. En Wei, E. J. Ciaccio, K. Chua, T. Ru San, U. R. Acharya. “Computer-aided diagnosis of congestive heart failure using ECG signals a review”. Physica Medica, vol. 62, pp. 95-104, 2019.
[4]. A. Subasi. “Practical Guide for Biomedical Signals Analysis Using Machine Learning Techniques”. Academic Press, Cambridge, Massachusetts, 2019.
[5]. H. Khorrami and M. Moavenian. “A comparative study of DWT, CWT and DCT transformations in ECG arrhythmias classification”. Expert Systems With Applications, vol. 37, no. 8, pp. 5751-5757, 2010.
[6]. M. Maitra, A. Chatterjee and F. Matsuno. “A Novel Scheme for Feature Extraction and Classification of Magnetic Resonance Brain Images Based on Slantlet Transform and Support Vector Machine”. SICE Annual Conference, pp. 1130-1134, 2008.
[7]. S. H. Wady, R. Z. Yousif and H. R. Hasan. “A novel intelligent system for brain tumor diagnosis based on a composite neutrosophic-slantlet transform domain for statistical texture feature extraction”. BioMed Research International, vol. 2020, 8125392, 2020.
[8]. M. H. Mütevelli and S. Ergin. “The usage of statistical features in the approximation components of wavelet decomposition for ECG classification: A case study for standing, walking and single jump conditions”. Electronic Journal of Vocational Colleges, vol. 8, pp. 178-182, 2018.
[9]. M. R. Diniari and S. M. Isa. “Electrocardiogram classification for arrhythmia using convolutional neural network 2D and adabound optimizer”. The International Journal of Recent Technology and Engineering, vol. 8, no. 5, pp. 1277-1284, 2020.
[10]. S. Nibhanupudi, R. Youssif and C. Purdy. “Data-specific Signal Denoising Using Wavelets, with Applications to ECG Data”. International Midwest Symposium on Circuits and Systems, vol. 3, pp. 20-23, 2004.
[11]. S. K. Sahoo, A. K. Subudhi, B. Kanungo and S. K. Sabut. “Feature extraction of ECG signal based on wavelet transform for arrhythmia detection”. International Conference on Electrical, Electronics, Signals, Communication and Optimization EESCO 2015, no. December 2018, 2015.
[12]. A. J. Chashmi and M. C. Amirani. “An efficient and automatic ECG arrhythmia diagnosis system using DWT and HOS features and entropy-based feature selection procedure”. The Journal of Electrical Bioimpedance, vol. 10, no. 1, pp. 47-54, 2019.
[13]. S. Nahak and G. Saha. “A Fusion Based Classification of Normal, Arrhythmia and Congestive Heart Failure in ECG”. 26th The National Conference on Communications, pp. 1-6, 2020.
[14]. K. Daqrouq and A. Dobaie. “Wavelet based method for congestive heart failure recognition by three confirmation functions”. In: Computational and Mathematical Methods in Medicine. Taylor Francis Online, Milton Park, 2016.
[15]. N. Singh and P. Singh. “Cardiac arrhythmia classification using machine learning techniques”. In: Engineering Vibration, Communication and Information Processing. Springer, Berlin, Germany, 2019, pp. 469-480.
[16]. H. Wang, K. Lin, H. Shi and C. Qin. “A high-precision arrhythmia classification method based on dual fully connected neural network”. Biomedical Signal Processing and Control, vol. 58, 101874, 2020.
[17]. M. Maitra and A. Chatterjee. “A Slantlet transform based intelligent system for magnetic resonance brain image classification”. Biomedical Signal Processing and Control, vol. 1, no. 4, pp. 299-306, 2006.
[18]. S. O. Haji and R. Z. Yousif. “A novel run-length based wavelet features for screening thyroid nodule malignancy”. The Brazilian Archives of Biology and Technology, vol. 62, pp. 1-17, 2019.
[19]. H. Zhang. “The Optimality of Naive Bayes”. In: Proceedings of the 7th International Florida Artificial Intelligence Research Society Conference, vol. 2, pp. 562-567, 2004.
[20]. M. Sokolova, N. Japkowicz and S. Szpakowicz. “Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation”. AAAI Workshop Technical Reports, vol. 6, pp. 24-29, 2006.
[21]. L. Hussain, I. A. Awan, W. Aziz, S. Saeed, A. Ali, F. Zeeshan and K. S. Kwak, et al. “Detecting congestive heart failure by extracting multimodal features and employing machine learning techniques”. BioMed Research International, vol. 2020, 4281243, 2020.