Salih, Hamadamin, and Aziz:

1. INTRODUCTION

An electrocardiogram (ECG) is a non-invasive diagnostic method that records variations in the electrical activity of the heart over time and presents the heart’s rhythm graphically [1]. Obtaining and tracking ECG signals is therefore vital for the early detection of diseases such as arrhythmia (ARR) and congestive heart failure (CHF) [2], and the automatic classification of ECG signals is a field worth studying. A standard computer-aided diagnosis system involves four main stages: preprocessing of signals, extraction of specific features, selection of significant features, and classification [3]. For classification problems, feature vectors remain the main and most appropriate means of signal representation. Many researchers from various fields who are involved in data modeling and classification are working on feature extraction problems [4]. The discrete wavelet transform (DWT) is especially useful in signal and image processing tasks such as denoising, compression, and estimation [5]. However, in terms of time localization, it is unable to produce an ideal discrete-time basis [6], [7]. The Slantlet transform (SLT) has been suggested as a better alternative to the classical DWT in terms of time localization [6]. Thus, in this article, the SLT is used to extract statistical features from ECG signals. A standard ECG signal is depicted in Fig. 1.

Fig. 1. Electrocardiogram signal parameters [1].

Electrocardiography is the recording of the electrical activity of the heart. The waveform is used to assess the rate and regularity of heartbeats, as well as the existence of any heart damage and the effects of medications or devices used to control the heart, such as a pacemaker. ECG signals are weak (on the order of mV) and have a broad frequency range (0.05–100 Hz), with the bulk of the useful information found in the 0.5–45 Hz range [8], [9]. The P wave, which corresponds to atrial depolarization, is one of the numerous waveforms and features of the ECG. P waves have a typical amplitude of 0.1–0.2 mV and a normal duration of 60–80 ms. The QRS complex corresponds to ventricular depolarization, with a typical amplitude of about 1 mV and a duration of 0.06–0.12 s [10], [11]. The aim of this paper is to extract features from ECG signals using SLT-based statistical features. This task uses an ECG dataset to classify subjects into three groups: those with cardiac ARR, those with congestive heart failure (CHF), and those with normal sinus rhythm (NSR).

Our approach consists of preprocessing, feature extraction, feature selection based on the ANOVA test, and then training three different classifiers: support vector machine (SVM), Naive Bayes (NB), and K-nearest neighbor (KNN).

Chashmi et al. proposed a system that classifies heartbeats into five categories for ECG ARR diagnosis, combining DWT and higher-order statistics feature extraction with entropy-based feature selection and an SVM classifier [12]. Nahak et al. proposed a method for analyzing and classifying three types of ECG (namely ARR, CHF, and NSR). Feature representations were derived from the heart rate variability (HRV) of the ECG signal using wavelet-based functions as well as auto-regressive coefficients. After feature fusion, an SVM obtained the highest accuracy of 93.33% for three-class classification [13]. Daqrouq et al. proposed the use of wavelet energy to characterize ECG signals and ARR. The percentage energy (PE) of terminal wavelet packet transform (WPT) sub-signals was used to derive wavelet-based features for CHF [14]. Singh et al. suggested a model for cardiac ARR diagnosis. Three filter-based feature selection methods were applied to the cardiac ARR dataset using three separate machine learning methods, and the best features were picked. Feature selection is a crucial preprocessing phase in identifying effective factors in the diagnosis of ARR patients; as a consequence, the underlying health causes of heart-related deaths may be established. The output of the feature selection methods was evaluated using SVM and random forest, and the random forest classifier obtained the highest accuracy of 85.58% [15]. Mütevelli et al. proposed a method for extracting features from ECG signals based on the DWT, using wavelet packet analysis to extract the features.
The benefit of wavelet packet analysis is that it decomposes both the approximations and the details at every level, achieving a complete sub-band decomposition. Each signal was subjected to a 4-level wavelet packet decomposition, yielding 16 sub-bands. However, since the approximation coefficients reflect the key characteristics of each heart signal, the approximation coefficients from the low-frequency components were preferred, and eight of these sub-bands were used. Statistical features were then extracted from the different wavelet sub-bands [8]. Wang et al. investigated the effect of using a dual fully connected neural network model for accurate heartbeat classification. The classes were normal beats (N), supraventricular ectopic beats (S), ventricular ectopic beats (V), fusion beats (F), and unknown beats (Q). The tests showed that the proposed approach is effective at detecting ARRs [16]. Çınar and Tuncer proposed a deep learning system with high accuracy for the classification of ECG signals with normal sinus rhythm (NSR), ARR, and CHF. The proposed architecture was a hybrid Alexnet-SVM. There were 162 ECG signals in total, with 96 ARR, 30 CHF, and 36 NSR signals. The SVM and KNN algorithms were first used to classify the ARR, CHF, and NSR signals, and the signals were afterward classified in their raw form using the long short-term memory (LSTM) algorithm. Spectrograms of the signals were then obtained, and the hybrid Alexnet-SVM algorithm was applied to the spectrogram images. The performance results revealed that the proposed deep learning architecture outperformed traditional machine learning classifiers [2]. Wady et al. pointed out the improvement obtained by replacing the traditional DWT for feature extraction with the SLT calculated from a neutrosophic set (NS) for brain tumor images. A new composite NS-SLT model was proposed as a source of statistical texture features, which were used efficiently for binary classification of brain tumor malignancy [7].

2. METHODS

The main stages of the proposed system are depicted in Fig. 2. The first stage is preprocessing, followed by feature extraction, feature normalization, feature selection, ECG signal classification, and finally performance evaluation.

Fig. 2. Proposed electrocardiogram classifier system.

2.1. Preprocessing

Three ECG classes, ARR, NSR, and CHF, are investigated in this article. A total of 162 ECG recordings were taken from the MIT-BIH ARR Database on PhysioNet: 96 are ARR signals, 30 are normal sinus signals, and 36 are CHF signals. Fig. 3 provides an illustration of ECG signals for ARR, CHF, and normal sinus rhythm. The database is preprocessed before it is introduced to the classifiers. It originally consists of 162 records with 65536 samples per record, so, to increase the amount of data and reduce the processing time, each record is divided into small segments of 512 samples each.

To ensure a fair comparison, 30 records (each with 10 sub-records of 512 samples) were extracted from each ECG class (ARR, CHF, and NSR). Each category therefore has 300 sub-records, giving 900 sub-records in total, which were divided between the training stage (750 sub-records) and the testing stage (250 sub-records). The next subsections describe the details of each block depicted in Fig. 2. Fig. 3 displays sample signals for each category (ARR, NSR, and CHF), selected randomly from the database.
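The segmentation step can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not say which 10 of the available 512-sample segments are kept per record, so taking the first 10 is an assumption, and the function name `segment_record` and the random stand-in data are hypothetical.

```python
import numpy as np

SEGMENT_LEN = 512      # samples per sub-record, as stated in the text
RECORD_LEN = 65536     # samples per original record, as stated in the text


def segment_record(record, segment_len=SEGMENT_LEN, n_segments=10):
    """Split a 1-D record into non-overlapping segments; keep the first n_segments.

    Keeping the *first* segments is an assumption made for this sketch.
    """
    record = np.asarray(record, dtype=float)
    n_full = record.size // segment_len                 # whole segments available
    segments = record[: n_full * segment_len].reshape(n_full, segment_len)
    return segments[:n_segments]


# Random data stands in for 30 records of one class (real data would come from PhysioNet).
rng = np.random.default_rng(0)
records = rng.standard_normal((30, RECORD_LEN))
sub_records = np.vstack([segment_record(r) for r in records])
print(sub_records.shape)  # (300, 512)
```

Repeating this for each of the three classes gives the 900 sub-records described above.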

Fig. 3. Electrocardiogram sample from each class (Arrhythmia, normal sinus rhythm, and congestive heart failure).

2.2. Features Extraction and Selection

Classification techniques usually start with the extraction of relevant features [4]. In this work, statistical features computed from SLT coefficients, which describe the distribution of signal energy in the time and frequency domains, are investigated. Like an iterated DWT filter bank, the SLT filter bank is implemented as a parallel structure. In the DWT, several of these parallel branches are implemented as a product of simple filters. The Slantlet filter bank uses a similar parallel structure, but its component filter branches do not have a product form, which gives the SLT an advantage. The SLT creates a filter bank in which the length of each filter is a power of two. For a mathematical perspective on the SLT, consider the simplified representation in Fig. 4, generalized to (l) scales. This results in a periodic output for the analysis filter bank and reduces the filter support by (2ⁱ−2) samples, a reduction that approaches one third as (i) increases. The filters at scale (i) are g_i (n), f_i (n), and h_i (n), each with a support of 2ⁱ⁺¹ samples. For (l) scales, the SLT filter bank uses (l) pairs of channels, that is, (2l) channels in total. The low-pass filter h_i (n) is paired with its adjacent filter f_i (n), and each is followed by downsampling by 2ⁱ. Each of the remaining (l−1) channel pairs consists of g_i (n) and its time-reversed version, each followed by downsampling by 2ⁱ⁺¹, for i=1,2,3,…,l−1. The filters g_i (n), f_i (n), and h_i (n) are piecewise linear [7], [17].
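For reference, the piecewise-linear structure of the filters is usually written in the Slantlet literature [17] in the two-segment form below. The coefficient names a_{0,0}, a_{0,1}, a_{1,0}, and a_{1,1} are notation assumed here for illustration, and their closed-form values are not reproduced.

```latex
g_i(n) =
\begin{cases}
a_{0,0} + a_{0,1}\, n, & n = 0, \dots, 2^{i} - 1 \\
a_{1,0} + a_{1,1}\,(n - 2^{i}), & n = 2^{i}, \dots, 2^{i+1} - 1
\end{cases}
```

The filters h_i (n) and f_i (n) take the same two-segment form with their own coefficient sets, which is consistent with the support of 2ⁱ⁺¹ samples stated above.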

Correct classification of ECG signals requires a feature vector that contains features from both the time domain and the frequency domain.

The amplitudes of more than 550 SLT coefficients from each ECG category are illustrated in Fig. 5.

Fig. 4. The two-scale iterated D2 filter bank (left) and the two-scale Slantlet transform filter bank (right) [17].

Fig. 5. Slantlet transform coefficients for classes (Arrhythmia, congestive heart failure, and normal sinus rhythm).

The quantized SLT coefficients are used to extract statistical-spectral features of each ECG signal, such as the mean, standard deviation, variance, entropy, maximum, minimum, kurtosis, moment, median, skewness, and root mean square error. This technique reduces the length of the feature vector used as input to a classifier.
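A minimal sketch of this kind of feature computation is given below, using NumPy only. It covers ten of the listed statistics rather than the full set in Table 1, and several details are assumptions: the entropy is a histogram-based Shannon entropy with an arbitrary bin count, the kurtosis is the non-excess form, and the function name `statistical_features` is hypothetical. The input is assumed not to be constant, since the skewness and kurtosis divide by the standard deviation.

```python
import numpy as np


def statistical_features(coeffs, n_bins=32):
    """Return summary statistics for a 1-D array of transform coefficients."""
    x = np.asarray(coeffs, dtype=float)
    mean = x.mean()
    std = x.std()                          # population standard deviation
    centered = x - mean
    # Histogram-based Shannon entropy in bits; the bin count is an assumption.
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    entropy = -np.sum(p * np.log2(p))
    return {
        "mean": mean,
        "std": std,
        "variance": x.var(),
        "entropy": entropy,
        "max": x.max(),
        "min": x.min(),
        "median": np.median(x),
        "skewness": np.mean(centered**3) / std**3,
        "kurtosis": np.mean(centered**4) / std**4,   # non-excess: 3 for a Gaussian
        "rms": np.sqrt(np.mean(x**2)),
    }


# Random values stand in for the SLT coefficients of one 512-sample segment.
rng = np.random.default_rng(1)
features = statistical_features(rng.standard_normal(512))
print(len(features))  # 10
```

Each segment of several hundred coefficients is thus summarized by a handful of numbers, which is the reduction in feature vector length described above.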

The original database for each class is a set of 300 ECG sub-records, so the initial dimension is {y1, y2, …, y300}. Statistical features are then calculated from the SLT coefficients, generating a 14-dimensional feature vector {S1, S2, …, S14} for each element y_n of each of the three classes [Table 1]. Each feature vector corresponds to a single point in the feature space. Points of the same class should lie close together, and points of different classes should lie far apart.

The feature sets are normalized before the feature selection stage. Feature selection is the final stage of the feature extraction and processing procedure [18]. It aims to improve the performance of a classifier and achieve a minimum classification error. The analysis of variance (ANOVA) methodology is used in this study to reduce the dimension of the data based on feature importance and variance while preserving as much information as possible. ANOVA is a useful technique for deciding whether two or more sets of data differ statistically [7]. With a P-value threshold of 10⁻³, the ANOVA test selects 12 of the 14 input features; the two features with P-value >10⁻³ are discarded. The two omitted features are the SLT-based mean and the SLT-based first moment. Fig. 6 shows the distribution of a sample relevant feature (SLT entropy) from the 12-dimensional relevant feature space, whereas Fig. 7 shows one of the irrelevant features (SLT mean). The difference between relevant and irrelevant features can be seen easily from the figures.
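The normalization and ANOVA-based selection can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: min-max scaling is assumed as the normalization (the text does not name one), the one-way ANOVA comes from `scipy.stats.f_oneway`, and the function name `anova_select` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway


def anova_select(X, y, alpha=1e-3):
    """Min-max normalize each feature, then keep features with ANOVA p-value < alpha."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                        # guard against constant features
    X_norm = (X - X.min(axis=0)) / span
    groups = [X_norm[y == label] for label in np.unique(y)]
    p_values = f_oneway(*groups).pvalue          # one p-value per feature column
    keep = p_values < alpha
    return X_norm[:, keep], keep, p_values


# Synthetic data: 3 classes x 300 sub-records, 14 features.
# The first 12 features are shifted by class; the last 2 are identical across classes.
rng = np.random.default_rng(2)
y = np.repeat([0, 1, 2], 300)
base = np.tile(rng.standard_normal((300, 14)), (3, 1))
shift = np.zeros(14)
shift[:12] = 1.0
X = base + np.outer(y, shift)

X_sel, keep, p_values = anova_select(X, y)
print(X_sel.shape)  # (900, 12)
```

The synthetic data is built so that exactly two features carry no class information, mirroring the 12-of-14 outcome reported above; it says nothing about which real features are discarded.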

TABLE 1: Statistical features extracted from SLT coefficients.

Fig. 6. Distribution of sample of relevant feature.

Fig. 7. Distribution of sample of irrelevant feature.

2.3. ECG Classification

The SLT-based features are used to improve the performance of the SVM, KNN with K=3 [13], and NB [19] classifiers. Each classifier is tested with 5-fold and 10-fold cross-validation to measure the performance of the proposed feature space.
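A cross-validation comparison of this kind might look like the sketch below, written with scikit-learn. Only K=3 and the fold counts come from the text; the SVM kernel and hyperparameters (library defaults), the Gaussian form of Naive Bayes, the stratified shuffled folds, and the synthetic feature matrix are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

classifiers = {
    "SVM": SVC(),                                  # default kernel and C (assumed)
    "KNN": KNeighborsClassifier(n_neighbors=3),    # K = 3, as in the text
    "NB": GaussianNB(),                            # Gaussian variant (assumed)
}


def evaluate(X, y, folds=(5, 10), seed=0):
    """Return mean cross-validated accuracy for each (classifier, fold count) pair."""
    results = {}
    for name, clf in classifiers.items():
        for k in folds:
            cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
            scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
            results[(name, k)] = scores.mean()
    return results


# Synthetic, well-separated data stands in for the 900 x 12 selected feature matrix.
rng = np.random.default_rng(3)
y = np.repeat([0, 1, 2], 300)
X = rng.standard_normal((900, 12)) + 3.0 * y[:, None]

results = evaluate(X, y)
for (name, k), acc in results.items():
    print(f"{name} {k}-fold accuracy: {acc:.3f}")
```

Swapping `scoring="accuracy"` for another scorer name gives the same comparison for a different metric.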

2.4. Performance Tests

The proposed ternary ECG classification system is assessed using a variety of metrics, all of which are derived from the confusion matrix. The confusion matrix is a table that is often used to describe the performance of a classifier on a set of test data for which the true values are known. Accuracy is the proportion of correctly classified predictions (i.e., true positives and true negatives) over the total number of cases examined:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP = true positive, TN = true negative, FP = false positive, and FN = false negative. Precision is defined as the proportion of positive class predictions that actually belong to the positive class. It is a measure of how accurately targets are extracted when compared to the ground truth:

Precision = TP / (TP + FP)

The recall (also known as sensitivity or true positive rate, TPR) is the proportion of correctly predicted positive cases to the total number of actual positive cases. Recall is a measure of how well the extracted target represents the ground truth:

Recall = TP / (TP + FN)

The F-score (also known as the F-measure) is a metric for evaluating performance on problems with binary labels or multiple classes. The macro F-score is the harmonic mean of macro-precision and macro-recall:

Macro F-score = 2 × macro-precision × macro-recall / (macro-precision + macro-recall)

High macro F-scores indicate that the system performs well across all classes, while low macro F-scores indicate that some classes are poorly predicted [20].
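As a sketch of how these metrics follow from a multi-class confusion matrix, the snippet below computes accuracy, macro-precision, macro-recall, and the macro F-score as defined in this section. The function name `macro_metrics` and the 3×3 matrix are made up for illustration and are not results from this study; the code assumes every class is predicted at least once, so no denominator is zero.

```python
import numpy as np


def macro_metrics(cm):
    """Compute metrics from a confusion matrix.

    cm[i, j] is the number of samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp          # belonging to the class but predicted elsewhere
    macro_p = np.mean(tp / (tp + fp))
    macro_r = np.mean(tp / (tp + fn))
    return {
        "accuracy": tp.sum() / cm.sum(),
        "macro_precision": macro_p,
        "macro_recall": macro_r,
        # Harmonic mean of macro-precision and macro-recall, as defined in the text.
        "macro_f": 2 * macro_p * macro_r / (macro_p + macro_r),
    }


# Illustrative matrix only (rows: true class, columns: predicted class).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 1, 9]]
metrics = macro_metrics(cm)
print(round(metrics["accuracy"], 3))  # 0.8
```

Note that some libraries instead average the per-class F-scores; the definition used here follows the wording of this section.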

3. RESULTS AND DISCUSSION

The performance was evaluated using the proposed statistical features derived from the SLT coefficients of 900 ECG sub-records to identify the class of each ECG signal automatically. NB, SVM, and KNN were used as classifiers. Fig. 8 shows the confusion matrix and the accuracy for the SVM classifier. The SVM classifier combined with the SLT features outperforms the other classifiers, KNN and NB, attaining a recognition accuracy of 99.2593%.

Fig. 8. Confusion matrix and accuracy for support vector Machine-NN.

Fig. 9 depicts the confusion matrix and the accuracy for the SLT transform based features with KNN classifier.

Fig. 9. Confusion matrix and accuracy for k-nearest neighbor-NN

There is a large gap between the accuracy of the KNN classifier, 87.037%, and that of the SVM classifier with the same SLT features. Fig. 10 gives the confusion matrix of the SLT features combined with the NB classifier, which yields the lowest classification accuracy, 78.5185%, and is therefore not recommended for use with the proposed feature extraction scheme.

Fig. 10. Confusion matrix and accuracy for Naive Bayes-NN.

Fig. 11 gives the area under the curve (AUC) of the three classifiers combined with the proposed feature extraction scheme under two cross-validation settings (5-fold and 10-fold). Increasing the number of folds from 5 to 10 has a slight negative impact on the results for all three classifiers.

Fig. 11. Area under the curve of three classes two cross-validations.

The same conclusion is reached from the other performance measures: precision (Fig. 12), recall or sensitivity (Fig. 13), and F1-score (Fig. 14). Thus, increasing the number of folds slightly degraded classification performance for all three classifiers.

Fig. 12. Precision of three classes two cross-validations.

Fig. 13. Recall of three classes two cross-validations.

Fig. 14. F1-score of three classes two cross-validations.

As a final evaluation, the proposed system is compared with several state-of-the-art schemes in Table 2. Singh et al. [15] proposed a model for cardiac ARR diagnosis in which three separate machine learning approaches were used to pick features (filter-based feature selection) from a cardiac ARR dataset. According to their experimental study, the highest accuracy of 85.58% was obtained with the random forest classifier using the gain ratio feature selection approach with a subset of 30 features. Hussain et al. [21] proposed a classification system based on SNN, KNN, and decision tree classifiers that achieved an accuracy of up to 97%. Nahak et al. [13] fused wavelet transform features with an auto-regressive model and attained an accuracy of up to 93.3%.

TABLE 2: Literature comparison.

4. CONCLUSION

The main goal of this article was to create an intelligent system combining feature extraction and machine learning that could automatically distinguish three types of ECG signals: ARR, CHF, and normal sinus rhythm (NSR). The experiments were carried out on 90 ECG recordings (900 sub-records or segments) collected from a publicly accessible database. Fusion with wavelet and AR features improved the performance. Three classifiers were investigated (SVM, KNN, and NB), and the highest accuracy of 99.2593% was obtained with the SVM classifier. The simulation results strongly support SLT-based statistical feature extraction and show that increasing the number of cross-validation folds from 5 to 10 has a negative impact on the results across the different metrics. Furthermore, the NB classifier gave very poor results compared with the other two classifiers. Overall, the proposed technique outperforms the accuracy attained in similar works in the literature.

5. ACKNOWLEDGMENT

The authors would like to thank Salahaddin University–Erbil for supporting this article.

REFERENCES

[1]. G. Sannino and G. De Pietro. “A deep learning approach for ECG-based heartbeat classification for arrhythmia detection”. Future Generation Computer Systems, vol. 86, pp. 446-455, 2018.

[2]. A. Çınar and S. A. Tuncer. “Classification of normal sinus rhythm, abnormal arrhythmia and congestive heart failure ECG signals using LSTM and hybrid CNN-SVM deep neural networks”. Computer Methods in Biomechanics and Biomedical Engineering, vol. 1, no. 1, pp. 1-12, 2020.

[3]. V. Jahmunah, S. L. Oh, J. K. En Wei, E. J. Ciaccio, K. Chua, T. Ru San and U. R. Acharya. “Computer-aided diagnosis of congestive heart failure using ECG signals: A review”. Physica Medica, vol. 62, pp. 95-104, 2019.

[4]. A. Subasi. “Practical Guide for Biomedical Signals Analysis Using Machine Learning Techniques”. Academic Press, Cambridge, Massachusetts, 2019.

[5]. H. Khorrami and M. Moavenian. “A comparative study of DWT, CWT and DCT transformations in ECG arrhythmias classification”. Expert Systems With Applications, vol. 37, no. 8, pp. 5751-5757, 2010.

[6]. M. Maitra, A. Chatterjee and F. Matsuno. “A Novel Scheme for Feature Extraction and Classification of Magnetic Resonance Brain Images Based on Slantlet Transform and Support Vector Machine”. SICE Annual Conference, pp. 1130-1134, 2008.

[7]. S. H. Wady, R. Z. Yousif and H. R. Hasan. “A novel intelligent system for brain tumor diagnosis based on a composite neutrosophic-slantlet transform domain for statistical texture feature extraction”. BioMed Research International, vol. 2020, 8125392, 2020.

[8]. M. H. Mütevelli and S. Ergin. “The usage of statistical features in the approximation components of wavelet decomposition for ECG classification: A case study for standing, walking and single jump conditions”. Electronic Journal of Vocational Colleges, vol. 8, pp. 178-182, 2018.

[9]. M. R. Diniari and S. M. Isa. “Electrocardiogram classification for arrhythmia using convolutional neural network 2D and adabound optimizer”. The International Journal of Recent Technology and Engineering, vol. 8, no. 5, pp. 1277-1284, 2020.

[10]. S. Nibhanupudi, R. Youssif and C. Purdy. “Data-specific Signal Denoising Using Wavelets, with Applications to ECG Data”. International Midwest Symposium on Circuits and Systems, vol. 3, pp. 20-23, 2004.

[11]. S. K. Sahoo, A. K. Subudhi, B. Kanungo and S. K. Sabut. “Feature extraction of ECG signal based on wavelet transform for arrhythmia detection”. International Conference on Electrical, Electronics, Signals, Communication and Optimization EESCO 2015, no. December 2018, 2015.

[12]. A. J. Chashmi and M. C. Amirani. “An efficient and automatic ECG arrhythmia diagnosis system using DWT and HOS features and entropy-based feature selection procedure”. The Journal of Electrical Bioimpedance, vol. 10, no. 1, pp. 47-54, 2019.

[13]. S. Nahak and G. Saha. “A Fusion Based Classification of Normal, Arrhythmia and Congestive Heart Failure in ECG”. 26th The National Conference on Communications, pp. 1-6, 2020.

[14]. K. Daqrouq and A. Dobaie. “Wavelet based method for congestive heart failure recognition by three confirmation functions”. In: Computational and Mathematical Methods in Medicine. Taylor Francis Online, Milton Park, 2016.

[15]. N. Singh and P. Singh. “Cardiac arrhythmia classification using machine learning techniques”. In: Engineering Vibration, Communication and Information Processing. Springer, Berlin, Germany, 2019, pp. 469-480.

[16]. H. Wang, K. Lin, H. Shi and C. Qin. “A high-precision arrhythmia classification method based on dual fully connected neural network”. Biomedical Signal Processing and Control, vol. 58, 101874, 2020.

[17]. M. Maitra and A. Chatterjee. “A Slantlet transform based intelligent system for magnetic resonance brain image classification”. Biomedical Signal Processing and Control, vol. 1, no. 4, pp. 299-306, 2006.

[18]. S. O. Haji and R. Z. Yousif. “A novel run-length based wavelet features for screening thyroid nodule malignancy”. The Brazilian Archives of Biology and Technology, vol. 62, pp. 1-17, 2019.

[19]. H. Zhang. “The Optimality of Naive Bayes”. In: Proceedings of the 7th International Florida Artificial Intelligence Research Society Conference, vol. 2, pp. 562-567, 2004.

[20]. M. Sokolova, N. Japkowicz and S. Szpakowicz. “Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation”. AAAI Workshop Technical Reports, vol. 6, pp. 24-29, 2006.

[21]. L. Hussain, I. A. Awan, W. Aziz, S. Saeed, A. Ali, F. Zeeshan, K. S. Kwak, et al. “Detecting congestive heart failure by extracting multimodal features and employing machine learning techniques”. BioMed Research International, vol. 2020, 4281243, 2020.

A Slantlet based Statistical Features Extraction for Classification of Normal, Arrhythmia, and Congestive Heart Failure in Electrocardiogram

Sawza Saadi Saeed, Raghad Zuhair Yousif