1Department of Computer Engineering, Kermanshah Branch, Islamic Azad University, Kermanshah, Iran; 2Department of Computer Engineering, Malayer Branch, Islamic Azad University, Malayer, Iran; 3Department of Information Technology, University of Human Development, Sulaymaniyah, Iraq
Addiction to narcotics is one of the greatest health challenges in today’s world. It has become a serious threat to social, economic, and cultural structures, has ruined part of the active workforce of society, and is one of the main factors in the spread of diseases such as HIV and hepatitis. Today, addiction is recognized as a disease, and welfare organizations and many of their affiliated centers try to help addicts treat it. In this study, data mining algorithms are applied to data collected from opioid withdrawal applicants referred to a welfare organization, and a model is proposed to predict the success of these applicants. The statistical population comprises opioid withdrawal applicants of a welfare organization; the dataset contains 793 instances, including men and women, described by 26 features. The proposed model combines two meta-learning algorithms (Decorate and Bagging) with the J48 decision tree and is implemented in the Weka data mining software. The efficiency of the proposed model is evaluated in terms of precision, recall, Kappa, and root mean squared error, and the results are compared with algorithms such as the multilayer perceptron neural network, Naive Bayes, and Random Forest. The experiments show that the precision of the proposed model is 71.3%, which is higher than that of the other compared algorithms.
Index Terms: Addiction, Data Mining, Decision Tree, Meta-learning Algorithm
Today, addiction to narcotics is one of the main health challenges of the world. It poses serious threats to social, economic, and cultural structures and destroys part of the active workforce of society. It is also one of the main factors in the spread of diseases such as HIV and hepatitis. According to social analysts, addiction to narcotics is one of the complicated problems of the current age and results in much social damage and many violations. The relationship between addiction and social issues is two-sided: on the one hand, addiction results in the recession and degeneration of society; on the other hand, it is a phenomenon that originates from social, economic, and cultural issues.
Today, addiction is recognized as a disease, and there are centers for its treatment that hold detailed information about addicts. Given this large volume of data, data mining can be used to extract knowledge from it, and the results can serve as the knowledge base of a decision support system for preventing and treating addiction. Data mining tools analyze data and discover patterns that can be used to determine business strategy, build knowledge bases, and support medical and scientific studies. The gap between data and information has made data mining tools necessary for converting raw data into useful knowledge [2-4].
In this study, data mining techniques such as Neural Network, Bayesian Network, and Decision Tree are used to build models that predict the success of opioid withdrawal applicants referred to a welfare organization. In addition, a hybrid prediction model comprising the J48, Decorate, and Bagging algorithms is proposed for the same task. The statistical population of this study comprises opioid withdrawal applicants referred to a welfare organization.
The rest of this paper is organized as follows: Section 2 reviews related works, and Section 3 introduces the dataset. Section 4 introduces the proposed model for predicting the success of opioid withdrawal applicants, while Section 5 presents the simulation results. The paper is concluded in Section 6.
In Lu et al. [5], the authors obtained data from Reddit, an online collection of forums, to gain insight into drug use and misuse from text snippets in users’ narratives. They used users’ posts to train a binary classifier that predicts a user’s transition from casual drug discussion forums to drug recovery forums. They also presented a Cox regression model that outputs the likelihoods of such transitions. Furthermore, they found that mentions of select drugs and certain linguistic features in a user’s posts can help predict these transitions.
In Fan et al. [6], a novel framework named AutoDOA is proposed to automatically detect opioid addicts from Twitter. The authors first introduced a structured heterogeneous information network (HIN) to model the users and posted tweets as well as their rich relationships. Then, a meta-path based mechanism is used to formulate similarity measures over users, and the different similarities are aggregated using Laplacian scores. Finally, based on the HIN and the combined meta-path, a classification model is proposed for automatic opioid addict detection that reduces the cost of acquiring labeled examples for supervised learning.
In Zhang et al. [7], an intelligent system named iOPU has been developed to automate the detection of opioid users from Twitter. Like Fan et al. [6], the authors first introduced a HIN; they then used a meta-graph based technique to characterize the semantic relatedness over users. Furthermore, they integrated content-based similarity with the relatedness obtained from each meta-graph to formulate a similarity measure over users. Finally, they proposed a classifier that combines the similarities derived from different meta-graphs to make predictions.
In Kim [8], the author used a text mining approach to explore how opioid-related research themes have changed since 2000. The textual data were obtained from PubMed, and the research time span was divided into three periods. While a few topics appear throughout all periods, many new health problems emerged as opioid abuse problems grew. Topics such as HIV, methadone maintenance treatment, and the World Health Organization appear consistently but diminish over time, while topics such as injecting drugs, neonatal abstinence syndrome, and public health concerns are rapidly increasing.
The study by Kaur and Bawa [9] aims to survey and analyze a range of data mining tools and techniques for predicting various medical diseases, so as to make the health-care sector more competent and effective. After the dataset is prepared, the data are loaded into the WEKA tool. Then, Naïve Bayes, Decision Tree (J48), multilayer perceptron (MLP), and logistic regression are selected to build the prediction models. The models are evaluated by cross-validation using classifier performance measures, and the results of the algorithms are compared with each other.
In Rani [10], neural networks are used to classify medical datasets. The error back-propagation method with a variable learning rate and acceleration is used to train the network. To analyze the performance of the network, various training data are used as its input. To speed up the learning process, parallelization is performed in each neuron of all hidden and output layers. The results showed that the multi-layer neural network trains faster than a single-layer neural network while achieving high classification efficiency.
In Shajahaan et al. [11], the application of decision trees to predicting breast cancer is investigated. The study also analyzes the performance of conventional supervised learning algorithms, namely, Random Tree, ID3, CART, C4.5, and Naive Bayes. The data are then transferred to the RapidMiner data mining tool, and the breast cancer diagnosis for each sample in the test set is predicted with seven different algorithms: Discriminant Analysis, Artificial Neural Networks (ANN), Decision Trees, Logistic Regression, Support Vector Machines, Naïve Bayes, and KNN. The results showed that Random Tree achieves higher accuracy in cancer prediction than the other algorithms.
In Kaur and Bawa [12], the aim is to propose an expert system that predicts whether a person is prone to drug addiction, so that drug abusers can be identified and made aware of their condition and can test themselves repeatedly, without hesitation, as part of their treatment. The proposed expert system is developed using the ID3 decision tree algorithm.
In Ji et al. [13], a framework has been developed to predict potential risks for medical conditions as well as their progression trajectories in order to identify the comorbidity path. The proposed framework utilizes patients’ publicly available social media data and presents a collaborative prediction model that predicts a ranked list of potential comorbidity incidences and a trajectory prediction model that reveals different paths of condition progression.
In Salleh et al. [14], ANN algorithms are used to propose a framework for relapse prediction among drug addicts at Pusat Rawatan Inabah. The collected data are mined with ANN algorithms to generate patterns and useful knowledge and then to classify the possibility of relapse automatically. The authors state that, among classification algorithms, ANN is one of the best for predicting relapse among drug addicts.
The statistical population employed in this study comprises the opioid withdrawal applicants of a welfare center. The samples consist of 793 applicants, including men and women, and the dataset has 26 features. Table 1 shows the features of this dataset and the values of each feature.
TABLE 1 Features and their values in the dataset of study.
Here, some necessary preprocessing is applied to the dataset to increase the efficiency of the prediction models:
I. The output field of this dataset is “the number of referrals,” which is obtained by a small change to the field “the number of previous withdrawals.” The purpose of this study is to predict the number of referrals of addicts to withdrawal centers. This knowledge indicates after how many referrals to opioid withdrawal centers an individual succeeds in withdrawing. To calculate the output field, it is sufficient to add one to the field “the number of previous withdrawals.” Then, the output field is converted into four classes, as shown in Table 2.
II. The field “family income” includes a numerical value between 0 and 6000 USD. To increase the efficiency of the proposed prediction model, the values of this field are divided into 10 degrees. In other words, individuals are categorized into 10 groups in terms of family income. This categorization is represented in Table 3.
III. The field of “average cost of supplying the narcotics in the last week” also includes values in the range of 0.7–134 USD. All of these samples are also categorized into eight groups, as shown in Table 4.
IV. The field of “age” includes values between 17 and 70. In another categorization, all samples are categorized into six different groups, as given in Table 5.
V. The field “side effects” has the same value for all samples. Therefore, this feature cannot affect the prediction results, and it is eliminated from the dataset.
VI. Furthermore, the values of some features, such as housing situation, amount of consumption, and consumption frequency, are missing for some individuals. These missing values are filled in by random initialization.
TABLE 2 Classifying the output field of the dataset.
TABLE 3 Categorizing the samples into 10 degrees in terms of “family income.”
TABLE 4 Categorizing the samples in terms of “average cost of supplying the narcotics in the last week” into eight groups.
TABLE 5 Categorizing the samples in terms of “age” in six different groups.
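The preprocessing steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the actual group boundaries belong to Tables 3 to 5 and are not reproduced here, so equal-width groups are assumed, and "random initialization" is interpreted as drawing a replacement at random from the observed values of the same feature. The function names and sample values are hypothetical.

```python
import random


def referrals_from_previous(previous_withdrawals):
    """Output field: number of referrals = number of previous withdrawals + 1."""
    return previous_withdrawals + 1


def equal_width_bin(value, low, high, n_bins):
    """Map a value in [low, high] to a group index 1..n_bins.

    Equal-width groups are an assumption; the real boundaries are in the tables.
    """
    if not low <= value <= high:
        raise ValueError("value outside the documented range")
    index = int((value - low) * n_bins / (high - low))
    return min(index, n_bins - 1) + 1  # the upper bound falls into the last group


def fill_missing_randomly(column, rng):
    """Replace None entries with a value drawn at random from the observed ones
    (one possible reading of 'random initialization')."""
    observed = [v for v in column if v is not None]
    return [v if v is not None else rng.choice(observed) for v in column]


# Hypothetical sample values, for illustration only
income_group = equal_width_bin(1500, 0, 6000, 10)  # family income, 10 groups
cost_group = equal_width_bin(50.0, 0.7, 134, 8)    # weekly narcotics cost, 8 groups
age_group = equal_width_bin(44, 17, 70, 6)         # age, 6 groups
housing = fill_missing_randomly(["owner", None, "tenant", None], random.Random(0))
```

Discretizing the numeric fields this way turns them into nominal attributes, which is the form the categorized fields take in the tables above.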
The proposed prediction model for determining the type of addicts in terms of the classes presented in Table 2 is a hybrid model of J48, Decorate, and Bagging algorithms. The structure of the proposed hybrid model is as follows:
Decorate(Bagging(J48(Dataset)))
J48 is the Weka implementation of the C4.5 decision tree. In this algorithm, additional grafting branches are added to the tree in a post-processing phase. The grafting process tries to capture some of the capabilities of ensemble methods such as bagged and boosted trees while maintaining a single tree structure. The algorithm identifies regions that are either empty or contain only misclassified samples and explores an alternative class for them [3,4].
Ensemble learning uses a set of classifiers to learn partial solutions to a given problem and then integrates these solutions, using some strategy, into a final solution to the original problem. Ensemble learning has recently become one of the most popular fields in the data mining and machine learning communities and has been applied successfully in many real classification applications. Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples (Decorate) is a simple meta-learner that can use any strong learner as a base classifier to build diverse committees in a fairly straightforward way. The motivation for Decorate is that combining the outputs of multiple classifiers is only useful if they disagree on some inputs. Decorate is designed to use additional artificially generated training data and adds different randomly constructed instances to the training set to generate highly diverse ensembles [2,15,16]. Decorate can also be used effectively for the following:
I. Active learning, to reduce the number of training examples required to learn an accurate model;
II. Exploiting unlabeled data to improve accuracy in semi-supervised learning;
III. Combining both active and semi-supervised learning for improved results;
IV. Obtaining improved class membership probability estimates, to assist in cost-sensitive decision-making;
V. Reducing the error of regression methods; and
VI. Improving the accuracy of relational learners.
Furthermore, the advantages of Decorate are as follows:
Ensembles of classifiers are often more accurate than their component classifiers if the errors made by the ensemble members are uncorrelated. The Decorate method reduces the correlation between ensemble members by training classifiers on oppositely labeled artificial examples. Furthermore, the algorithm ensures that the training error of the ensemble is always less than or equal to the error of the base classifier, which usually results in a reduction of the generalization error; and
On average, combining the predictions of Decorate ensembles improves on the accuracy of the base classifier.
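The oppositional relabeling step behind Decorate can be illustrated with a minimal sketch. It assumes, following the usual description of the method, that each artificial example receives a label sampled with probability inversely proportional to the class probabilities predicted by the current ensemble, and that zero probabilities are replaced by a small constant. The function names and the `epsilon` value are hypothetical, and this is not the Weka implementation.

```python
import random


def oppositional_label_weights(ensemble_probs, epsilon=1e-3):
    """Turn the ensemble's class probabilities into label-sampling weights
    that are inversely proportional to those probabilities."""
    inverse = [1.0 / max(p, epsilon) for p in ensemble_probs]  # guard against p = 0
    total = sum(inverse)
    return [w / total for w in inverse]


def relabel_artificial_example(ensemble_probs, rng):
    """Sample a class index for one artificial example, favouring the classes
    that the current ensemble considers unlikely."""
    weights = oppositional_label_weights(ensemble_probs)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


rng = random.Random(42)
probs = [0.8, 0.2]  # hypothetical ensemble prediction for one artificial example
weights = oppositional_label_weights(probs)
labels = [relabel_artificial_example(probs, rng) for _ in range(1000)]
share_of_class_1 = labels.count(1) / len(labels)
```

Because the artificial examples mostly carry the label the ensemble would not have predicted, a new member trained on them tends to disagree with the existing members, which is the source of the diversity described above.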
Bagging was proposed in 1994 by Leo Breiman to improve classification by combining classifiers trained on randomly generated training sets. It is a meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. Using this algorithm reduces variance and helps to avoid over-fitting. Bagging implicitly creates ensemble diversity by training classifiers on different subsets of the data. Although this method is usually applied to decision trees, it can be used with any kind of model. Bagging is a special case of the model averaging approach. It can also be applied to the prediction of continuous values by averaging the individual predictions rather than taking a majority vote [2,17,18]. The advantages of Bagging are as follows:
Bagging works well if the base classifiers are unstable;
It increases accuracy because it reduces the variance of the individual classifier;
Bagging seeks to reduce the error due to the variance of the base classifier; and
Noise-tolerant, but not so accurate.
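The following minimal sketch illustrates the idea of Bagging with bootstrap samples and a majority vote. The base learner here is a hypothetical one-feature threshold rule standing in for J48, and the toy data are made up; it is not the Weka implementation.

```python
import random
from collections import Counter


def train_stump(samples):
    """Hypothetical base learner: a one-feature threshold rule.
    samples is a list of (x, label) pairs with numeric x."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        left = Counter(y for x, y in samples if x <= threshold)
        right = Counter(y for x, y in samples if x > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0] if right else left_label
        errors = sum(
            1 for x, y in samples
            if (left_label if x <= threshold else right_label) != y
        )
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagging(samples, train, n_models, rng):
    """Train n_models base classifiers, each on a bootstrap sample
    (drawn with replacement, same size as the original data)."""
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train(bootstrap))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]  # majority vote

    return predict


# Made-up, linearly separable toy data
data = [(1, "a"), (2, "a"), (3, "a"), (4, "a"),
        (6, "b"), (7, "b"), (8, "b"), (9, "b")]
predict = bagging(data, train_stump, n_models=25, rng=random.Random(1))
```

Each base classifier sees a slightly different training set, so an unstable learner produces different models, and the vote averages out part of their variance.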
In this section, we first present the simulation model for the proposed model and other algorithms. Then, the common evaluation metrics in data mining problems are introduced. Finally, the experiment results of the proposed model and some common algorithms are presented.
In this research, the Weka tool is used to perform the pre-processing operations and construct the proposed predictive models. This software has been developed at the University of Waikato in New Zealand and is an open-source tool implemented in the object-oriented programming language Java. It includes several machine learning and data mining algorithms for regression, classification, clustering, and association rule mining, as well as pre-processing tools (filters) and attribute selection methods.
Furthermore, the K-fold cross-validation method (K = 10) is employed to train and test the proposed model. In this type of test, the data are partitioned into K subsets. One of these K subsets is used for testing, and the remaining K-1 subsets are used for training. This procedure is repeated K times, so that every instance is used exactly once for testing and K-1 times for training. Finally, the average of these K test results is taken as the final estimate. In the K-fold method used here, the ratio of each class in each subset is the same as in the full dataset.
One of the common tools for evaluating classification algorithms is the confusion matrix. As shown in Table 6, the confusion matrix groups the predictions of the classifier into four categories: true positive, false negative, false positive, and true negative. True positive refers to the positive samples that were correctly labeled by the classifier. True negative refers to the negative samples that were correctly labeled by the classifier. False positive is an error in which a test result improperly indicates the presence of a condition, such as a disease (the result is positive), when in reality it is not present. False negative is an error in which a test result improperly indicates the absence of a condition (the result is negative) when, in reality, it is present.
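The K-fold procedure described above can be sketched as follows. This is a minimal illustration, not Weka's implementation: the fold assignment is a simple round-robin within each class, and the `evaluate` callback and the class labels are hypothetical placeholders for training and testing a model.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, rng):
    """Return k lists of test indices; each class is spread evenly over the folds."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


def cross_validate(labels, k, evaluate, rng):
    """Average evaluate(train_indices, test_indices) over the k folds."""
    folds = stratified_k_fold(labels, k, rng)
    scores = []
    for test_indices in folds:
        held_out = set(test_indices)
        train_indices = [i for i in range(len(labels)) if i not in held_out]
        scores.append(evaluate(train_indices, test_indices))
    return sum(scores) / k


# Made-up class labels: 50, 30, and 20 instances of three classes
labels = ["c1"] * 50 + ["c2"] * 30 + ["c3"] * 20
folds = stratified_k_fold(labels, 10, random.Random(0))
test_share = cross_validate(
    labels, 10, lambda train, test: len(test) / (len(train) + len(test)),
    random.Random(0),
)
```

With K = 10, each fold holds about one tenth of every class, so each model is trained on roughly 90% of the data and tested on the remaining 10%.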
TABLE 6 The confusion matrix.
Considering the confusion matrix, the following measures can be defined and evaluated:
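Precision is the fraction of retrieved instances that are relevant. With TP, FP, TN, and FN denoting the numbers of true positives, false positives, true negatives, and false negatives, this measure is calculated by Equation (1):
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
Accuracy is the proportion of true results (both true positives and true negatives) among the total number of cases examined. This measure is calculated by Equation (2):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$
Recall is the fraction of relevant instances that are retrieved. This measure is calculated by Equation (3):
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
F-Measure combines precision and recall as their harmonic mean and is calculated by Equation (4):
$$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
Root mean squared error (RMSE) is a frequently used measure of the differences between the values predicted by a model or an estimator and the values observed. The RMSE is the square root of the second sample moment of the differences between the predicted values ($p_i$) and the observed values ($a_i$), that is, the quadratic mean of these differences. For $n$ instances, this measure is calculated by Equation (5):
$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - a_i)^2} \tag{5}$$
Mean absolute error (MAE) measures how far the predicted values ($p_i$) are from the observed values ($a_i$) and is calculated by Equation (6):
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert p_i - a_i \rvert \tag{6}$$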
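As a small worked example, the measures defined above can be computed from the four confusion-matrix counts and from paired predicted and observed values. The counts and values below are made up for illustration, and the function names are hypothetical. For a multi-class problem such as the four-class output used here, these measures are typically computed per class and then averaged.

```python
import math


def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f_measure(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)  # harmonic mean of precision and recall


def rmse(predicted, observed):
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, observed)) / n)


def mae(predicted, observed):
    n = len(predicted)
    return sum(abs(p - a) for p, a in zip(predicted, observed)) / n


# Made-up counts and values, for illustration only
tp, fn, fp, tn = 40, 10, 20, 30
scores = {
    "precision": precision(tp, fp),
    "recall": recall(tp, fn),
    "accuracy": accuracy(tp, tn, fp, fn),
    "f_measure": f_measure(tp, fp, fn),
    "rmse": rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
    "mae": mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
}
```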
The experiment results in Fig. 1 show that the precision of the proposed model is 71.3%, an improvement over J48, Random Forest, Naive Bayes, and the MLP neural network, whose precisions are 66%, 65.3%, 51.2%, and 56.3%, respectively. Furthermore, the results show that hybrid models based on the Decorate and Bagging meta-learning algorithms increase the precision of the prediction models. For instance, the precision of J48 alone is 66%, while combining it with Decorate or with Bagging increases the precision to 69% and 66.7%, respectively. Since the proposed model employs the features of J48, Decorate, and Bagging together, it obtains the highest precision.
Fig. 1. Comparing the proposed model with other algorithms in terms of precision.
Fig. 2 shows the experiment results in terms of recall. The results show that the proposed model has a recall of 71.5% which is higher than J48, Random Forest, Naive Bayes, and MLP neural network with recall of 66%, 65.2%, 50.9%, and 56.2%, respectively. Besides, the proposed hybrid model gives better results compared to hybrid algorithms of Decorate-J48 and Bagging-J48 with recalls of 69.1% and 67%.
Fig. 2. Comparing the proposed model with other algorithms in terms of recall.
Moreover, in terms of F-measure, as shown in Fig. 3, the proposed model has obtained an F-measure of 71.2% which outperforms J48, Random Forest, Naive Bayes, and MLP neural network with F-measures of 65.5%, 64.9%, 50.9%, and 56.2%, respectively. Furthermore, combining J48 with Decorate and Bagging algorithms has increased F-measure but not more than the hybrid proposed model.
Fig. 3. Comparing the proposed model with other algorithms in terms of F-measure.
Furthermore, the proposed model is compared with other models in terms of Kappa in Fig. 4. The experiment results show that the Kappa values obtained for J48, Decorate-J48, and Bagging-J48 are 0.50, 0.54, and 0.56, respectively, while its value for the proposed model is about 0.58, which indicates the superiority of the proposed model over the other models.
Fig. 4. Comparing the proposed model with other algorithms in terms of Kappa.
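Kappa is reported here without a formula. Assuming it denotes the standard Cohen's kappa statistic, which compares the observed agreement between predicted and actual classes with the agreement expected by chance, it can be sketched from a confusion matrix as follows. The example matrix is made up and the function name is hypothetical.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix, where
    confusion[i][j] = number of instances of actual class i predicted as class j."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of instances on the diagonal
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals, summed over classes
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Made-up two-class confusion matrix, for illustration only
example = [[40, 10],
           [20, 30]]
kappa = cohen_kappa(example)
```

A value of 0 means agreement no better than chance and 1 means perfect agreement, which is why the values of 0.50 to 0.58 reported above indicate moderate agreement beyond chance.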
Furthermore, Fig. 5 compares the performance of the proposed model with other algorithms in terms of MAE. On this measure, J48 outperforms the other algorithms with a value of 0.1884. The MAE of the proposed model is 0.2427, which is somewhat higher than that of J48. This shortcoming is small compared with the superiority of the proposed model on the other measures.
Fig. 5. Comparing the proposed model with other algorithms in terms of mean absolute error.
In addition, the proposed model is compared with other algorithms in terms of RMSE. The results presented in Fig. 6 show that the proposed model (RMSE of 0.3265) and Decorate-J48 (RMSE of 0.3239) achieve the lowest RMSE values among the compared models, which indicates the desirable performance of the proposed model.
Fig. 6. Comparing the proposed model with other algorithms in terms of root mean squared error.
Finally, the proposed hybrid model is compared with other algorithms in terms of the time required to build the model. The results of this experiment in Fig. 7 show that the proposed model takes 3.42 s to build, which is longer than the other algorithms. The reason is that the proposed model combines three different algorithms and therefore takes longer to construct.
Fig. 7. Comparing the proposed model with other algorithms in terms of the time required to build the model.
In this paper, a hybrid prediction model based on data mining techniques is proposed to predict the number of times that opioid withdrawal applicants are referred to welfare organizations. The statistical population of this study comprises opioid withdrawal applicants referred to a welfare organization. The proposed model is a combination of the Decorate, Bagging, and J48 algorithms and benefits from the advantages of all three. The efficiency of the proposed hybrid model is evaluated in terms of precision, recall, F-measure, Kappa, RMSE, and MAE. The results show that the proposed model, with a precision of 71.3%, outperforms other algorithms such as Random Forest, Naïve Bayes, and the MLP neural network.
[1] A. M. Trescot, S. Datta, M. Lee and H. Hansen. “Opioid pharmacology”. Pain Physician, vol. 11, no. 2 Suppl, pp. S133-S153, 2008.
[2] J. Han and M. Kamber. “Data Mining: Concepts and Techniques”. 2nd ed. Morgan Kaufmann Publishers, Elsevier, Burlington, 2006.
[3] G. R. Murray and A. Scime. “Data Mining. Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource”. John Wiley and Sons, Hoboken, NJ, pp. 1-15, 2015.
[4] L. Zeng, L. Li, L. Duan, K. Lu, Z. Shi, M. Wang and P. Luo. “Distributed data mining: A survey”. Information Technology and Management, vol. 13, no. 4, pp. 403-409, 2012.
[5] J. Lu, S. Sridhar, R. Pandey, M. A. Hasan and G. Mohler. “Investigate Transitions into Drug Addiction through Text Mining of Reddit Data”. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, pp. 2367-2375, 2019.
[6] Y. Fan, Y. Zhang, Y. Ye and W. Zheng. “Social Media for Opioid Addiction Epidemiology: Automatic Detection of Opioid Addicts from Twitter and Case Studies”. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, ACM, pp. 1259-1267, 2017.
[7] Y. Zhang, Y. Fan, Y. Ye, X. Li and W. Zheng. “Detecting Opioid Users from Twitter and Understanding their Perceptions toward MAT”. In: 2017 IEEE International Conference on Data Mining Workshops, IEEE, pp. 502-509, 2017.
[8] Y. M. Kim. “Discovering major opioid-related research themes over time: A text mining technique”. AMIA Summits on Translational Science Proceedings, vol. 2019, pp. 751-760, 2019.
[9] S. Kaur and R. K. Bawa. “Future trends of data mining in predicting the various diseases in medical healthcare system”. International Journal of Energy, Information and Communications, vol. 6, no. 4, pp. 17-34, 2015.
[10] K. U. Rani. “Parallel approach for diagnosis of breast cancer using neural network technique”. International Journal of Computer Applications, vol. 10, no. 3, pp. 1-5, 2010.
[11] S. S. Shajahaan, S. Shanthi and V. M. Chitra. “Application of data mining techniques to model breast cancer data”. International Journal of Emerging Technology and Advanced Engineering, vol. 3, no. 11, pp. 362-369, 2013.
[12] S. Kaur and R. K. Bawa. “Implementation of an Expert System for the Identification of Drug Addiction using Decision Tree ID3 Algorithm”. In: 2017 3rd International Conference on Advances in Computing, Communication and Automation, IEEE, pp. 1-6, 2017.
[13] X. Ji, S. A. Chun, J. Geller and V. Oria. “Collaborative and Trajectory Prediction Models of Medical Conditions by Mining Patients' Social Data”. In: 2015 IEEE International Conference on Bioinformatics and Biomedicine, IEEE, pp. 695-700, 2015.
[14] A. K. M. Salleh, M. Makhtar, J. A. Jusoh, P. L. Lua and A. M. Mohamad. “A classification framework for drug relapse prediction”. Journal of Fundamental and Applied Sciences, vol. 9, no. 6S, pp. 735-750, 2017.
[15] M. C. Patel, M. Panchal and H. P. Bhavsar. “Decorate ensemble of artificial neural networks with high diversity for classification”. International Journal of Computer Science and Mobile Computing, vol. 2, no. 5, pp. 134-138, 2013.
[16] P. S. Adhvaryu and M. Panchal. “A review on diverse ensemble methods for classification”. IOSR Journal of Computer Engineering, vol. 1, no. 4, pp. 27-32, 2012.
[17] L. Breiman. “Bagging predictors”. Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[18] H. Zhang, Y. Song, B. Jiang, B. Chen and G. Shan. “Two-stage bagging pruning for reducing the ensemble size and improving the classification performance”. Mathematical Problems in Engineering, vol. 2019, pp. 1-17, 2019.
[19] S. S. A. Poor and M. E. Shiri. “A genetic programming based algorithm for predicting exchanges in electronic trade using social networks' data”. International Journal of Advanced Computer Science and Applications, vol. 8, no. 5, pp. 189-196, 2017.