Comparison of Different Ensemble Methods in Credit Card Default Prediction

Authors

  • Azhi Abdalmohammed Faraj ¹Department of Information Technology, College of Commerce, University of Sulaimani, Sulaimani, Iraq; ²Department of Computer Engineering, College of Engineering, Dokuz Eylül Üniversitesi, İzmir, Turkey
  • Didam Ahmed Mahmud Department of Information Technology, College of Commerce, University of Sulaimani, Sulaimani, Iraq
  • Bilal Najmaddin Rashid Department of Information Technology, College of Commerce, University of Sulaimani, Sulaimani, Iraq

DOI:

https://doi.org/10.21928/uhdjst.v5n2y2021.pp20-25

Keywords:

Ensemble methods, Credit card default prediction, Balanced and imbalanced datasets, Stacking and XGBoost, Neural networks

Abstract

Credit card defaults pose a business-critical threat to banking systems, so prompt detection of defaulters is a crucial and challenging research problem. Machine learning algorithms must deal with a heavily skewed dataset, since the ratio of defaulters to non-defaulters is very small. The purpose of this research is to apply different ensemble methods and compare their performance in predicting the probability of default on customers' credit card payments in Taiwan, using data from the UCI Machine Learning Repository. The comparison is carried out first on the original skewed dataset and then on a balanced version of it. Although several studies have shown the superiority of neural networks over traditional machine learning algorithms, the results of our study show that ensemble methods consistently outperform neural networks and other machine learning algorithms in terms of F1 score and area under the receiver operating characteristic (ROC) curve, regardless of whether the dataset is balanced or the imbalance is ignored.
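
The abstract evaluates models by F1 score and area under the ROC curve, on both the skewed dataset and a balanced version of it. The following is a minimal, dependency-free Python sketch of those two metrics and of one possible balancing step. It is not the authors' code: the abstract does not name the balancing technique, so random undersampling of the majority (non-default) class is an assumption here, and the function names (`f1_score`, `roc_auc`, `undersample`) and the toy labels and scores are hypothetical.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive (default = 1) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no defaulter detected: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly chosen
    defaulter gets a higher score than a randomly chosen non-defaulter (ties count 1/2).
    Assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def undersample(X, y, seed=0):
    """Hypothetical balancing step (assumed, not from the paper): keep every defaulter
    (minority class, label 1) and an equally sized random subset of non-defaulters (label 0)."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    keep = pos + rng.sample(neg, len(pos))  # requires len(neg) >= len(pos)
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Toy, made-up labels: 8 non-defaulters (0) and 2 defaulters (1).
    y = [0] * 8 + [1, 1]
    always_no_default = [0] * len(y)
    accuracy = sum(t == p for t, p in zip(y, always_no_default)) / len(y)
    print(accuracy)                            # 0.8
    print(f1_score(y, always_no_default))      # 0.0

    # Made-up predicted default probabilities from some classifier.
    scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.35, 0.6, 0.7, 0.55]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print(round(f1_score(y, preds), 3))        # 0.8
    print(roc_auc(y, scores))                  # 0.9375

    X_bal, y_bal = undersample(list(range(len(y))), y)
    print(len(y_bal), sum(y_bal))              # 4 2
```

The toy run shows why metrics such as F1 and ROC AUC suit a skewed dataset better than plain accuracy: a model that never predicts a default still reaches 80% accuracy, yet its F1 score for the default class is 0.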

Published

2021-07-19

Section

Articles