Malicious URL Detection Using Decision Tree-based Lexical Features Selection and Multilayer Perceptron Model

Authors

  • Warmn Faiq Ahmed, Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani 46001, Kurdistan Region, Iraq
  • Noor Ghazi M. Jameel, Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani 46001, Kurdistan Region, Iraq

DOI:

https://doi.org/10.21928/uhdjst.v6n2y2022.pp105-116

Keywords:

Multilayer Perceptron, Lexical Feature, Feature Selection, Malicious URL, Synthetic Minority Oversampling Technique

Abstract

Network information security risks are multiplying and becoming more dangerous. Hackers today generally target end-to-end technology and exploit human weaknesses, and they also exploit weaknesses in the technology itself through a variety of attack methods. Malicious URLs are now one of the greatest dangers to the modern digital world, and stopping them is one of the biggest challenges in cyber security. Detecting harmful URLs with machine learning and deep learning algorithms has been the subject of many academic papers, but detection time and accuracy remain the two biggest challenges for these tools. This paper proposes a multilayer perceptron (MLP) model that relies on two design choices to make it more practical, lightweight, and fast: it uses only lexical features, and it uses a decision tree (DT) algorithm to select the most relevant subset of those features. The experimental outcomes are evaluated in terms of time, accuracy, and error reduction. The results show that an MLP model using 35 features, all of them URL lexical features, achieves an accuracy of 94.51%. Furthermore, applying the DT for feature selection reduces the model's time, with a slight improvement in accuracy and loss.
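
To make the phrase "lexical features" concrete, the following is a minimal sketch of how such features can be computed from the URL string alone, with no page download or DNS lookup. The feature names, the helper functions `shannon_entropy`, `is_ipv4`, and `lexical_features`, and the choice of features are illustrative assumptions; the abstract does not list the 35 features used in the paper, and this is not the authors' implementation.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_ipv4(host: str) -> bool:
    """Return True when the host is a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Compute a few hypothetical lexical features from the URL text only."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_query_params": len(parts.query.split("&")) if parts.query else 0,
        "has_at_symbol": int("@" in url),
        "uses_https": int(parts.scheme == "https"),
        "host_is_ipv4": int(is_ipv4(host)),
        "url_entropy": shannon_entropy(url),
    }


if __name__ == "__main__":
    # Illustrative input only; not taken from the paper's data set.
    for name, value in lexical_features("http://192.168.0.1/login-update.php?id=1&x=2").items():
        print(f"{name}: {value}")
```

Each URL becomes a fixed-length numeric vector, which is the kind of input that a feature selection step and an MLP classifier could then consume.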

Published

2022-11-13

Section

Articles