Log File Analysis Based on Machine Learning: A Survey



  • Rawand Raouf Abdalla, Department of Information Technology, Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani, Kurdistan Region, Iraq
  • Alaa Khalil Jumaa, Department of Information Technology, Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani, Kurdistan Region, Iraq




Keywords: Log Files, Log Analysis, Machine Learning, Anomaly Detection, User Behavior, Log File Maintenance


In recent years, software monitoring and log analysis have become topics of considerable interest because they support developers during software development, help identify problems in software systems, and help resolve some security issues. A log file is a computer-generated data file that provides information on usage patterns, activities, and processes occurring within an operating system, application, server, or other device. Traditional manual log inspection and analysis have become impractical and almost impossible because of the unstructured nature of logs. To address this challenge, Machine Learning (ML) is regarded as a reliable solution for analyzing log files automatically. This survey explores the existing ML approaches and techniques used to analyze different types of log files. It retrieves and presents relevant existing studies from different scholarly databases and then provides a detailed comparison among them. It also thoroughly reviews the ML techniques used in inspecting log files and identifies the challenges and obstacles in this domain that require further improvement.

