Missing value imputation Techniques: A Survey

Authors

  • Wafaa Mustafa Hameed: Technical College of Informatics, Sulaimani Polytechnic University, Sulaimani, 46001, Kurdistan Region, Iraq; Department of Computer Science, Cihan University Sulaimaniya, Sulaimaniya, 46001, Kurdistan Region, Iraq
  • Nzar A. Ali: Department of Computer Science, Cihan University Sulaimaniya, Sulaimaniya, 46001, Kurdistan Region, Iraq; Department of Statistics and Informatics, University of Sulaimani, Sulaimani, 46001, Kurdistan Region, Iraq

DOI:

https://doi.org/10.21928/uhdjst.v7n1y2023.pp72-81

Keywords:

Data Preprocessing, Imputation, Mean, Categorical Data, Numerical Data

Abstract

A large amount of data is accumulated and stored every day. A large number of missing values in a dataset can be a major problem for analysts because it can cause numerous issues in quantitative investigations. To handle such missing values, numerous methods have been proposed. This paper reviews the techniques available for imputing unknown information, such as median imputation, hot (cold) deck imputation, regression imputation, expectation maximization, support vector machine imputation, multivariate imputation by chained equations, the SICE method, reinforcement programming, non-parametric iterative imputation algorithms, and multilayer perceptrons. The paper also explores some suitable choices of methods for estimating missing values, for use by other researchers in this field, and aims to help them determine which approaches are commonly used now. The review also presents each technique along with its advantages and limitations, to be taken into consideration in future studies in this area. It can serve as a baseline for answering the questions of which techniques have been used and which is the most popular.

Published

2023-03-28

How to Cite

Hameed, W. M., & Ali, N. A. (2023). Missing value imputation Techniques: A Survey. UHD Journal of Science and Technology, 7(1), 72–81. https://doi.org/10.21928/uhdjst.v7n1y2023.pp72-81

Section

Articles