Big Data Sentimental Analysis Using Document to Vector and Optimized Support Vector Machine


  • Sozan Abdulla Mahmood Department of Computer Science, University of Sulaimani, Sulaymaniyah, Iraq
  • Qani Qabil Qasim Department of Computer Science, University of Sulaimani, Sulaymaniyah, Iraq



Document to Vector, Grey Wolf Optimizer, Particle Swarm Optimizer, Hybrid Particle Swarm Optimizer-Grey Wolf Optimizer, Opinion Mining, Radial Basis Function Kernel-based Support Vector Machine, Sentiment Analysis, Support Vector Machine Optimization, Twitter Application Programming Interface


With the rapid evolution of the internet, social media networks such as Twitter, Facebook, and Tumblr have become so common that they affect every aspect of human life. Twitter is one of the most popular micro-blogging platforms; it allows people to share their emotions in short texts about a variety of topics such as companies’ products, people, politics, and services. Sentiment analysis is possible because emotions and reviews on different topics are shared every second, which makes social media a useful source of information in fields such as business, politics, applications, and services. The Twitter Application Programming Interface (Twitter-API), an interface between developers and Twitter, allows developers to search for tweets by keyword using secret keys and tokens. In this work, the Twitter-API was used to download the most recent tweets about four keywords (Trump, Bitcoin, IoT, and Toyota), with a different number of tweets for each keyword. VADER, a lexicon- and rule-based method, was used to categorize the downloaded tweets as “Positive” or “Negative” based on their polarity; the tweets were then stored in a MongoDB database for the subsequent steps. After pre-processing, the hold-out technique was used to split each dataset into 80% as a “training set” and the remaining 20% as a “testing set.” A deep learning-based Document to Vector model was then used for feature extraction. To perform the classification task, a Radial Basis Function kernel-based support vector machine (RBF-SVM) was used. The accuracy of the RBF-SVM depends mainly on the values of two parameters: the soft-margin penalty “C” and the kernel parameter γ (“gamma”). The main goal of this work is to select the best values for those parameters in order to improve the accuracy of the RBF-SVM classifier.
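For reference, the roles of “C” and γ can be written out using the standard textbook formulation of the soft-margin SVM with an RBF kernel. This is a sketch of the general definitions rather than a formula taken from this work; the symbols w, b, ξ_i, φ, and n are notation assumed here for illustration.

```latex
% Soft-margin SVM (primal form): C weights the penalty on margin violations \xi_i
\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0 .

% RBF kernel: \gamma > 0 controls how quickly similarity decays with distance
K(x, x') = \exp\bigl(-\gamma \lVert x - x' \rVert^{2}\bigr), \qquad \gamma > 0 .
```

A larger C penalizes misclassified training points more heavily, and a larger γ makes each training point's influence more local, which is why the two values are usually tuned together.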
The objective of this study is to show the impact of four meta-heuristic optimization algorithms, namely, particle swarm optimizer (PSO), modified PSO (MPSO), grey wolf optimizer (GWO), and a hybrid of PSO and GWO (hybrid PSO-GWO), on SVM classification accuracy when they are used to select the best values for the C and γ parameters. To the best of our knowledge, hybrid PSO-GWO has never been used in SVM optimization. The results show that these optimizers have a significant impact on increasing SVM accuracy. The best accuracy of the model with the traditional SVM was 87.885%. After optimization, the highest accuracy, 91.053%, was obtained with GWO, while the best accuracies of PSO, hybrid PSO-GWO, and MPSO were 90.736%, 90.657%, and 90.557%, respectively.
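To illustrate how a swarm optimizer can search for C and γ, the following is a minimal sketch of a basic global-best PSO in plain Python. It is not the implementation used in this work: the function names, the swarm settings (inertia, c1, c2, swarm size), the search bounds, and the choice to search in log10 space are all assumptions made for illustration. The objective shown is a hypothetical stand-in; in a real run it would train an RBF-SVM on the training set and return its validation error.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) inside box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(position):
    """Hypothetical stand-in for the validation error of an RBF-SVM.

    A real objective would train the SVM with C = 10**log_c and
    gamma = 10**log_gamma and return 1 - validation accuracy.
    """
    log_c, log_gamma = position
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


# Assumed search ranges: log10(C) in [-2, 3], log10(gamma) in [-4, 1].
bounds = [(-2.0, 3.0), (-4.0, 1.0)]
best_pos, best_err = pso_minimize(toy_validation_error, bounds)
best_c, best_gamma = 10 ** best_pos[0], 10 ** best_pos[1]
```

Searching over the exponents rather than the raw values is a common design choice, because useful values of C and γ typically span several orders of magnitude. GWO and the hybrid PSO-GWO differ in how positions are updated, but they can be driven by the same kind of objective function.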

