Sentiment Analysis Using Hybrid Feature Selection Techniques
Keywords: Binary Coordinate Ascent, Bag of Words, Chi-Square, Logistic Regression, n-grams, Opinion Mining, Sentiment Analysis, Support Vector Machine, Twitter Application Programming Interface, Term Frequency-Inverse Document Frequency
Nowadays, people from every part of the world use social media and social networks to express their feelings about different topics. One of the trendiest social media platforms is Twitter, a microblogging website that gives its users a public platform to share their views and feelings about products, services, events, and so on. This makes Twitter one of the most valuable sources of data for researchers and developers who want to reveal people's sentiment toward different subjects, such as the products and services of commercial companies and well-known people such as politicians and athletes, by classifying that sentiment as positive or negative. Sentiment classification can be automated with machine learning algorithms and improved with appropriate feature selection methods. We collected the most recent tweets about four topics (Amazon, Trump, Chelsea FC, and CR7) using the Twitter Application Programming Interface (API) and assigned sentiment scores with a lexicon rule-based approach. We then propose a machine learning model that improves classification accuracy through a hybrid feature selection method: the filter-based Chi-square (Chi-2) method followed by the wrapper-based binary coordinate ascent (BCA) method (Chi-2 + BCA). The hybrid method selects an optimal subset of features from term frequency-inverse document frequency (TF-IDF) features for a support vector machine (SVM) classifier, and from bag-of-words features for a logistic regression (LR) classifier, using different n-gram ranges. Compared with features selected by Chi-2 alone, and with the classifiers without feature subset selection, the hybrid feature selection method increases classification accuracy in all cases. The maximum accuracy attained is 86.55% with LR using the (1 + 2 + 3)-gram range and 85.575% with SVM using the unigram range, both on the CR7 dataset.
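The hybrid selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary term-presence features, a toy "predict positive if any selected term is present" rule in place of the SVM and LR classifiers, and a BCA search that starts from the empty feature set. All function names (chi2_score, chi2_filter, binary_coordinate_ascent, toy_accuracy) and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def chi2_score(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Chi-square statistic of one binary feature against binary labels (2x2 table)."""
    n = len(labels)
    a = sum(1 for f, y in zip(feature, labels) if f and y)      # term present, positive
    b = sum(1 for f, y in zip(feature, labels) if f and not y)  # term present, negative
    c = sum(1 for f, y in zip(feature, labels) if not f and y)  # term absent, positive
    d = n - a - b - c                                           # term absent, negative
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def chi2_filter(columns: Sequence[Sequence[int]], labels: Sequence[int], k: int) -> List[int]:
    """Filter step: keep the indices of the k features with the highest chi-square score."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: chi2_score(columns[j], labels), reverse=True)
    return sorted(ranked[:k])


def binary_coordinate_ascent(candidates: Sequence[int],
                             score: Callable[[List[int]], float],
                             max_sweeps: int = 10) -> Tuple[List[int], float]:
    """Wrapper step: flip one feature in or out at a time, keep flips that raise the score."""
    selected: set = set()                 # assumption: start from the empty subset
    best = score(sorted(selected))
    for _ in range(max_sweeps):
        improved = False
        for j in candidates:
            trial = selected ^ {j}        # toggle membership of feature j
            s = score(sorted(trial))
            if s > best:                  # accept only strict improvements
                selected, best, improved = trial, s, True
        if not improved:                  # a full sweep without gain: stop
            break
    return sorted(selected), best


# Toy data: 4 binary term features (one list per feature) over 6 labelled tweets.
columns = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 0],
]
labels = [1, 1, 1, 0, 0, 0]


def toy_accuracy(subset: List[int]) -> float:
    """Stand-in for classifier accuracy: predict 1 if any selected term is present."""
    preds = [int(any(columns[j][i] for j in subset)) for i in range(len(labels))]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


kept = chi2_filter(columns, labels, k=2)                     # filter: Chi-2
subset, acc = binary_coordinate_ascent(kept, toy_accuracy)   # wrapper: BCA
print(kept, subset, acc)
```

In a realistic setting, the score function would instead return the validation accuracy of the chosen classifier trained on the candidate subset, which is what makes the wrapper step far more expensive than the filter step and motivates running the filter first.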