Document in the subject Computer Sciences - Artificial Intelligence, language: English, abstract: Microblogging sites such as Twitter are now widely used to share feelings and emotions about current hot topics, products, services, and events. Such opinionated data needs to be leveraged effectively to yield valuable insight. This research work focuses on designing a comprehensive feature-based Twitter Sentiment Analysis (TSA) framework using a supervised machine learning approach, with an integrated sophisticated negation handling approach and a knowledge-based Tweet Normalization System (TNS). We generated three real-time Twitter datasets using the search operators #Demonetization, #Lockdown, and #9pm9minutes, and also used one publicly available benchmark dataset, SemEval-2013, to assess the viability of our comprehensive feature-based Twitter sentiment analysis system on tweets. We leveraged a variety of features, including lexicon-based, POS-based, morphological, n-gram, negation, and cluster-based features, to ascertain which classifier works well with which feature group. We employed three state-of-the-art classifiers for our Twitter sentiment analysis framework: Support Vector Machine (SVM), Decision Tree Classifier (DTC), and Naive Bayes (NB). We observed SVM to be the best-performing classifier across all the Twitter datasets except #9pm9minutes, for which DTC performed best. Moreover, our SVM model trained on the SemEval-2013 training dataset outperformed NRC Canada, the winning team of SemEval-2013 Task 2, in terms of macro-averaged F1 score, averaged over the positive and negative classes only. Although state-of-the-art Twitter sentiment analysis systems report significant performance, critical aspects such as negation and tweet normalization remain challenging to handle.
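The evaluation metric named in the abstract, macro-averaged F1 over the positive and negative classes only, can be illustrated with a minimal sketch. The function names, label strings, and toy data below are assumptions made for illustration and are not taken from the work itself. Neutral tweets still affect the score indirectly, because they can count as false positives or false negatives for the other two classes.

```python
def f1_for_label(gold, pred, label):
    """F1 score for a single class, computed from gold and predicted labels."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1_pos_neg(gold, pred):
    """Average of the positive-class and negative-class F1 scores.

    The neutral class is not averaged in, but neutral examples still
    contribute errors to the positive and negative counts.
    """
    f1_pos = f1_for_label(gold, pred, "positive")
    f1_neg = f1_for_label(gold, pred, "negative")
    return (f1_pos + f1_neg) / 2


# Toy data (hypothetical, for illustration only).
gold = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
pred = ["positive", "neutral", "negative", "positive", "neutral", "negative"]

score = macro_f1_pos_neg(gold, pred)
print(f"macro-F1 (pos/neg only): {score:.2f}")
```

In this toy example each of the two scored classes has one true positive, one false positive, and one false negative, so both per-class F1 values are 0.5 and so is their average.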