
Nilam Deepak Padwal, Dr. Kamal Alaskar

Abstract

The global surge of technology and the Internet has led to the ubiquitous use of social media. Opinion mining plays a crucial role in analyzing the context of text by recognizing its sentiment. Twitter has become a popular platform for sentiment analysis, and numerous studies have focused on textual sentiment analysis and opinion mining of Twitter data. Alongside traditional big data technologies, deep neural networks have become a vital solution for handling the abundance of social data generated on social networks. Recently, deep learning models have shown promising results in opinion mining, particularly in handling sequences of arbitrary length. Despite the impressive outcomes of earlier studies, these approaches struggle to analyze users' opinions because of linguistic and grammatical errors, a lack of aspect-level consideration of words or phrases, the neglect of emojis and emoticons in text, and the vanishing gradient problem in transformer models. In addition, human-generated language contains various kinds of noise, which can mislead opinion-mining decisions in a big data environment. Hence, this work aims to develop a novel big data decision-making strategy for Twitter data that involves denoising, pretrained-model-assisted aspect-level feature extraction, and deep-neural-network-aided opinion classification. Several fundamental natural language preprocessing tasks are applied to improve tweet quality, in particular an emoji lexicon analysis that converts all emojis and emoticons into textual content to improve polarity recognition. Subsequently, a denoising strategy in the proposed feature extraction phase plays a significant role in removing unwanted entities through key entity recognition and in handling word ambiguity with the help of semantic knowledge sources.
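The preprocessing and emoji lexicon step described above can be illustrated with a minimal sketch. It assumes a tiny, hypothetical lexicon (`EMOJI_LEXICON`) and a hypothetical cleaning routine (`preprocess`) that drops URLs and @mentions and strips `#`. The abstract does not specify which lexicon or which exact preprocessing tasks the authors used, so this is only one plausible way to turn emojis and emoticons into text that a language model can score for polarity.

```python
import re

# Hypothetical, deliberately tiny emoji/emoticon lexicon for illustration only;
# the lexicon used in the paper is not specified in the abstract.
EMOJI_LEXICON = {
    "😀": "grinning face",
    "😢": "crying face",
    "👍": "thumbs up",
    ":)": "smiling face",
    ":(": "sad face",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def emoji_to_text(tweet: str, lexicon: dict[str, str] = EMOJI_LEXICON) -> str:
    """Replace every known emoji or emoticon with its textual description."""
    # Replace longer symbols first so multi-character emoticons stay intact.
    for symbol in sorted(lexicon, key=len, reverse=True):
        tweet = tweet.replace(symbol, f" {lexicon[symbol]} ")
    return tweet


def preprocess(tweet: str) -> str:
    """Hypothetical cleaning: drop URLs and @mentions, strip '#', convert emojis, lowercase."""
    tweet = URL_RE.sub(" ", tweet)      # remove links before emoticon matching
    tweet = MENTION_RE.sub(" ", tweet)  # remove user handles
    tweet = tweet.replace("#", "")      # keep hashtag words, drop the symbol
    tweet = emoji_to_text(tweet)
    # Lowercase and collapse whitespace left behind by the substitutions.
    return " ".join(tweet.lower().split())


if __name__ == "__main__":
    print(preprocess("@user Loving the vaccine news 😀 :) https://t.co/xyz #hope"))
```

Running the example prints `loving the vaccine news grinning face smiling face hope`: the emoji and the emoticon now contribute words that carry sentiment instead of being discarded as noise.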
By feeding the denoised texts into the RoBERTa model, the proposed approach generates an embedding representation with an aspect-level score for each input tweet. The aspect-level tweet embeddings are then fed into a Bi-LSTM with a self-attention mechanism that classifies the tweets into positive and negative classes. The proposed RoBERTa-self-attention-based Bi-LSTM model was evaluated on two widely used opinion-mining datasets, Coronavirus Tweets and Sentiment140, where it obtained classification accuracies of 95.51% and 94.05%, respectively.
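The self-attention stage on top of the Bi-LSTM can be sketched in plain Python as attention pooling followed by a binary output layer. This sketch assumes an additive (tanh) attention formulation, `score_t = v · tanh(W h_t)`, and a sigmoid output for the positive class. The abstract does not give the exact attention equations, so these choices, along with the names `attention_pool` and `classify`, are illustrative assumptions. The hidden states stand in for concatenated forward and backward Bi-LSTM outputs computed over RoBERTa embeddings.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: list[list[float]],
                   w: list[list[float]],
                   v: list[float]) -> tuple[list[float], list[float]]:
    """Additive self-attention pooling over a sequence of hidden vectors.

    hidden_states: T vectors of size d (assumed Bi-LSTM outputs per token).
    w: d x d projection matrix (list of rows); v: size-d context vector.
    Returns (context vector, attention weights).
    """
    scores = []
    for h in hidden_states:
        # Project the hidden state and squash it: tanh(W h_t).
        projected = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h))) for row in w]
        # Score the token against the context vector: v . tanh(W h_t).
        scores.append(sum(v_i * p_i for v_i, p_i in zip(v, projected)))
    alphas = softmax(scores)
    d = len(hidden_states[0])
    # Weighted sum of hidden states gives one fixed-size tweet representation.
    context = [sum(a * h[k] for a, h in zip(alphas, hidden_states)) for k in range(d)]
    return context, alphas


def classify(context: list[float], weights: list[float], bias: float) -> float:
    """Assumed logistic output layer: returns P(positive)."""
    z = sum(c * wt for c, wt in zip(context, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

In a trained model, `w`, `v`, and the output weights would be learned jointly with the Bi-LSTM by backpropagation. Here they are passed in explicitly so that the pooling arithmetic is easy to follow. A tweet would be labelled positive when `classify(...)` exceeds an assumed threshold of 0.5.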
