Twitter Hashtag Sentiment Analysis
Twitter is what's happening and what people are talking about right now. It is a global platform for public self-expression and conversation in real-time. It provides a network that connects users to people, information, ideas, opinions, and news.
This project provides functions to analyze tweets and segregate them by the sentiment of each tweet. We can search for any Twitter hashtag and fetch any number of tweets. This gives an overview of people’s reactions to many topics and events. It can also help data analysts extract information from a particular hashtag.
The project is written in Python.
Introduction:
Sentiment analysis is a natural language processing technique that involves the use of algorithms and machine learning models to identify, extract, and classify subjective information from text, such as opinions, attitudes, emotions, and feelings. It is used to analyze the sentiment or tone of a piece of text, typically by categorizing it as positive, negative, or neutral.
Sentiment analysis is commonly used in various applications, including customer feedback analysis, brand reputation management, social media monitoring, political analysis and market research.
Overall, sentiment analysis is a valuable tool for businesses, researchers, and individuals who want to gain insights into how people feel and think about a particular topic or brand.
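To make the idea of categorizing text as positive, negative, or neutral concrete, here is a minimal sketch of a lexicon-based scorer. The word list and weights in TOY_LEXICON are invented for illustration, and toy_sentiment is a hypothetical helper; this is not the VADER analyzer the project uses later, only the simplest version of the same idea.

```python
# Toy lexicon: these words and weights are made up for illustration only.
TOY_LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}


def toy_sentiment(text):
    """Label text as positive, negative, or neutral using the toy lexicon."""
    # Sum the weight of every known word; unknown words contribute 0.
    score = sum(TOY_LEXICON.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(toy_sentiment("I love this great phone"))   # positive
print(toy_sentiment("terrible service"))          # negative
print(toy_sentiment("the meeting is at noon"))    # neutral
```

Real analyzers go further than this, for example by handling negation, intensifiers, and punctuation, which is why the project relies on an existing library instead.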
On Twitter, a hashtag is a word or phrase preceded by the hash symbol (#) that is used to identify a specific topic or theme. When a user includes a hashtag in their tweet, it makes it easier for others to discover and join in on the conversation about that topic. Users can click on a hashtag to view all tweets that include that specific hashtag, even if they don't follow the user who tweeted it.
Hashtags can also be used to participate in or follow specific events, campaigns, or social movements on Twitter. They can help to increase the reach and visibility of a tweet and make it more likely to be seen by a larger audience.
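As a small illustration of what a hashtag looks like to a program, the sketch below pulls hashtags out of a tweet with a regular expression. The function name extract_hashtags and the sample tweet are assumptions made for this example, and the pattern is a simplification of Twitter's actual hashtag rules.

```python
import re


def extract_hashtags(tweet):
    """Return the hashtags found in a tweet, without the leading '#'."""
    # '#' followed by one or more word characters (letters, digits, underscore)
    return re.findall(r"#(\w+)", tweet)


print(extract_hashtags("Loving the #WorldCup final! #football"))
# ['WorldCup', 'football']
```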
Libraries Used:
nltk: The Natural Language Toolkit (NLTK) is a platform for building Python programs that work with human language data, with a focus on statistical natural language processing (NLP). It contains text processing libraries for tokenization, parsing, classification, stemming, tagging and semantic reasoning.
snscrape: Snscrape is a scraper for social networking services (SNS). It scrapes things like user profiles, hashtags, or searches and returns the discovered items.
googletrans: Googletrans is a free and unlimited Python library that implements the Google Translate API. It uses the Google Translate Ajax API to call methods such as detect and translate.
string: The string module contains constants, utility functions, and classes for string manipulation.
re: A regular expression (RegEx) is a special sequence of characters that defines a search pattern for finding a string or set of strings. Python's re module provides regular expression support.
wordcloud: Word Cloud is a data visualization technique used for representing text data in which the size of each word indicates its frequency or importance.
matplotlib: Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
tkinter: Tkinter is the standard GUI library for Python. Python, when combined with Tkinter, provides a fast and easy way to create GUI applications.
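To show how the two standard-library modules in this list, string and re, work together on tweet text, here is a short sketch. The sample tweet, the handle, and the URL are invented for the example. Note the order: the patterns for URLs and mentions run before punctuation is removed, because they depend on characters such as ':' , '/' and '@' still being present.

```python
import re
import string

sample = "Check this out @user1: https://example.com #Python!!!"

# Remove the URL and the @mention first, while their punctuation is still intact.
no_url = re.sub(r"https?://\S+", "", sample)
no_mention = re.sub(r"@\w+", "", no_url)

# Then drop every remaining punctuation character listed in string.punctuation.
no_punct = no_mention.translate(str.maketrans("", "", string.punctuation))

# Collapse the leftover runs of whitespace into single spaces.
cleaned = " ".join(no_punct.split())
print(cleaned)  # Check this out Python
```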
Code(sentimentAnalysis.py):
import snscrape.modules.twitter as sntwitter
from googletrans import Translator
import string
import re
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
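from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Note: NLTK's "stopwords", "punkt" and "vader_lexicon" data packages must be
# downloaded once with nltk.download(...) before these functions will run.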
def stop_word_removal(sent):
    stop_words = set(stopwords.words("english"))
    word_tokens = word_tokenize(sent)
    swr = [word for word in word_tokens if word.lower() not in stop_words]
    clear = ' '.join(swr)
    return clear
# Extracting Tweets and translating them into English
def orig_tweets(hashtag, num):
    tweets = []
    translator = Translator()
    for tweet in sntwitter.TwitterSearchScraper(hashtag).get_items():
        # Stop once the requested number of tweets has been collected
        if len(tweets) == num:
            break
        translated = translator.translate(tweet.rawContent, dest="en")
        tweets.append(translated.text)
    return tweets
# Cleaning the extracted tweets
def clean_tweets(tweets):
    cleared = []
    for sent in tweets:
        # Strip URLs, retweet markers, mentions and '#' first: these patterns
        # need the original case and punctuation to match anything
        cleaning = re.sub(r'https?:\/\/\S+', '', sent)
        cleaning = re.sub(r'\bRT[\s]+', '', cleaning)
        cleaning = re.sub(r'@\w+', '', cleaning)
        cleaning = re.sub(r'#', '', cleaning)
        # Then lower-case, drop stop words and remove remaining punctuation
        cleaning = cleaning.lower()
        cleaning = stop_word_removal(cleaning)
        cleaning = cleaning.translate(str.maketrans('', '', string.punctuation))
        cleared.append(cleaning)
    return cleared
# Performing sentiment analysis on the cleaned Tweets
def senti(ctweets):
    sent_analyzer = SentimentIntensityAnalyzer()
    sentimentsList = []
    for text in ctweets:
        analysis = sent_analyzer.polarity_scores(text)
        # "compound" is VADER's overall score, ranging from -1 to +1
        sentiment = analysis["compound"]
        sentimentsList.append(sentiment)
    return sentimentsList
# Segregating tweets into Positive, Negative and Neutral
def segregate_tweets(sen, tweets):
    positive = []
    posiPol = []
    negative = []
    negPol = []
    neutral = []
    neutPol = []
    for i in range(len(sen)):
        if sen[i] >= 0.5:
            posiPol.append(sen[i])
            positive.append(tweets[i])
        elif sen[i] <= -0.5:
            negPol.append(sen[i])
            negative.append(tweets[i])
        else:
            neutPol.append(sen[i])
            neutral.append(tweets[i])
    return positive, posiPol, negative, negPol, neutral, neutPol
# For plotting a bar graph
# p, ne, n = number of positive, neutral and negative tweets; h = hashtag
def bar_graph(p, ne, n, h):
    values = [p, ne, n]
    attributes = ["Positive", "Neutral", "Negative"]
    plt.figure(figsize=(10, 5))
    plt.bar(attributes, values, color='maroon', width=0.4)
    plt.title("Bar Graph of #" + h)
    plt.xlabel("Emotions")
    plt.ylabel("Number of Tweets")
    plt.show()
# For plotting a pie chart
def pie_chart(p, ne, n, h):
    values = [p, ne, n]
    attributes = ["Positive", "Neutral", "Negative"]
    plt.title("Pie distribution of #" + h)
    plt.pie(values, labels=attributes, autopct='%2.1f%%')
    plt.show()
# For creating a word cloud
def word_cloud(text):
    wordcloud = WordCloud(width=800, height=400, random_state=21, max_font_size=110).generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis('off')
    plt.show()
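The segregation step above depends entirely on where the cut-off on the compound score is placed. The sketch below isolates that decision so the effect of the threshold is easy to see. The function label_for and the list of scores are hypothetical and not part of the project; the scores are made-up values, not real analyzer output. The 0.5 default matches the cut-off used in segregate_tweets, while 0.05 is the value commonly suggested in VADER's own documentation, shown here only for comparison.

```python
def label_for(compound, threshold=0.5):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Hypothetical scores, not real output from the analyzer.
scores = [0.82, -0.61, 0.10, -0.30, 0.50]

for threshold in (0.5, 0.05):
    print(threshold, [label_for(s, threshold) for s in scores])
```

With the stricter 0.5 cut-off, mildly worded tweets (0.10 and -0.30 here) fall into the neutral group; with 0.05 they are counted as positive and negative respectively. Which behaviour is preferable depends on how cautious the classification should be.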
Code(analysisWindow.py):
from tkinter import *
from tkinter import messagebox
import sentimentAnalysis as sa
window = Tk()
window.title("Twitter Sentiment Analysis")
window.geometry('530x250')
positive_tweets = []
negative_tweets = []
neutral_tweets = []
all_words = ""
hashtag_label = Label(window, text="Enter a Twitter Hashtag", fg="blue", font=("Arial", 16))
hashtag_label.grid(row=0, column=0, padx=10, pady=10)
hashtag = Entry(window, background="white", fg="black", font=("Arial", 16))
hashtag.grid(row=0, column=1, padx=10, pady=10)
num_label = Label(window, text="Number of Tweets", fg="blue", font=("Arial", 16))
num_label.grid(row=1, column=0, padx=10, pady=10)
tweet_num = Entry(window, background="white", fg="black", font=("Arial", 16))
tweet_num.grid(row=1, column=1, padx=10, pady=10)
def analyzing():
    global all_words, positive_tweets, neutral_tweets, negative_tweets
    try:
        hash_text = hashtag.get()
        num = int(tweet_num.get())
        tweets = sa.orig_tweets(hash_text, num)
        cleaned_tweets = sa.clean_tweets(tweets)
        all_words = ' '.join(cleaned_tweets)
        polarity = sa.senti(cleaned_tweets)
        positive_tweets, positive_polarity, negative_tweets, negative_polarity, neutral_tweets, neutral_polarity = sa.segregate_tweets(polarity, tweets)
        sa.bar_graph(len(positive_tweets), len(neutral_tweets), len(negative_tweets), hash_text)
        sa.pie_chart(len(positive_tweets), len(neutral_tweets), len(negative_tweets), hash_text)
    except Exception:
        # Covers a non-numeric tweet count as well as scraping or translation failures
        messagebox.showerror("Error", "Wrong Input")
analyze = Button(window, text="Analyze", fg="red", background="yellow", font=("Arial", 16), command=analyzing)
analyze.grid(row=3, column=1, padx=10, pady=10)
def show_tweets():
    # Toplevel opens a secondary window under the existing Tk root;
    # creating a second Tk() instance would start a separate interpreter
    win = Toplevel(window)
    win.title("Tweets")
    win.geometry('1000x650')
    main_frame = Frame(win)
    main_frame.pack(fill=BOTH, expand=1)
    # A canvas with a scrollbar makes the list of tweets scrollable
    canvas = Canvas(main_frame)
    canvas.pack(side=LEFT, fill=BOTH, expand=1)
    scbar = Scrollbar(main_frame, orient=VERTICAL, command=canvas.yview)
    scbar.pack(side=RIGHT, fill=Y)
    canvas.configure(yscrollcommand=scbar.set)
    canvas.bind('<Configure>', lambda e: canvas.configure(scrollregion=canvas.bbox("all")))
    second_frame = Frame(canvas)
    canvas.create_window((0, 0), window=second_frame, anchor="nw")
    l1 = "Positive Tweets(" + str(len(positive_tweets)) + ")"
    l2 = "Negative Tweets(" + str(len(negative_tweets)) + ")"
    l3 = "Neutral Tweets(" + str(len(neutral_tweets)) + ")"
    posi_list = "Tweet--> " + '\nTweet--> '.join(positive_tweets)
    neg_list = "Tweet--> " + '\nTweet--> '.join(negative_tweets)
    neu_list = "Tweet--> " + '\nTweet--> '.join(neutral_tweets)
    p_label = Label(second_frame, text=l1, fg="green", font=("Arial", 18), background="yellow")
    p_label.grid(row=0, column=0, padx=10, pady=10)
    p_tweets = Text(second_frame, fg="black", font=("Arial", 16), background="white")
    p_tweets.grid(row=1, column=0, padx=10, pady=10)
    p_tweets.insert("1.0", posi_list)
    n_label = Label(second_frame, text=l2, fg="red", font=("Arial", 18), background="black")
    n_label.grid(row=2, column=0, padx=10, pady=10)
    n_tweets = Text(second_frame, fg="black", font=("Arial", 16), background="white")
    n_tweets.grid(row=3, column=0, padx=10, pady=10)
    n_tweets.insert("1.0", neg_list)
    nu_label = Label(second_frame, text=l3, fg="blue", font=("Arial", 18), background="lightblue")
    nu_label.grid(row=4, column=0, padx=10, pady=10)
    nu_tweets = Text(second_frame, fg="black", font=("Arial", 16), background="white")
    nu_tweets.grid(row=5, column=0, padx=10, pady=10)
    nu_tweets.insert("1.0", neu_list)
see_tweets = Button(window, text="Show Tweets", fg="black", background="green", font=("Arial", 16), command=show_tweets)
see_tweets.grid(row=4, column=0, padx=10, pady=10)
def show_word_cloud():
    if len(all_words) > 0:
        sa.word_cloud(all_words)
    else:
        messagebox.showwarning("Warning", "Analyze a hashtag first")
wordCloud = Button(window, text="Word Cloud", fg="black", background="lightblue", font=("Arial", 16), command=show_word_cloud)
wordCloud.grid(row=4, column=1, padx=10, pady=10)
window.mainloop()
Output:
After entering the hashtag and the number of tweets, press Analyze.
Once the analysis finishes, the bar graph and the pie chart are displayed.
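The pie chart labels each slice with its share of the total, formatted to one decimal place. As a rough sketch of the arithmetic behind those labels, the hypothetical helper below computes the same percentages from the three tweet counts. The function name sentiment_shares and the sample counts are assumptions for illustration and do not come from the project.

```python
def sentiment_shares(positive, neutral, negative):
    """Return each category's share of the total as a percentage."""
    total = positive + neutral + negative
    if total == 0:
        # No tweets analyzed yet: avoid dividing by zero.
        return {"Positive": 0.0, "Neutral": 0.0, "Negative": 0.0}
    counts = {"Positive": positive, "Neutral": neutral, "Negative": negative}
    return {label: round(100 * count / total, 1) for label, count in counts.items()}


print(sentiment_shares(12, 30, 8))
# {'Positive': 24.0, 'Neutral': 60.0, 'Negative': 16.0}
```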
Conclusion:
The code performs well and delivers the desired output. When the number of tweets is large, processing becomes slow, since every tweet is translated and analyzed one at a time. Overall, it is suitable for conducting sentiment analysis on a small scale.