Every movie buff has a list of favorite movies committed to memory. Have you ever wondered what other people think about your favorite movie? If so, consider analyzing tweets about it. One of the latest movies I have seen is an Indian movie called ‘Rocketry’, and I absolutely loved it. Here is some analysis of tweets about ‘Rocketry’, done using the Twitter API and some interesting Python libraries.
Disclaimer: The results and text you see in the output do not reflect my personal opinion. I’m just extracting data.
Note: In some cases, I have imported the libraries exactly where the library is intended to be used. This is for ease of understanding.
Prerequisite: refer to the Twitter API documentation on the Twitter Developer Platform to configure your API keys.
Note: If you get a “Read-only application cannot post.” error, do the following –
1. Go to http://dev.twitter.com/apps and log in
2. On the Settings tab, change the app type to Read, Write, and Direct Messages
3. On the Reset keys tab, press the Reset button, update the consumer key and secret in your application accordingly.
If you get the error message “ValueError: Expected object or value”, it may mean that your JSON file is empty. To avoid this, a guard condition has been added when loading the files: os.path.getsize(path1+"\\"+file) > 0
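The empty-file guard can be sketched as follows. This is a minimal sketch, not the exact loading code used for this analysis: the function name `load_nonempty_json` is hypothetical, it uses the standard `json` module, and it assumes each file in the folder holds a JSON array of tweet objects. The names `path1` and `file` mirror the condition above.

```python
import json
import os

def load_nonempty_json(path1):
    """Return the records from every non-empty .json file in the folder path1."""
    records = []
    for file in sorted(os.listdir(path1)):
        full_path = os.path.join(path1, file)
        # Skip anything that is not a JSON file, and skip zero-byte files,
        # which would otherwise raise a ValueError when parsed.
        if not file.endswith(".json") or os.path.getsize(full_path) == 0:
            continue
        with open(full_path, encoding="utf-8") as handle:
            # Assumption: each file holds a JSON array of tweet objects.
            records.extend(json.load(handle))
    return records
```

Using `os.path.join` instead of concatenating `"\\"` by hand keeps the path handling portable across Windows, macOS and Linux.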
Which tweet has the most retweets? (Top 10)
Duplicates are removed using the ‘clean_tweet’ column rather than the raw tweet text. The reason is that two tweets can have exactly the same text but carry a URL in a different form. As an example, let’s assume we have a tweet ‘Rocketry: The Nambi Effect’\n\nhttps://t.co/lxpRjAONFa
Let’s say there’s another tweet with the same text, but the string in the URL is different: ‘Rocketry: The Nambi Effect’\n\nhttps://t.co/4C8KGt1k88. Applied to the raw text, drop_duplicates will consider these two rows unique. The clean_tweet column, however, has punctuation and URL strings removed, so it is more effective at catching such duplicates.
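Here is a minimal sketch of that idea. The `clean_tweet` function below is an assumed implementation (the original cleaning code is not shown here), the `retweet_count` column name is an assumption, and the retweet counts are made up purely for illustration.

```python
import re
import string

import pandas as pd

URL_PATTERN = re.compile(r"https?://\S+")

def clean_tweet(text):
    """Strip URLs and ASCII punctuation, then collapse whitespace."""
    text = URL_PATTERN.sub("", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

df = pd.DataFrame({
    "tweet": [
        "'Rocketry: The Nambi Effect'\n\nhttps://t.co/lxpRjAONFa",
        "'Rocketry: The Nambi Effect'\n\nhttps://t.co/4C8KGt1k88",
        "A different tweet about the movie",
    ],
    "retweet_count": [12, 12, 3],  # made-up counts, for illustration only
})
df["clean_tweet"] = df["tweet"].apply(clean_tweet)

# Deduplicate on the cleaned text, then keep the ten most retweeted tweets.
deduped = df.drop_duplicates(subset="clean_tweet", keep="first")
top_retweeted = deduped.nlargest(10, "retweet_count")[["tweet", "retweet_count"]]
print(top_retweeted)
```

Deduplicating on the raw `tweet` column would keep all three rows here, while deduplicating on `clean_tweet` keeps two.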
The above question will be answered by Sentiment Analysis. Sentiment Analysis is explained in many blogs and tutorials and is used in many different ways. There is a plethora of libraries for it, each with its own algorithm and corresponding accuracy. In my experience, combining several sentiment analysis libraries into an ensemble is often more effective than relying on any single one. In this case, however, I will use just one library in the interest of time.
In the following statement, a new column named ‘Final Sentiment’ is created using the polarity values obtained in the previous step.
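A minimal sketch of how such a column could be derived is shown here. The three label names are taken from the filters used later in this post. The `polarity` column name, the `label_sentiment` helper, the 0.05 threshold and the sample rows are all assumptions for illustration, not the exact mapping used in this analysis.

```python
import pandas as pd

def label_sentiment(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to one of three labels."""
    if polarity > threshold:
        return "Happy/Enthralled"
    if polarity < -threshold:
        return "Sad/Pensive"
    return "Mixed Emotions"

# Hypothetical sample data; real polarity values would come from the
# sentiment analysis library of your choice.
final_df = pd.DataFrame({
    "tweet": ["Loved it!", "Such a sad story", "Watched it yesterday"],
    "polarity": [0.6, -0.4, 0.0],
})
final_df["Final Sentiment"] = final_df["polarity"].apply(label_sentiment)
print(final_df[["tweet", "Final Sentiment"]])
```

The threshold leaves a small neutral band around zero so that near-zero scores are not forced into a positive or negative bucket. Adjust it to suit the library you use.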
tweet_counts = final_df['Final Sentiment'].value_counts()
tweet_counts
final_df['Final Sentiment'].value_counts(sort=True).nlargest(10).plot.bar()
pd.set_option('display.max_colwidth', None)  # show the full tweet text instead of truncating it
happy_df = final_df[(final_df['Final Sentiment']=='Happy/Enthralled')]
happy_df['tweet'].head(n=10)
sad_df = final_df[(final_df['Final Sentiment']=='Sad/Pensive')]
sad_df['tweet'].head(n=10)
Mixed_Emotions_df = final_df[(final_df['Final Sentiment']=='Mixed Emotions')]
Mixed_Emotions_df['tweet'].head(n=10)
An interesting and powerful library for text that contains hashtags and emoticons is ‘advertools’. Below, we use its extraction functions to uncover some pretty interesting insights.
What are some emoticons (Emoji) used in Tweets?
import advertools as adv
emoji_summary = adv.extract_emoji(final_df['tweet'])
Emojis_used_in_tweets = pd.DataFrame.from_dict(emoji_summary['top_emoji'])
pd.set_option ('display.width', 100)
Emojis_used_in_tweets.columns = ['Emoticon', 'Count']
Emojis_used_in_tweets.style.set_table_attributes('style="font-size: 50px"')
Emojis_used_in_tweets[Emojis_used_in_tweets['Count']>35] #Only display emoticons that have a count greater than 35
pd.set_option ('display.width', 200)
hashtag_summary = adv.extract_hashtags(final_df['tweet'])
Hashtag_used_in_tweets = pd.DataFrame(hashtag_summary['top_hashtags'])
Hashtag_used_in_tweets.columns = ['Hashtag', 'Count']
Hashtag_used_in_tweets.style.set_table_attributes('style="font-size: 11px"')
Hashtag_used_in_tweets[Hashtag_used_in_tweets['Count']>35] #Only display hashtags that have a count greater than 35
pd.set_option ('display.width', 200)
mentions_summary = adv.extract_mentions(final_df['tweet'])
Top_Mentions_in_tweets = pd.DataFrame(mentions_summary['top_mentions'])
Top_Mentions_in_tweets.columns = ['Mention', 'Count']
Top_Mentions_in_tweets.style.set_table_attributes('style="font-size: 11px"')
Top_Mentions_in_tweets[Top_Mentions_in_tweets['Count']>30] #Only display mentions that have a count greater than 30
A look at the Summary of Mentions
Below are the number of posts, number of mentions, mentions per post and unique mentions.
mention_summary = adv.extract_mentions(final_df['tweet'])
mention_summary['overview']
What are some questions that have been tweeted?
question_summary = adv.extract_questions(final_df['tweet'])
final_questions_list = [q for q in question_summary['question_text'] if q!= []]
# Flatten the per-tweet question lists into a single 'Question' column.
# (DataFrame.append was removed in pandas 2.0, so the rows are built in one step.)
questions_posed_in_tweets = pd.DataFrame(
    [question for questions in final_questions_list for question in questions],
    columns=['Question'],
)
questions_posed_in_tweets = questions_posed_in_tweets.drop_duplicates(keep='first') # After removing duplicates, there are 3220 rows
with pd.option_context('display.max_rows', None):
    print(questions_posed_in_tweets.to_markdown())  # to_markdown() needs the tabulate package
What are some catchphrases that have been tweeted?
exclamations_summary = adv.extract_exclamations(final_df['tweet'])
exclamations_list = [q for q in exclamations_summary['exclamation_text'] if q!= []]
# Flatten the per-tweet exclamation lists into a single 'Exclamation' column.
# (DataFrame.append was removed in pandas 2.0, so the rows are built in one step.)
exclamations_posed_in_tweets = pd.DataFrame(
    [exclamation for exclamations in exclamations_list for exclamation in exclamations],
    columns=['Exclamation'],
)
exclamations_posed_in_tweets = exclamations_posed_in_tweets.drop_duplicates(keep='first') # After removing duplicates, there are 3220 rows
with pd.option_context('display.max_rows', None):
    print(exclamations_posed_in_tweets.to_markdown())  # to_markdown() needs the tabulate package
Twitter analytics has a multitude of uses. It can be used just for fun or for a very specific purpose, say to improve your business. Let’s say you have a business (product/service) that you promote using Twitter. It really helps to mine and analyze reviews to better understand your customers. It will also serve to understand the shortcomings of your product and thereby provide actionable knowledge to improve the quality of your product.