Analyzing Tweets from Nigerian Airline Passengers
This is a notebook by Ogechi Anoliefo.
1. Import Libraries
## IMPORTING THE LIBRARIES TO BE USED IN THIS PROJECT
import pandas as pd #primary data structure library
import numpy as np #for working with arrays and carrying out mathematical operations
import requests #for making HTTP requests
import json #for encoding and decoding json data
from collections import Counter #for counting
import glob #to find files/paths that match a specified pattern
import os #for interacting with the operating system
#for scraping tweets from X
import asyncio
import twscrape
from twscrape import API, gather
from twscrape.logger import set_log_level
#for processing textual data
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.stem import WordNetLemmatizer
from nltk.probability import FreqDist
from textblob import TextBlob
import re
import string
import sys
from unicodedata import category
import demoji
#for creating visualizations
import matplotlib.pyplot as plt
import plotly.graph_objs as go
import plotly.express as px
import plotly.io as pio
from wordcloud import WordCloud
from PIL import Image
import folium
import random
import kaleido
#needed to display my plotly chart in my website/blog post
import chart_studio
username = "xxxx"
api_key = "xxxxxxxxx"
chart_studio.tools.set_credentials_file(username=username, api_key=api_key)
import chart_studio.plotly as py
import chart_studio.tools as tls
#for location geocoding
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
2. Tweets Mining
To mine the tweets, I used the Python library twscrape. The library works through an authorised API, so you need one or more X accounts to use it. For each account you need the X username and password, plus the email address associated with the account and that email's password.
The library automatically switches accounts when an account reaches the X API rate limit for the current 15-minute window. You can therefore add multiple accounts to the API pool, so that scraping continues on another account while the rate-limited ones wait. I used two accounts to make scraping faster, and stored the mined tweets in CSV files.
If you would like to read more about twscrape and how to use it to scrape tweets, you can check out the official documentation here and here.
The code block below illustrates how to add the required credentials. @username, password, email and email_pass represent your X account username, your X account password, the email associated with your X account, and that email's password respectively.
#credentials required to use twscrape
api = API() #create an instance of the twscrape API object
await api.pool.add_account("@username", "password", "email", "email_pass") #add X account and email credentials
await api.pool.add_account("@username", "password", "email", "email_pass") #add X account and email credentials
await api.pool.login_all() #logs in to all the accounts provided
For my analysis, I focused on six Nigerian airlines, namely: Aero Contractors, Air Peace, Dana Air, Ibom Air, Max Air and United Nigeria Airline. I wanted to mine tweets created between 1st and 28th December 2023 that mention any of the six airlines or their X handles. I created a function which takes in the search query, the airline name and the file path to save the output to, and returns a dataframe of the scraped tweets.
You will notice that my function is defined using async def. This is because twscrape returns its search results asynchronously, and they have to be consumed with async for, which is only allowed inside a coroutine function. A function defined with the regular def will not work here; in Python, async def is what defines a coroutine function.
To get the result from the scrape_tweets function, I had to use the await expression, which runs the coroutine object that scrape_tweets returns and hands back its result. Note that calling the function like a regular Python function, scrape_tweets(xxxxx, xxxxx, xxxxxxx), will neither run the function nor output the result. You would only get a message indicating that a coroutine object has been created.
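As a minimal, self-contained illustration of this point (using a made-up coroutine, fetch_value, rather than twscrape itself): calling a coroutine function only creates a coroutine object, and its body runs only once it is awaited or handed to an event loop. In a Jupyter notebook a top-level await works directly; in a plain script, asyncio.run plays the same role.

```python
import asyncio
import inspect


async def fetch_value():
    """Hypothetical stand-in for a coroutine function such as scrape_tweets."""
    await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
    return 42


# Calling the function does NOT run its body; it only builds a coroutine object.
coro = fetch_value()
is_coroutine = inspect.iscoroutine(coro)
coro.close()  # discard it explicitly to avoid a "never awaited" warning

# Running the coroutine on an event loop is what actually produces the result.
# (Inside a notebook cell, `result = await fetch_value()` does the same job.)
result = asyncio.run(fetch_value())
print(is_coroutine, result)
```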
#FUNCTION TO SCRAPE TWEETS AND SAVE TO A DIRECTORY ON THE SYSTEM
async def scrape_tweets(search_query, airline, save_to_file):
    data = [] #create an empty list to be used to store the search results
    #define the search query. Include start date and end date
    #(note the leading space, so the date operators are not glued to the last search term)
    q = search_query + " since:2023-12-01 until:2023-12-29"
    async for tweet in api.search(q, limit=50000): #iterate over the search results
        c = [tweet.id, tweet.date, tweet.rawContent, tweet.likeCount, tweet.retweetCount, tweet.user.location] #list of attributes to return
        data.append(c) #add each new list of attributes to 'data'
    df = pd.DataFrame(data, columns=['Tweet_ID', 'Time_Created', 'Text', 'Likes', 'Retweets', 'Location']) #convert the list to a dataframe
    df['Airline'] = airline #add a new column 'Airline' to the dataframe to specify the airline whose tweets have been returned
    df.to_csv(save_to_file, index = False) #save to a chosen directory on the computer
    return df
#an example of how the scrape_tweets function works
tweets = await scrape_tweets("danaair OR @danaair OR #danaair", "Dana Air", "Airlines/Dana_Air_X.csv")
tweets
| | Tweet_ID | Time_Created | Text | Likes | Retweets | Location | Airline |
---|---|---|---|---|---|---|---|
0 | 1730451868705341651 | 2023-12-01 05:00:50 | @NosaMUgiagbe @DanaAir Blood of Jesus. Brother... | 0 | 0 | | Dana Air |
1 | 1730482961307316428 | 2023-12-01 07:04:23 | @theakuko @DanaAir Ok thank you | 1 | 0 | | Dana Air |
2 | 1730486857673171328 | 2023-12-01 07:19:52 | It’s been a month @DanaAir cancelled my flight... | 1 | 0 | Nigeria | Dana Air |
3 | 1730487140285333798 | 2023-12-01 07:20:59 | @Mzwayne007 @DanaAir my details are in the com... | 0 | 0 | Nigeria | Dana Air |
4 | 1730487624656204039 | 2023-12-01 07:22:55 | @DanaAir here’s my details. Give me my refund.... | 0 | 0 | Nigeria | Dana Air |
... | ... | ... | ... | ... | ... | ... | ... |
823 | 1740470423220084880 | 2023-12-28 20:30:59 | @BONILENLA_ @DanaAir @fkeyamo @FAAN_Official O... | 0 | 0 | Abuja, Nigeria | Dana Air |
824 | 1740476099501650322 | 2023-12-28 20:53:33 | @HeemOnWheels @flyunitedng @DanaAir 118,000,00... | 1 | 0 | | Dana Air |
825 | 1740480707523424347 | 2023-12-28 21:11:51 | @HeemOnWheels @flyunitedng @DanaAir Today is d... | 1 | 0 | Konoha, Leaf Village | Dana Air |
826 | 1740486570321514699 | 2023-12-28 21:35:09 | @DanaAir This airline needs to be thrown out o... | 0 | 0 | Peacefull | Dana Air |
827 | 1740510825939644559 | 2023-12-28 23:11:32 | @HeemOnWheels @NOgechukwu @flyunitedng @DanaAi... | 2 | 0 | Lagos, Nigeria | Dana Air |
828 rows × 7 columns
I used the function to scrape tweets for each airline and store them in CSV files. The commands were as follows:
- await scrape_tweets("aerocontractors OR @flyaero OR #flyaero", "Aero Contractors", "Airlines/Aero.csv") (for Aero Contractors)
- await scrape_tweets("airpeace OR @flyairpeace OR #airpeace", "Air Peace", "Airlines/Air_Peace.csv") (for Air Peace)
- await scrape_tweets("danaair OR @danaair OR #danaair", "Dana Air", "Airlines/Dana_Air.csv") (for Dana Air)
- await scrape_tweets("ibomair OR @ibomairlines OR #ibomair", "Ibom Air", "Airlines/Ibom_Air.csv") (for Ibom Air)
- await scrape_tweets("maxair OR @maxairltd OR #maxair", "Max Air", "Airlines/Max_Air.csv") (for Max Air)
- await scrape_tweets("unitednigeriaairlines OR @flyunitedng OR #unitednigeriaairlines", "United Nigeria Airline", "Airlines/United_Nigeria_Airline.csv") (for United Nigeria Airline)
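The six calls above differ only in their search terms, airline name and file name, so they could also be generated from one table. The sketch below only builds the arguments and does not call twscrape. The AIRLINES mapping and the build_jobs helper are names introduced here for illustration, and os.path.join is used so that the path separator never has to be typed inside a string literal (a backslash followed by a letter can be read as an escape sequence).

```python
import os

# Hypothetical lookup table: airline name -> (search terms, output file name).
AIRLINES = {
    "Aero Contractors": ("aerocontractors OR @flyaero OR #flyaero", "Aero.csv"),
    "Air Peace": ("airpeace OR @flyairpeace OR #airpeace", "Air_Peace.csv"),
    "Dana Air": ("danaair OR @danaair OR #danaair", "Dana_Air.csv"),
    "Ibom Air": ("ibomair OR @ibomairlines OR #ibomair", "Ibom_Air.csv"),
    "Max Air": ("maxair OR @maxairltd OR #maxair", "Max_Air.csv"),
    "United Nigeria Airline": (
        "unitednigeriaairlines OR @flyunitedng OR #unitednigeriaairlines",
        "United_Nigeria_Airline.csv",
    ),
}


def build_jobs(folder="Airlines"):
    """Return (search_query, airline, save_to_file) tuples, one per airline."""
    return [
        (query, airline, os.path.join(folder, filename))
        for airline, (query, filename) in AIRLINES.items()
    ]


jobs = build_jobs()
for query, airline, path in jobs:
    print(airline, "->", path)
    # In the notebook, each job would then be run with:
    #     await scrape_tweets(query, airline, path)
```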
Combining all tweets into one DataFrame
#get a list of all CSV files in the folder
folder = "Airlines"
files = glob.glob(os.path.join(folder, "*.csv"))
#read each file into a dataframe and store them in a list
dfs = []
for file in files:
    data = pd.read_csv(file)
    dfs.append(data)
#merge the dataframes
df = pd.concat(dfs, axis=0, ignore_index=True)
df.head()
| | Tweet_ID | Time_Created | Text | Likes | Retweets | Location | Airline |
---|---|---|---|---|---|---|---|
0 | 1730498287172493610 | 2023-12-01 08:05:17 | @flyaero Please refund my money since August I... | 0 | 0 | NaN | Aero Contractors |
1 | 1730517986950127815 | 2023-12-01 09:23:34 | @flyaero Good morning \n\nI booked a flight ye... | 0 | 0 | Abuja, Nigeria | Aero Contractors |
2 | 1730518283634274364 | 2023-12-01 09:24:44 | @flyaero \nI would greatly appreciate it if yo... | 0 | 0 | Abuja, Nigeria | Aero Contractors |
3 | 1730602557116985433 | 2023-12-01 14:59:37 | @flyaero I booked for a flight and ticket was ... | 0 | 0 | Nigeria | Aero Contractors |
4 | 1730614023979147744 | 2023-12-01 15:45:11 | @TimsyMera @flyaero Did they sort this ? | 0 | 0 | On The Wheels!! | Aero Contractors |
df.shape
(5594, 7)
3. Data Cleaning
Removing duplicates
The first data cleaning step I took was to check for, and take out, duplicates in the dataset. Every tweet on X has a unique tweet ID, making Tweet_ID the primary key column for this dataset. Using the line of code below, I checked the Tweet_ID column to see if there were any duplicates.
#checking for duplicates
df.duplicated(subset=['Tweet_ID']).sum()
220
There were 220 duplicate entries in the dataset. I used the lines of code below to drop the duplicates, and view the resulting dataset.
#drop duplicates
df1 = df.drop_duplicates(subset=['Tweet_ID'], keep='first').reset_index(drop=True)
df1.shape
(5374, 7)
After removing the duplicates, there were 5374 entries left in the dataset.
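One thing worth keeping in mind about this step: a tweet that mentions two airlines would be returned by both searches, so it can appear once per airline file with the same Tweet_ID. Dropping duplicates on Tweet_ID alone keeps only the first copy, which means such a tweet is counted for just one of the airlines. The toy example below (made-up IDs, not rows from the dataset) contrasts that with de-duplicating on Tweet_ID and Airline together. Which one is appropriate depends on the analysis; the notebook uses the first.

```python
import pandas as pd

# Made-up rows: tweet 102 mentions both airlines, so it was "scraped" twice.
toy = pd.DataFrame({
    "Tweet_ID": [101, 102, 102, 103],
    "Airline": ["Dana Air", "Dana Air", "United Nigeria Airline", "United Nigeria Airline"],
})

# Number of rows flagged as repeats of an earlier Tweet_ID.
n_dupes = int(toy.duplicated(subset=["Tweet_ID"]).sum())

# Approach used in the notebook: one row per tweet, first airline wins.
by_id = toy.drop_duplicates(subset=["Tweet_ID"], keep="first").reset_index(drop=True)

# Alternative: one row per (tweet, airline) pair, so shared mentions are kept for both.
by_id_and_airline = toy.drop_duplicates(subset=["Tweet_ID", "Airline"]).reset_index(drop=True)

print(n_dupes, len(by_id), len(by_id_and_airline))
```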
Handling missing values
#return the number of missing values in each column of the dataset
df1.isnull().sum()
Tweet_ID 0 Time_Created 0 Text 0 Likes 0 Retweets 0 Location 1439 Airline 0 dtype: int64
The Location column of the dataset contained 1439 missing values. A large number of missing values was expected, as many X users do not include a location in their profile. I replaced the missing values in this column with '---' because, if the column is later used for location geocoding, 'NaN' entries would be matched to coordinates, which should not happen for a missing location.
#replace NaN values in 'Location' column with '---'
df1['Location'] = df1['Location'].fillna('---')
df1.head()
| | Tweet_ID | Time_Created | Text | Likes | Retweets | Location | Airline |
---|---|---|---|---|---|---|---|
0 | 1730498287172493610 | 2023-12-01 08:05:17 | @flyaero Please refund my money since August I... | 0 | 0 | --- | Aero Contractors |
1 | 1730517986950127815 | 2023-12-01 09:23:34 | @flyaero Good morning \n\nI booked a flight ye... | 0 | 0 | Abuja, Nigeria | Aero Contractors |
2 | 1730518283634274364 | 2023-12-01 09:24:44 | @flyaero \nI would greatly appreciate it if yo... | 0 | 0 | Abuja, Nigeria | Aero Contractors |
3 | 1730602557116985433 | 2023-12-01 14:59:37 | @flyaero I booked for a flight and ticket was ... | 0 | 0 | Nigeria | Aero Contractors |
4 | 1730614023979147744 | 2023-12-01 15:45:11 | @TimsyMera @flyaero Did they sort this ? | 0 | 0 | On The Wheels!! | Aero Contractors |
#confirm there are no more missing values in the dataset
df1.isnull().sum()
Tweet_ID 0 Time_Created 0 Text 0 Likes 0 Retweets 0 Location 0 Airline 0 dtype: int64
Changing some column types
Here I looked at the column types to ensure they were in the appropriate format, and changed the Time_Created column to datetime format.
#checking column types
df1.dtypes
Tweet_ID int64 Time_Created object Text object Likes int64 Retweets int64 Location object Airline object dtype: object
#to change 'Time_Created' column to datetime format and confirm it's been changed
df1['Time_Created'] = pd.to_datetime(df1['Time_Created'])
df1.dtypes
Tweet_ID int64 Time_Created datetime64[ns] Text object Likes int64 Retweets int64 Location object Airline object dtype: object
#checking to see the number of tweets/mentions per airline
df1.Airline.value_counts()
Airline Air Peace 2490 United Nigeria Airline 829 Ibom Air 828 Dana Air 811 Aero Contractors 257 Max Air 159 Name: count, dtype: int64
4. Sentiment Analysis
To perform sentiment analysis, I used the Python library TextBlob. I created a function get_sentiment to return the sentiment category (i.e. negative, positive or neutral) for each tweet based on the TextBlob polarity score. Note that the polarity score is a float within the range -1.0 to 1.0, where:
- values below 0.0 (down to -1.0) indicate a negative sentiment
- values above 0.0 (up to 1.0) indicate a positive sentiment
- a value of exactly 0.0 indicates a neutral sentiment
#function to return sentiment category
def get_sentiment(tweet):
    polarity = TextBlob(tweet).sentiment.polarity
    if polarity > 0:
        return "Positive"
    elif polarity < 0:
        return "Negative"
    else:
        return "Neutral"
#testing the get_sentiment function on some randomly selected tweets
texts = ("@flyaero didn’t disappoint me, affordable ticket with excellent customer service 🥳🥳🥳",
"Hello @DanaAir please check your DM asap! \n\nWhat’s with the cancelling of flight 🤷🏾♀️🤷🏾♀️🤷🏾♀️🤷🏾♀️ \n\nCc: @officialomoba",
"@flyunitedng That’s disrespectful of you to say, na everybody dey travel like a peasant?")
for text in texts:
    print(get_sentiment(text))
Positive Neutral Negative
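The thresholds used by get_sentiment can also be written as a small pure function on the polarity score, which makes the boundary behaviour easy to check without TextBlob. The categorize name and the neutral_band parameter are assumptions added here for illustration; the notebook treats only an exact 0.0 as neutral, which corresponds to neutral_band=0.0.

```python
def categorize(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    neutral_band is a hypothetical tolerance: scores whose absolute value is
    at most neutral_band are labelled Neutral. The default of 0.0 reproduces
    the rule used in the notebook.
    """
    if polarity > neutral_band:
        return "Positive"
    if polarity < -neutral_band:
        return "Negative"
    return "Neutral"


# Scores of the same kind as those TextBlob returns for the three test tweets.
labels = [categorize(p) for p in (1.0, 0.0, -0.7)]
print(labels)

# With a wider neutral band, a weakly positive score is no longer "Positive".
weak = categorize(0.05, neutral_band=0.1)
print(weak)
```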
#applying the get_sentiment function to each entry in 'Text' column and storing the result in a new column 'Sentiment'
df1['Sentiment'] = df1['Text'].apply(get_sentiment)
df1.head()
| | Tweet_ID | Time_Created | Text | Likes | Retweets | Location | Airline | Sentiment |
---|---|---|---|---|---|---|---|---|
0 | 1730498287172493610 | 2023-12-01 08:05:17 | @flyaero Please refund my money since August I... | 0 | 0 | --- | Aero Contractors | Negative |
1 | 1730517986950127815 | 2023-12-01 09:23:34 | @flyaero Good morning \n\nI booked a flight ye... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Neutral |
2 | 1730518283634274364 | 2023-12-01 09:24:44 | @flyaero \nI would greatly appreciate it if yo... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Positive |
3 | 1730602557116985433 | 2023-12-01 14:59:37 | @flyaero I booked for a flight and ticket was ... | 0 | 0 | Nigeria | Aero Contractors | Negative |
4 | 1730614023979147744 | 2023-12-01 15:45:11 | @TimsyMera @flyaero Did they sort this ? | 0 | 0 | On The Wheels!! | Aero Contractors | Neutral |
#to return the sentiment category count for each airline
airline_sentiment_count = df1.groupby(['Airline', 'Sentiment'])['Sentiment'].count()
airline_sentiment_count
Airline Sentiment Aero Contractors Negative 121 Neutral 69 Positive 67 Air Peace Negative 783 Neutral 958 Positive 749 Dana Air Negative 386 Neutral 282 Positive 143 Ibom Air Negative 181 Neutral 336 Positive 311 Max Air Negative 58 Neutral 58 Positive 43 United Nigeria Airline Negative 264 Neutral 321 Positive 244 Name: Sentiment, dtype: int64
#to return the percentage of each sentiment category per airline
airline_sentiment = pd.DataFrame(airline_sentiment_count)
airline_sentiment.columns = ['Sentiment_Count']
airline_sentiment = airline_sentiment.reset_index()
airline_sentiment['Percentage (%)'] = 100 * airline_sentiment['Sentiment_Count'] / airline_sentiment.groupby('Airline')['Sentiment_Count'].transform('sum')
airline_sentiment
| | Airline | Sentiment | Sentiment_Count | Percentage (%) |
---|---|---|---|---|
0 | Aero Contractors | Negative | 121 | 47.081712 |
1 | Aero Contractors | Neutral | 69 | 26.848249 |
2 | Aero Contractors | Positive | 67 | 26.070039 |
3 | Air Peace | Negative | 783 | 31.445783 |
4 | Air Peace | Neutral | 958 | 38.473896 |
5 | Air Peace | Positive | 749 | 30.080321 |
6 | Dana Air | Negative | 386 | 47.595561 |
7 | Dana Air | Neutral | 282 | 34.771887 |
8 | Dana Air | Positive | 143 | 17.632552 |
9 | Ibom Air | Negative | 181 | 21.859903 |
10 | Ibom Air | Neutral | 336 | 40.579710 |
11 | Ibom Air | Positive | 311 | 37.560386 |
12 | Max Air | Negative | 58 | 36.477987 |
13 | Max Air | Neutral | 58 | 36.477987 |
14 | Max Air | Positive | 43 | 27.044025 |
15 | United Nigeria Airline | Negative | 264 | 31.845597 |
16 | United Nigeria Airline | Neutral | 321 | 38.721351 |
17 | United Nigeria Airline | Positive | 244 | 29.433052 |
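As a quick check of how the Percentage (%) column is computed, the Aero Contractors rows can be reproduced by hand from the counts in the table: each count is divided by the airline's total and multiplied by 100.

```python
# Sentiment counts for Aero Contractors, copied from the table above.
counts = {"Negative": 121, "Neutral": 69, "Positive": 67}

total = sum(counts.values())  # all Aero Contractors tweets
percentages = {label: 100 * n / total for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct:.6f}")
```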
#horizontal bar plot to show the percentage of each sentiment category per airline
fig = px.bar(airline_sentiment, y='Airline', x='Percentage (%)', facet_col='Sentiment', orientation='h', color='Airline', text='Percentage (%)')
fig.update_traces(texttemplate='%{text:.3s}%', textposition='inside')
fig.update_layout(title={'text':"Proportion of Sentiments per Airline", 'x':0.5}, showlegend=False)
fig.update_xaxes(tickvals=[0, 10, 20, 30, 40], ticktext=["0", "10%", "20%", "30%", "40%"])
fig.for_each_xaxis(lambda x: x.update(title = ''))
fig['layout']['xaxis2']['title']['text']='Percentage'
fig.show()
#export plot to chart studio which I will later embed in my blog post
py.plot(fig, filename="Proportion of Sentiments per Airline", auto_open = True)
'https://plotly.com/~oge/1/'
The sentiment category plot above shows that Ibom Air is the airline with the highest proportion of positive sentiments. It is also the airline with the lowest proportion of negative sentiments.
5. Time Series Analysis
In this section, the aim was to use pandas' resample() method to look at how often the airlines were mentioned per time window over the 28-day period, measured as the share of all tweets in each window that mention a given airline. I also wanted to look at how the sentiments towards the airlines changed over the same period. To avoid a crammed analysis, I performed the time series analysis only for the airline with the most mentions (Air Peace) and the airline with the highest proportion of positive sentiments (Ibom Air). Note that resample() works on the datetime index of the dataset by default, so I first set the datetime column as the index.
#set datetime column 'Time_Created' as index of the dataset
df1 = df1.set_index('Time_Created')
df1.head()
Time_Created | Tweet_ID | Text | Likes | Retweets | Location | Airline | Sentiment |
---|---|---|---|---|---|---|---|
2023-12-01 08:05:17 | 1730498287172493610 | @flyaero Please refund my money since August I... | 0 | 0 | --- | Aero Contractors | Negative |
2023-12-01 09:23:34 | 1730517986950127815 | @flyaero Good morning \n\nI booked a flight ye... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Neutral |
2023-12-01 09:24:44 | 1730518283634274364 | @flyaero \nI would greatly appreciate it if yo... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Positive |
2023-12-01 14:59:37 | 1730602557116985433 | @flyaero I booked for a flight and ticket was ... | 0 | 0 | Nigeria | Aero Contractors | Negative |
2023-12-01 15:45:11 | 1730614023979147744 | @TimsyMera @flyaero Did they sort this ? | 0 | 0 | On The Wheels!! | Aero Contractors | Neutral |
#for each 6-hour window, compute the share of all tweets that mention Air Peace / Ibom Air (the mean of a True/False series)
mean_airpeace = (df1['Airline'] == 'Air Peace').resample('6h').mean()
mean_ibomair = (df1['Airline'] == 'Ibom Air').resample('6h').mean()
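To make explicit what these two lines compute: comparing the Airline column to a name gives a True/False series, and the mean of that series over a 6-hour window is the fraction of all tweets in the window (across every airline) that belong to that airline. It is therefore a share, not a count; using .sum() instead gives the number of mentions. A toy example with made-up timestamps:

```python
import pandas as pd

# Four made-up tweets: two fall in the 00:00-06:00 window, two in 06:00-12:00.
toy = pd.DataFrame(
    {"Airline": ["Air Peace", "Ibom Air", "Air Peace", "Air Peace"]},
    index=pd.to_datetime([
        "2023-12-01 01:00", "2023-12-01 02:00",
        "2023-12-01 07:00", "2023-12-01 08:00",
    ]),
)

is_airpeace = toy["Airline"] == "Air Peace"  # boolean series on a datetime index

share = is_airpeace.resample("6h").mean()  # fraction of tweets per window
count = is_airpeace.resample("6h").sum()   # number of tweets per window

print(share.tolist())
print(count.tolist())
```

In this toy data the first window holds one Air Peace tweet out of two and the second holds two out of two. Windows that contain no tweets at all come out as NaN for the mean, which shows up as gaps in a line plot.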
#plotting the percentage mean mentions of Air Peace and Ibom Air per 6 hours over the 28-day period
fig = go.Figure(layout = go.Layout(width=1050, height=450))
fig.add_trace(go.Scatter(x=mean_airpeace.index, y=mean_airpeace*100, mode='lines', name='Air Peace', marker=dict(color='blue')))
fig.add_trace(go.Scatter(x=mean_ibomair.index, y=mean_ibomair*100, mode='lines', name='Ibom Air', marker=dict(color='green')))
fig.update_layout(title={'text':'Percentage of Tweets mentioning Air Peace versus Ibom Air per 6 hours', 'x':0.5, 'xanchor': 'center',
'yanchor': 'top'}, xaxis_title='Day', xaxis=dict(tickformat='%Y-%m-%d'), yaxis_title='Percentage of tweets (%)',
legend = dict(x=0.888, y =0.97), yaxis=dict( range=[-5, 110]), margin=dict(r=0, t=50, l=0, b=50))
fig.update_xaxes(range =['2023-11-30 16:00:00', '2023-12-29 08:00:00'], showgrid=True, ticks="outside", tickson="boundaries", ticklen=4)
fig.show()
#export plot to chart studio which I will later embed in my blog post
py.plot(fig, filename="Time Series Analysis - Tweet Mentions", auto_open = True)
'https://plotly.com/~oge/5/'
After viewing the share of mentions of the two airlines per 6 hours, I wanted to plot their average sentiment scores per 6 hours. To do this, I needed the actual sentiment scores, i.e. the numeric polarity values, and not the categories (positive, negative, neutral). To get the sentiment scores, I created a function which returns the TextBlob sentiment polarity score of each tweet.
#function to return the sentiment polarity score of each tweet
def get_sentiment_score(tweet):
    blob = TextBlob(tweet)
    polarity = blob.sentiment.polarity
    return polarity
#testing the get_sentiment_score function on some randomly selected tweets
texts = ("@flyaero didn’t disappoint me, affordable ticket with excellent customer service 🥳🥳🥳",
"Hello @DanaAir please check your DM asap! \n\nWhat’s with the cancelling of flight 🤷🏾♀️🤷🏾♀️🤷🏾♀️🤷🏾♀️ \n\nCc: @officialomoba",
"@flyunitedng That’s disrespectful of you to say, na everybody dey travel like a peasant?")
for text in texts:
    print(get_sentiment_score(text))
1.0 0.0 -0.6999999999999998
#applying the get_sentiment_score function to each entry in 'Text' column and storing the result in a new column 'Sentiment_Score'
df1['Sentiment_Score'] = df1['Text'].apply(get_sentiment_score)
df1.head()
Time_Created | Tweet_ID | Text | Likes | Retweets | Location | Airline | Sentiment | Sentiment_Score |
---|---|---|---|---|---|---|---|---|
2023-12-01 08:05:17 | 1730498287172493610 | @flyaero Please refund my money since August I... | 0 | 0 | --- | Aero Contractors | Negative | -0.420625 |
2023-12-01 09:23:34 | 1730517986950127815 | @flyaero Good morning \n\nI booked a flight ye... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Neutral | 0.000000 |
2023-12-01 09:24:44 | 1730518283634274364 | @flyaero \nI would greatly appreciate it if yo... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Positive | 0.266667 |
2023-12-01 14:59:37 | 1730602557116985433 | @flyaero I booked for a flight and ticket was ... | 0 | 0 | Nigeria | Aero Contractors | Negative | -0.325000 |
2023-12-01 15:45:11 | 1730614023979147744 | @TimsyMera @flyaero Did they sort this ? | 0 | 0 | On The Wheels!! | Aero Contractors | Neutral | 0.000000 |
#resample the average sentiment scores of Air Peace and Ibom Air per 6 hours
sent_airpeace = df1.Sentiment_Score[df1['Airline'] == 'Air Peace'].resample('6h').mean()
sent_ibomair = df1.Sentiment_Score[df1['Airline'] == 'Ibom Air'].resample('6h').mean()
#plotting the average sentiment scores of Air Peace and Ibom Air per 6 hours over the 28-day period
fig = go.Figure(layout = go.Layout(width=1050, height=600))
fig.add_trace(go.Scatter(x=sent_airpeace.index, y=sent_airpeace, mode='lines', name='Air Peace', marker=dict(color='blue')))
fig.add_trace(go.Scatter(x=sent_ibomair.index, y=sent_ibomair, mode='lines', name='Ibom Air', marker=dict(color='green')))
fig.update_layout(title={'text':'Average Sentiment Scores per 6 hours for Air Peace and Ibom Air', 'x':0.5, 'xanchor': 'center',
'yanchor': 'top'}, xaxis_title='Day', xaxis=dict(tickformat='%Y-%m-%d'), yaxis_title='Average Sentiment Score',
legend = dict(x=0.879, y =0.966), yaxis=dict(range=[-0.73, 1.05]), margin=dict(r=0, t=50, l=0, b=50))
fig.update_xaxes(range =['2023-11-30 16:00:00', '2023-12-29 08:00:00'], showgrid=True, ticks="outside", tickson="boundaries", ticklen=4)
fig.show()
#export plot to chart studio which I will later embed in my blog post
py.plot(fig, filename="Time Series Analysis - Sentiment Scores", auto_open = True)
'https://plotly.com/~oge/205/'
Quick point to note: as seen from the two plots above, even though Air Peace has many more mentions than Ibom Air, the sentiment towards Ibom Air is more positive. This goes to show that with social media data, having a lot of mentions or a lot of buzz does not necessarily translate to goodwill towards a product, company or person. A subject could trend for the wrong reasons.
Once I was done with the time series analysis, I reset the dataset index.
df1.reset_index(inplace=True)
df1.head(3)
| | Time_Created | Tweet_ID | Text | Likes | Retweets | Location | Airline | Sentiment | Sentiment_Score |
---|---|---|---|---|---|---|---|---|---|
0 | 2023-12-01 08:05:17 | 1730498287172493610 | @flyaero Please refund my money since August I... | 0 | 0 | --- | Aero Contractors | Negative | -0.420625 |
1 | 2023-12-01 09:23:34 | 1730517986950127815 | @flyaero Good morning \n\nI booked a flight ye... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Neutral | 0.000000 |
2 | 2023-12-01 09:24:44 | 1730518283634274364 | @flyaero \nI would greatly appreciate it if yo... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Positive | 0.266667 |
6. Tweets Processing
In this step, I wanted to clean up the tweets and then look at the most common words (adjectives) used by airline passengers to describe the airlines. I used RegEx and Python's NLTK package to process the tweets. I created functions to remove stopwords, common words, punctuation, emojis, @mentions, hashtags, web addresses and digits, to perform tokenization and lemmatization, and then to return only the adjectives using NLTK's POS tagging.
Removing web addresses, tweet mentions, hashtags, digits and emojis from tweets
#function to remove web addresses, @mentions, hashtags, digits and emojis from tweets
def get_valid_words(tweet):
    tweet = tweet.lower() #set all words to lowercase
    words = tweet.split() #return a list of words to be able to iterate through and exclude some invalid words
    invalid_words = [word for word in words if re.search(r'@\S+|#\S+|http\S+|www\S+', word)] #create a list of all words containing web addresses, @mentions or hashtags(#)
    numbers = [word for word in words if re.search(r'\d+', word)] #create a list of all words containing digits
    valid_words = [word for word in words if word not in invalid_words and word not in numbers] #remove invalid words and numbers
    valid_words = " ".join(valid_words) #join the resulting valid words with a space in between them
    return demoji.replace(valid_words, "") #return the valid words with all emojis taken out
#trying out the get_valid_words function on a random tweet
tweet = "@ibomairlines pls come through for us🙏🙏🙏 https://t.co/t3oXuzpRTA"
get_valid_words(tweet)
'pls come through for us'
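The filtering rule in get_valid_words works on whole whitespace-separated tokens: a token is dropped if the pattern matches anywhere inside it. The standalone sketch below reproduces just that rule with the standard library (without the emoji removal) on a made-up tweet. drop_noise_tokens and NOISE are names introduced here for illustration.

```python
import re

# One combined pattern: @mentions, hashtags, web addresses, or any digit.
NOISE = re.compile(r"@\S+|#\S+|http\S+|www\S+|\d")


def drop_noise_tokens(text):
    """Lowercase the text and drop every token the NOISE pattern matches."""
    tokens = text.lower().split()
    return " ".join(token for token in tokens if not NOISE.search(token))


# Made-up example tweet, not taken from the dataset.
cleaned = drop_noise_tokens(
    "@ibomairlines Flight P47120 delayed 3hrs again #ibomair https://t.co/abc"
)
print(cleaned)
```

Because the whole token is discarded, a word such as "3hrs" disappears entirely rather than being reduced to "hrs".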
#applying the get_valid_words function to each entry in 'Text' column and storing the result in a new column 'Words'
df1['Words'] = df1['Text'].apply(get_valid_words)
df1.head()
| | Time_Created | Tweet_ID | Text | Likes | Retweets | Location | Airline | Sentiment | Sentiment_Score | Words |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2023-12-01 08:05:17 | 1730498287172493610 | @flyaero Please refund my money since August I... | 0 | 0 | --- | Aero Contractors | Negative | -0.420625 | please refund my money since august i applied ... |
1 | 2023-12-01 09:23:34 | 1730517986950127815 | @flyaero Good morning \n\nI booked a flight ye... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Neutral | 0.000000 | good morning i booked a flight yesterday from ... |
2 | 2023-12-01 09:24:44 | 1730518283634274364 | @flyaero \nI would greatly appreciate it if yo... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Positive | 0.266667 | i would greatly appreciate it if you could pro... |
3 | 2023-12-01 14:59:37 | 1730602557116985433 | @flyaero I booked for a flight and ticket was ... | 0 | 0 | Nigeria | Aero Contractors | Negative | -0.325000 | i booked for a flight and ticket was not issue... |
4 | 2023-12-01 15:45:11 | 1730614023979147744 | @TimsyMera @flyaero Did they sort this ? | 0 | 0 | On The Wheels!! | Aero Contractors | Neutral | 0.000000 | did they sort this ? |
Removing stopwords, unwanted words, single letters and punctuation, and returning only adjectives
#CREATING LISTS OF STOPWORDS, SINGLE LETTERS, PUNCTUATION AND UNWANTED WORDS WHICH WILL BE TAKEN OUT FROM THE TWEETS
nltk_stopwords = list(stopwords.words('english')) #list of all nltk stopwords
alphabets = list(string.ascii_lowercase) #list of the 26 lowercase English letters
codepoints = range(sys.maxunicode + 1) #create a sequence of integers from 0 to the maximum unicode code point
punctuations = [c for i in codepoints if category(c := chr(i)).startswith("P")] #list of all Unicode characters that belong to the “Punctuation” category
#here I defined some common words associated with airlines that would not be useful in my analysis.
common_words = ['flight', 'airline', 'airlines', 'plane', 'planes', 'airplane', 'airplanes', 'aero', 'air', 'peace', 'airpeace', 'dana', 'danaair', 'ibom',
'ibomair', 'max', 'maxair', 'united', 'unitednigeria', 'abuja', 'lagos', 'anambra', 'owerri', 'enugu', 'uyo','calabar', 'maiduguri', 'sky', 'travel',
'class', 'board', 'trip', 'arrive', 'ticket', 'fly', 'pay', 'check-in', 'checkin', 'counter', 'service', 'welcome', 'amp', 'crew', 'cabin', 'luggage', 'akwa', 'national',
'nigerian', 'una', 'ur', 'us', 'na', 'nnamdi', 'azikiwe', 'airport', 'murtala', 'mohammed', 'international', '..', '....']
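Building the punctuation list from Unicode categories rather than from string.punctuation matters because tweets often contain non-ASCII punctuation such as curly quotes (as in “It’s”). The small check below, using only the standard library, shows that difference. It also shows that a few ASCII characters in string.punctuation are classified by Unicode as symbols rather than punctuation, so they are not part of a list built this way. The variable names are introduced here for illustration.

```python
import string
import sys
from unicodedata import category

# Every character whose Unicode general category starts with "P" (punctuation).
unicode_punct = {
    chr(i) for i in range(sys.maxunicode + 1) if category(chr(i)).startswith("P")
}

curly = "\u2019"  # right single quotation mark, the apostrophe in “It’s”
print(curly in string.punctuation)  # ASCII-only list
print(curly in unicode_punct)       # Unicode-wide set

# ASCII characters that string.punctuation has but the Unicode "P" categories do not.
missing = sorted(set(string.punctuation) - unicode_punct)
print(missing)
```

If characters such as "$" or "+" should also be filtered out, one option would be to add set(string.punctuation) to the Unicode-based list.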
# function created to tokenize the resulting words from get_valid_words function, exclude unwanted words (ie. stopwords,
# common words, alphabets, punctuations), lemmatize the words and then return only the adjectives using nltk's pos_tag.
def get_adjectives(tweet):
    tokens = word_tokenize(tweet) #tokenize the tweet
    unwanted = set(nltk_stopwords + common_words + alphabets + punctuations) #set of all unwanted words (a set gives fast lookups and avoids shadowing the imported 'stopwords')
    tokens = [word for word in tokens if word not in unwanted] #exclude unwanted words
    WNlemma = WordNetLemmatizer() #create an instance of a WordNet lemmatizer
    lemmatized_words = [WNlemma.lemmatize(word) for word in tokens] #lemmatize resulting words in tweet
    adjectives = [word for (word, tag) in pos_tag(lemmatized_words) if tag == "JJ"] #keep only the words tagged as adjectives (JJ)
    return " ".join(adjectives)
#trying out the get_adjectives function on a randomly selected output of the get_valid_words function
text = df1.iloc[2, 9]
print('Text: {}\n'.format(text))
print('After applying get_adjectives function: {}'.format(get_adjectives(text)))
Text: i would greatly appreciate it if you could provide me with the necessary flight ticket and reference number as soon as possible or advise on the steps i should take to rectify this issue promptly.
After applying get_adjectives function: necessary possible advise rectify
#applying the get_adjectives function to each set of words in 'Words' column and storing the result in a new column 'Adjectives'
df1['Adjectives'] = df1['Words'].apply(get_adjectives)
df1.head()
| | Time_Created | Tweet_ID | Text | Likes | Retweets | Location | Airline | Sentiment | Sentiment_Score | Words | Adjectives |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2023-12-01 08:05:17 | 1730498287172493610 | @flyaero Please refund my money since August I... | 0 | 0 | --- | Aero Contractors | Negative | -0.420625 | please refund my money since august i applied ... | |
1 | 2023-12-01 09:23:34 | 1730517986950127815 | @flyaero Good morning \n\nI booked a flight ye... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Neutral | 0.000000 | good morning i booked a flight yesterday from ... | good email spam necessary |
2 | 2023-12-01 09:24:44 | 1730518283634274364 | @flyaero \nI would greatly appreciate it if yo... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Positive | 0.266667 | i would greatly appreciate it if you could pro... | necessary possible advise rectify |
3 | 2023-12-01 14:59:37 | 1730602557116985433 | @flyaero I booked for a flight and ticket was ... | 0 | 0 | Nigeria | Aero Contractors | Negative | -0.325000 | i booked for a flight and ticket was not issue... | issued seek mean |
4 | 2023-12-01 15:45:11 | 1730614023979147744 | @TimsyMera @flyaero Did they sort this ? | 0 | 0 | On The Wheels!! | Aero Contractors | Neutral | 0.000000 | did they sort this ? |
7. Word Cloud
The aim for the word cloud was to show the adjectives used to describe the airlines. Since I had already extracted the adjectives from each tweet in the Tweets Processing step above, all I had to do was group the adjectives for each airline together. I created a function airline_adjectives to do this.
#function to return adjectives for any selected airline
def airline_adjectives(airline):
    x = df1[df1['Airline'] == airline] #to select only entries for the specified airline
    all_adjectives = x['Adjectives'].tolist() #return a list of all the adjectives used to describe the airline
    y = " ".join(all_adjectives) #to join each element in the list of adjectives together, with a space in between
    text = " ".join(y.split()) # to take care of the cases where there were no adjectives in the 'Adjectives' column,
                               # which resulted in multiple spaces in some positions in y
    return text
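One caveat with joining the column directly: `" ".join(...)` raises a TypeError if any entry is not a string, which would happen if the 'Adjectives' column held missing values (None or NaN) instead of empty strings. The sketch below shows a more defensive join on a plain Python list. The helper name `join_adjectives` and the sample list are assumptions made for illustration; they are not part of the notebook.

```python
import math

def join_adjectives(values):
    """Join adjective strings with single spaces, skipping non-string entries.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    words = []
    for value in values:
        if isinstance(value, str):       # skip None, NaN and other non-strings
            words.extend(value.split())  # split() also drops empty strings and extra spaces
    return " ".join(words)

# Made-up sample mimicking an 'Adjectives' column with empty and missing entries
sample = ["good email spam necessary", "", None, math.nan, "issued seek mean"]
print(join_adjectives(sample))  # good email spam necessary issued seek mean
```

Inside airline_adjectives, the same idea could be applied to the list returned by `x['Adjectives'].tolist()` before joining.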
#how the airline_adjectives function works
#using the airline_adjectives function to get all the adjectives used to describe Air Peace
airpeace = airline_adjectives('Air Peace')
airpeace
"global undue fraudulent monopoly serious crash lagos/abuja empty sure slash hard domestic waybill sabi find avoid warmth eh jor several parallel shld want various abt administratn diff survive salah break emailed guy ask last incentive wrong poor sha sufficient direct igboman gross fellow dear portal warri like festive light willing close avoid big small resemble disappointed hear pregnant ugo card delayed on-time give want want urgent terminal small rush know panic wa o. sudan free patriotic priority lag-owerri ademola sent january due cat kill esi nma ask alone getaway december perfect blend rome montreal whole leonardo say past sure bean schedule on-time awesome uk explain operated equal pas operated reliable omo interested africa great plastic full bad entire safe good happy new delayed political wicke-dness pm prior former presidential dr. chief ozekome dear next warri btw compensate excess former presidential dr. chief ozekome real safe southern cheap last cancelled delayed hard ️ smooth great reliable reliable prior scarce foreign citizen distress different free terrible pathetic tried several tryna reset invalid good sure fair enough big hausa emefele deep sure dear celebrate high attached safari abeg criminal lol green use sef 'it govt uk wrong wrong blame slave give usual cheaper come biafra dear stupid frame reasonable human painful foam cancel nice terminal lure private build ndi nextau short nkanu gov know scheme noooooo open profitable southern boko insensitive southern great main corrupt last shocked finished poor dry come difficult patronise elsewhere taught abi secondary lack hospital skill civil tribalistic special terrible single ethnic serious wicked possible adamawa full high portal festive military ndi festive basic economic odighi fair long southern busy lucrative high le point read wit ready correct last helpless direct foam almajiri entire comfortable reasonable northerner eh dear powerful nd lesson unaffordable expensive unaffordable 
slash slash able afford free force elementary simple simple cal big wont due .much subsidy ask nigfg know poor small ordinary compartment luxurious wicked basic economic wrong commercial expensive short empty empty listen shettima audio dry happy affordable dear festive xmas website foam sudan free patriotic clear fellow expensive enough funny direct fake sure route nah gala dey quick single bos poor monopoly unfortunate unprovoked big derogatory call whole understand primary route logic high peculiar wounded drive unacceptable clear ibadan special choose governor diri want see full active nonsense stop direct next know expose public publish use profitable high salary horrible want simple eastern emotional fake blatant simple outrageous good convey alternative necessary foolish funny ibadan last dey stead stop double little start certain high oga simple bulaba govts dont double ny hate married expensive able oo simple unify suspended tighten belt cough dey basic tribe usel flat social different big low le know social economic beautiful fable complex basic give simple virtual nin come easy open regional general delete subject logistical economic future busy real protect expensive kill mad tariff unfortunate understand unfortunate normal inaccurate real + available big private liar omoor hardd tribe full halt milked fg facet trek useless intolerant vile entire ah twice non festive festive hear quota wickedness steal scolded oga frequent ready naira point minute-local silly costly worthless muslim philanthropist full first-class normal express normal recent fishy big direct little high sure reckless inmany basic economic affect cancel corporate related ridiculous corporate primary frequent basic last hot kaduna want physical fingertips… available january dead delete ethnic unprovoked logic co expensive extra vip le proper animal equal equal useless narrative high plain wrong old confirm current on… deh u. 
huh different expensive fly daily full high normal fro willing come like long check counterpart.even want useless nsogbu oyibo typical them.can people.go elite young old tinubu suppose avin protect alternative license decide everyday free exempt rich english right available archive fct.pls free landroute hike due come trash london cotonou good okey ready southeast ara-nt bull-sh-it imagine nice sure come sure come booked different allotment particular allotted early le ready available fly want virtual artificial last next expensive expensive southern know simple republic expensive open ogo curse read necessary dec tough unreasonable come funny fight foreign domestic real economic real economic last sad involved single valid expensive important urgent functional engagement purpose expensive official last simple high standard fellow busy bigjoe major major transport tribalistic super rich loud hustle high insult backward common irrespective common main mean understand foam big applied private overall initial total hard high basic funny contrary pas simple simple fare dumb simple wrong olodo miss secondary dey different texas simple special ordinary expensive expensive opposite fair domestic active so-called allow poor reckless irresponsible festive rubbish tactless important vacuous embarrassed poor upbringing teach speak public online small false last papa whole flew le similar lagos-asaba insensitive unalive narrative arik hilarious high route fare dey rubbish cheap particular high low next unnecessary elementary willing need write ncaa high second high jan. jan ridiculous wednesday thursday friday se significant ridiculous similar tell bitter ibadan smooth card excess burden basic contrary expensive understand new next private standard level fg private bros. 
fro influx ngo simple yahoo handle last last safe safe coz give excessive ask pure last specific unusual resale daily bad sure due regular ng explain harsh hurtful economic real ready rubbish standard hate certain tribe starved tribalistic reference standard standard fair standard steady aware gotten total sweet final quick expensive usual full oversold happy new burst website keep photo follow sure particular joke next poor so-called enjoyed link free give nkan late oversold pen type common thwart several fake operational prior online change bad bad crazy use corporate shit extra refunded extra lori fair empty festive last usual southeast big difficult basic read simple late last extra missed checked ready young fair-weather lol lol stole give important omoooo lagos- last naira serious favorite vip left whole it crazy attempted next delayed private lot uk happen shenanigan use available vex rubbish lawless strong foreign speak ethiopian true ridiculous suppose come poor poor useless able sirikas phantom confirmed seat pas ethiopian stop bullish heavy bad come govt bad agree citizen legal no-show fine wrong missed similar close tinubu heavy free huge amaze = subject prior shock….just nonchalant onwa official next cancel enough pas simple talk safe big asaba agree late good normal true unfortunate unacceptable social re-word incomprehensible likely oversold present logic…just video location- pas second late empty potential folk small next sorry exclusive whip sentiment unpleasant wish last lousy proper english foreign write last lose closed reschedule ot dey small serious big debatable lousy poor angry right incendiary bá apology stay finally… foreign wear super clear festive disappointment standard uae thr testified empathetic checked next ask price-basic speak make-up low undercharge southern wrong enough low real reasonable oversold meant wrong cool wan hear proper comfortable want ask criminal embraer festive avoid exorbitant stressful incompetent thrive open 
preferred important necessary difficult heavy fuck judicial lazy stupid similar last useless last relieved calm normal insurgency reasonable subsidize fee federal asaba lt fed fee recent know onitsha federal last told bus smdh english positive positive negative rubbish allow oversold leprosy whole magic click clear last several eventual almajiri next sure global lmaoooo small next angry sue dead free receive reflect wrong serious tear tear isoko certain pas different tinubu weather confirm antigua + last irresponsible nna ehn sure useless useless fake due unexplainable operational valid imagine please critical usual several terrible right tribe grown belongs right know stop pas proof screen right chose abusive arik last common oga sure ment shitty irresponsible stop functional bad functional proper govt ei-gvw eznis sistership ei-uln true severe new accept portal physical last tried entire past terrible low swear gentleman bad esp rouge disappointment operational poor year…i last important past criminal certain common hr real dear progressive spend havoc chic pepper lol free last gree short worthy new old clear friday warri enough guess foreign available hear mental important true mess quick criticize fine touch everyday different survive late entire reliable single late funny mental comment early give lesson perch notorious criminal gree weigh next vip experienced early abeg various sure website foreign mma next next unprofessional pls miss late clear good o'clock big hilarious original madness sha dibia last common sorry happen economical expensive o'level secondary little interesting re-diverted low grateful usual limited top dubious outrageous true mr. 
stop huge told email london impact clear useless wow early open dubai holiday omo warri wan right expensive tried january bad avaliable dey normal rubbish screen issued pas gate saw expensive last true africa atp foolish unnecessary nice festive risky quote remarkable chinua achebe enough wait port busy clear quiet empty suppose high able proof absolute past unimaginable sudan free patriotic priority lag-owerri lag exist make become pregnant ugo big qatar next mean southern mr. difficult hide sure visa miserable impoverished finish finish half sympathy unsuspecting screenshot cork-n-bull snap pas engage half sympathy unsuspecting public interrogate receipt needful pas bp kinda attract public incompetent peak legal funny overdue urgent hard patriotic lag-owerri please abv-owerri little attract fdi sanofi etc tag employed single rich poor uphold common good rich poor scared sleep clear low undue girl rapist particular several sleep several last unsolicited good lol relevant guess heard piss vn girl refused fair feeling betrayed hard-earned unfair unfair one-time everyday dey small needful open route upgrade chicken thigh dry fair bad shitty please write impossible hold opt itinerary needful sad needful terrible terrible terrible minute closed economic unfair bird normal ethic individual serious unfair guy lol dead pen evil guy late shitty nonsense big big negative evil deeper good negative route bad impossible expensive dear uthman late deny negative lose good prestigious global mean ready clear bat economic negative negative complete huge unacceptable understand asaba scheduled lead various negative criminal main senior dear god dear god notorious send send new free last atiku lag-owerri min lag-maiduguri criminal tactic new april kano-jidda foreign green unconducive govt luxurious infact luxurious upgrade omo checked-in excess receipt alone dey quote rubbish conscious deliberate attack logical usa told wickedness london lagos/ ilorin-anambra/enugu alive expensive 
expensive expensive meanwhile favorite thank okonkwo bring small colonial subservient red contraband booked dear please free rural rich agberos poor poor rich innoson orange naira innoson aircraft sleep gree expensive thst innoson normal incurable able booked alen fulani parentage political secret watch md fantastic central sure right great nice -sir last tax/vat unbelievable nose next common good give public collect worrisome high foot enough outside last omo shap useless new important selfish little mess whole high azman innocent homegrown grow embrace great public fair alternative killed demonic ibeto critical monopoly critical useless direct website bad refund spend excess due incessant full yeye poor early wickedness lol sad dear non viable kano fine viable due private finish sent private whole covered simple whole eastern last terrible eastern able ex christian islamic foolish islamic dear recent legal poor stressful expensive fix private whatnot nasarawa dis high explain rational outrageous fair general wicked useless global treat urgent good complain corper friday ghangzhou chat/call thursday global delta south-east p.h lagos-uyo pre-yuletide eastern high ebonyi friday thursday asian beat luggage forgotten ready dey bn-abj steady barbados ni ni website green normal good strong know young monster complete complete complete complete complete ethnic public good closed official urgency pinch super treat unfavourable true swear alternative chuck inaugural obiora stop simple expensive sensitive blame new poor new awesome accra ludicrous solid popular huge postam guy appropriate criminal… sad popular huge wish akpi leave bad wish judgement past mad good ready zoo gbese popular aga onu high ask long ndi send sudden fair ndị early cheap v.high co please bad dis festive dre le stable hard happen route huge standard fair sudan free pure nig find simple favour exorbitant flew fifteen wrong little high pick know direct thesame ready unreliable pathetic personal 
unpleased possible economic low route uncomfortable kept stupid enough escalate assessment amidst sudan free festive fair checked ok screenshot tired insulting basic boycott send high unaffordable needful corrupted pray adopt oké sigh last available available peak ada yo insist give negative available empty easy start cheap possible stop ndi oso tribe everyday different told eastern low full ndigbo donjazzy zambian christian normal christian yamal major artistic started yamal major artistic jonathan major artistic pullisic black important mbappe|wizkid|first twe|airpeace|rivers|sule|christian evil mean federal main lexus negotiable foreign isime newcastle win sure time|wike|twe twe|airpeace|rivers|sule|christian doyin | federal main rattle viral g-wagon umuaba title asari bad isime| free leao| visan take hear hard unity general individualistic wike christian supreme ellen separate fix entire clean green private public i. right buz true last major christian checked finish give true quiet want asap….advocate fulani hard use simple ansmbra le profitable heavy likely available late abv—los finish sha clear azman useless naira marley public ojo sue nollywood private blame high naira marley public ojo sue termed wicked shege re-scheduled stuck complex judiciary new airlines- wednesday benin sad new capt gear able want kaduna fg public judiciary okamma umunna central ojo|vaseline|manchester united|reno|nancy|ekiti|tems|juju former click judiciary annual beneficiary judiciary fg unknown click judiciary united|reno|nancy|ekiti|tems|juju united|reno|nancy|ekiti|tems|juju united|reno|nancy|ekiti|tems|juju united|reno|nancy|ekiti|tems|juju strategic key effective idris click judiciary truthful narrative frequent repentant open judiciary oriental ...... 
win sure time|wike|twe twe|airpeace|rivers|sule|christian noise major rich involvement judiciary ijaw overtime high magana unidentified assailant judiciary clipped mean mischievous muric judiciary ijaw natural ijaw mouthpiece important noticed available second last available small full stricken lazy fellow black tough click judiciary ijaw meduguri unfair vast indian narcotic click judiciary ijaw true happy abe start administrative unconsciousness judiciary ijaw pple good chief drive so… explanation several free invest marriage ordeal iyabo chinese mainland single-aisle potential kong good accra disgusting ceo unfair comparison eastern asari imo new judiciary ijaw ceo unfair comparison eastern imo sincerely appreciate difficult agụ //'ed puma more- judiciary ijaw nationwide sweet iyabo naira christian hustle pj ai judiciary ijaw supreme click judiciary ijaw age… proper wish click judiciary ijaw .traffic high yuletide yuletide early come prank airpeace| dicey regular empty full turn kwa unu empty ndi-igbo good monrovia left extra bad high large public possible different sure govt win sure time|wike|twe twe|airpeace|rivers|sule|christian bad want little oo pathetic typical calm africa sudan patriotic isa dey full damsel affected okonkwo musician actual good global locked crumble willing apology lot shit multiple positive lowkey pray watin make bad bad hot good mutual federal high sad god pray govt hungry fow bad de ifechukwu giant great safe coscharis false popular original flew terminal expensive stockfish yam ukpaka ofe unruly rude uncultured unable new pls send unable new pls send different onyeama proper fatal useless accra poor last lawlessness happy recent unforgivable wrong se fubara supreme last late weite sha unserviceable explained want bad uk useless jumpstarted insensitive exorbitant disrespectful flew hope avoid co sure alternative expect high giveaway ear sabi valid faulty responsible likely want public technical unable become airborne able sympathy 
imagine faulty next english serviceable next hot nice nah sabi enough separate meal next hassle unserviceable happy big inevitable god cheap poor scene usa/canada edikainkong garri achi/oha/draw shege painful non notorious nonsense foolish like private next excess exam november fix consume ndi public economical crazy bad proud key safe africa sudan ridiculous good real good happy upgraded next come familiar miss appointment crumble missed next great know lie gree god good mad respect frequent wrong quick simple regulatory nodey eh asaba crumble sorry available sent unfortunate mid careful bad hope true usd worthless lucky last disappointed frustrated sudden crucial first-time satisfactory overall avoided execute flew shege stupid terrible worst poor rude unreliable ibadan proper timely fake last jo tried normal large commercial clean successful avoid single sorry poor sweet punish safe oow infact mid-flight mid-flight need normal certain azman bus black everyday pet regional possible outrageous able advice pnr true timely regrettable unexpected stupid wicked till sentimental destination fault worth int nail abv-phc original departed unlimited rush abj dey nah mental ask dem light unexpected pa-nful green irresponsible saturday uk kano african aeroplane evacuate usel€ hateful envious ungrateful global honourable elite serious orsumoghu ochiagha second orsumoghu past due unscheduled available uk dae u. 
benin terrible shame give normal foreign domestic miserable dear due unscheduled terrible obvious naaa possible ph mean viable le surprised que abysmal livid mana read single right enough employ obvious whole mad unserious several trolley suitcase separate last alive beautiful remarkable wish mma disgraceful crumble crumble head bag fucking told tired last friday waited abeg double lock unstable new horizon bold testament ehn chaotic shege hurry present general entered sure refund sick pele demonic unexplainable true justifiable empty obvious region-specific high unfortunate empty disagree empty possible online platform high bah easterne web able happening high empty guy plight harsh refinement bring spare attend unfair english rubbish noticed rubbish ethnic ethnic reasonable reasonable cheaper nonsense teach minimum mum give address guo ethnic bring address unprofessional federal high criminal fellow enough chase future low god divine uncanny several costly true mumu fast big integrated emzor orange numerous due unscheduled true average understand xmas different mkt true complacent high several stupid sensible wise umm…so able green know next direct senegal china whole avail adjust crumble becoz useless hashtag emotional direct senegal china ghana direct senegal china ghana direct senegal china ghana exorbitant etc flew lagos-asaba great funny ridiculous rescheduled whole prey inclusive medical open christmas critic expensive lagos-owerri nin stay avoid understanding nonsense right funny propeller contribution ukarine dear ori useless normal gotten useless ready operate serious crumble ok disrupt irresponsible pm rich expensive full mess whole major instant gomorrah vatican arik reschedule able willing asaba stop keep daily update crumble crumble discriminatory southeastern rumor southeastern exorbitant tight arrival loool idc real scheduled hi contact high domestic loaded different till able remove ready alert huge fanning jealousy criminal january shame devil 
threw federal interstate new free collect beg av s.a wud commercial free january nationwide come kd arabian saturday upto rubbish tinubu smooth holiday interstate free january inter-ministerial presidential federal successful special presidential initiative luxurious route unreliable y'all january green costly though mass willing public public private private fct bad africa private chisco federal eastern dear arrival private xenophobic free sa free true false last tear next innocent unexposed fuel innocent unexposed fuel innocent unexposed fuel innocent unexposed fuel innocent unexposed fuel in-flight high jollof turkish european in-flight high needful dey stop deterrent mean use. leave hello wish thursday ig follow ooh rich accolade consistent national peaceful grateful exceptional early refund hear lose sit-at-home surgery funny stick desperate nurtw god god wanted happy general civil najomo wish general public prosperous new guy gig useless young right ethiopian star young right ethiopian star lmfaooooooooo horrible uk remarkable new third second thank honourable granted want direct nigerian-british direct nigerian-british thank touch early festive guo major behind.many address complaint unacceptable sick extensive numerous scheduled important personal undue act unacceptable numerous individual extreme essential key furthermore proactive incident pattern structured distress affected unacceptable immediate necessary similar undeserved receive unprofessional key hav sick resume wish hard high high festive chose miss worst resume direct guangzhou-china dominate resume in-flight diary black african ordeal italian apology wrong naija dey strange whole rich give allow lil separate green attendant reliable new accra-china strong african resume legxus schedule mouth legit cheap federal lion igbo yariba cheap american military ground successful ouch rubbish false talent nurtw palliative exist october open handle stole stole yearly apc joy dey private second prepared 
disclaimer throw uba guy technical sleep technical uba mum manner angama real cut lowkey stray second old nervous illegal delta idk tfare niger shaa useless stole overall nice email insensitive pls alternative mean arthur eze wealthy whole rich young clear open funny disgraceful innoson direct busy great guess available madness slight dear guy ya route want direct ndi functional straight ethiopian weekly parent ahoukd dubai appointment long funny concerned sane loose trust cool enugu-london daily enugu-beijing enugu-jborg enugu-ny nice obidient need hr le ethiopia qatar delay ooo angry akanu open competitive ni right green london new long respect delayed min tacky uncool particular last read thread understand uk ethiopian advantage direct ethiopian let lose se unfortunate direct direct turkish operational wrap due unscheduled naija hot great strong social big ready initial extreme january abv-kan-jed ethiopia qatar guy unprofessional suppose glo high dear forgive direct uae right uk route london jfk big direct remarkable entrust professional sudan obijackson useless rewrite ask experienced august ta next okay bad little fleet new pro thank wish route fit lai lag cover"
#using the airline_adjectives function to get all the adjectives used to describe Ibom Air
ibomair = airline_adjectives('Ibom Air')
ibomair
'official big endless important deadline incur additional pick free reveal exclusive great tour nice wide continuous several blue nice november impressive summary on-time send great dm delayed high delayed public top reliable please serious former right sef know delay exceptional fifteen itinerary similar email right perfect remo back robert sikiru bombastic top seaport intelligent threw new new smile delightful new happy new happy new exciting hear yeye ak tho ak multiple nso ntom happy new ojota warri lucrative easy exclusive new concert great grand obo memorable new concert new regular new new concert sharp want lucky special timeless concert okay easy exclusive new concert great top new concert sure good lucky new know possible last inspite nice dey concert regional vegetable see swear crazy differential essien udim rich udim essien udim rich udim several urgent operate different fight essien udim rich udim inefficient rescheduled ready mista xto december udoma unacceptable christmas ikpa park december nnung disappointing nice christmas ikpa udoma favorite miss christmas great amazing wait top pull usung love light grand touch seamless etim ekpo rich young etim ekpo rich memorable etim ekpo park wonderful vibrant rich etim ekpo online refund young thursday successful direct festive usung christmas top refund hard visible strange celebrate grand various unforgettable fantastic udoma ak fun-filled favorite good good reachable chat udoma unforgettable true abeg bad ready bright anticipated unplugged commitment uplifting immediate lasting thursday giant lineup a-list festival un durant irresponsible several different cancelled happy civil incredible incredible sky unite happy several different send unplugged next light grand udoma civil favorite imiyem uwam light celebrate top ikpa mista awesome unplugged pm free a-list inyainya civil unplugged pm free a-list ikpa unreachable fair new touch new empathy sue accountable recourse irresponsive rude lagos-abuja leg 
foodstuff tonight good civil special arrival wait new sleek new factory luxurious interior new touch momentous state-of-the-art luxurious interior align runway final remarkable new new touch momentous cutting-edge new touchdown new new touchdown operational new new new new continued happy band-new late official big new significant aircraft top-notch modern lavish future sleek luxurious till brag different hot pas pas exam welcome new premium ten new remarkable new abj cal impressive reply/quote different christmas ikot brand-new official live-streamed late key military square come- top-notch effortless arrival new esteemed port arrival new esteemed new any… formal new airbus- effective commencement commercial le additional fleet new total fleet toast preferred bright november worthy immediate new lady various humble subnational infrastructural celebrate worthy usual flash wishful started commercial great bear private spectacular phenomenal major commercial afoot route central african previous financial immediate past corinthian critical watered increase first- arrival brand-new remarkable unprecedented big formal new airbus- new carnival governor naked sorrow in- welcome regional new new testament past mr. 
udom emmanuel leading dedicated inaugural honorable san new new excellent happy new touch friday new big great know nice amazing akwaibom long great christmas next ready level unreachable entire visionary visible profitable transferable former emmanuel private new bos ten anointment subsequent wish good bold akwa- global entire akwa-ibom busy godfather akwa-ibom good rich good poor initiative proud next enough new slow outcome new long excellent dispose thank good sir great great built stop middle pls good good good great great uwc selective question aviation current want map reliable functional functional human fix easy everyday ibaka deep open basic major wrong new ment basic true dear new big urgent untold wrong serve good good nice tough full full elapsed drag social full unprofessional direct opposite airborne nice good wow good next good cbq new grand new launch hard take poor offline affordable see affordable citizen public expensive individual affordable possible need true nice affordable nice new touch good good sure busy uśeless big close sm patriotic needful refund guy needful westerly foot dear email urgent reduced waited patriotic unusual prompt political wet foreign wrong ncaa cpd public sort blocked unblock los missed late christmas solid prior text whole non chalant irresponsible unreliable easy difficult unacceptable checked unprofessional unreliable good disappointed guy consider told piss tuesday manageable thursday last true sending sad good disappointed successful pitfall hello laptop urgent important told come macbook possible routine different corrupt nonsense abj criminal next serious commercial good urban tedious late waited good several different distant neega good strong regional etc economic easy open happy ibomair/road guy wednesday lucky due rare oh next thank indigenous honorable attached mean le chisco mot bad scheduled dear luggage good old experienced front open mid-flight prevent potential smile see careful allow critical pas 
rough notify conscious lady needful friend great mental hearing please good preferred oh furious disappointed prior nonchalant unacceptable immediate assist needful uptil bad sorry sort non-voluntary needful hard salt tweet ready ibadan several arik top november long obvious website top honourable make zoo needful port print stole comment stole social unnecessary brief sincere let bad refund collect fight fair ordeal various head right poor common sorry prompt goose disappointed regional ready right accra good poor last individual ki massive proud usung big successful nyin prompt available mbo rich mbo extra extra crumble difficult funny unexplainable patriotic unreliable keep serious warn available nice african festive additional recent cancel nonsense wrong technical a/c uncomfortable fair new ine confirm ok oo bus good fit full wan delta refund particular follow rubbish fraudulent great route port refund old verifiable rhetorical dangote old akwa-ibom good take next convinced lease excess lease wet mean come dry apologize provide light working needful alternative big present modernity oron historic edimmmm needful reasonable sweet disappointment stranded big happy wrong bean good happy high trust sad prompt text notify commend oga multiple clear regular up… different refund refund november arrived wait possible january whopping african free easy unfortunate sad off-board half affordable nsikan love hard last bad udung uko lga park.they unique good interested functional route fleet keep abba safe good africa uncommon safe unfulfilled direct uk usa good'
Creating the Word Cloud¶
After getting the adjectives for the airlines, it was time to plot the word cloud. Since this analysis was about airlines, I wanted the word cloud to have the outline of an aeroplane. I achieved this by superimposing the word cloud onto a mask in the shape of an aeroplane: I imported the mask (as an image), converted it into a numpy array, and then, using the WordCloud package, superimposed the words onto the mask.
I created functions to define the colormaps for the word cloud (i.e. to specify the colours to be used). I also created a function that would plot the word cloud for any specified airline.
image = np.array(Image.open(r"C:\Users\ogech\Documents\aeroplane.png")) #import the image mask
fig = plt.figure(figsize=(9, 13)) #specify the figure size
# show the image
plt.imshow(image, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis('off')
plt.show()
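A note on the mask itself: WordCloud treats pure white pixels (value 255) as "masked out" and draws words everywhere else. If the mask image has a transparent background rather than a white one, the outline may not come through. The sketch below is not part of the original notebook; it assumes an RGBA image and uses a hypothetical helper, to_wordcloud_mask, plus a tiny synthetic array standing in for the aeroplane picture.

```python
import numpy as np

def to_wordcloud_mask(rgba):
    """Return a 2-D uint8 mask: 255 (white) where the image is fully
    transparent, 0 everywhere else. Hypothetical helper for illustration."""
    rgba = np.asarray(rgba)
    if rgba.ndim != 3 or rgba.shape[2] != 4:
        raise ValueError("expected an RGBA array of shape (height, width, 4)")
    transparent = rgba[:, :, 3] == 0  # alpha channel equal to 0
    return np.where(transparent, 255, 0).astype(np.uint8)

# Tiny synthetic RGBA image standing in for the aeroplane picture (assumption):
demo = np.zeros((4, 6, 4), dtype=np.uint8)  # fully transparent canvas
demo[1:3, 1:5] = (0, 0, 0, 255)             # opaque black block = the "shape"
demo_mask = to_wordcloud_mask(demo)
```

The resulting array could then be passed as mask= in place of image. If the PNG already has a white background, this step is unnecessary.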
#functions to specify the colours to be used
def blue_color_func(word, font_size, position, orientation, random_state=None, **kwargs):
    return "hsl(240, 100%, {}%)".format(random.randint(35, 50)) #return different shades of blue colour
def green_color_func(word, font_size, position, orientation, random_state=None, **kwargs):
    return "hsl(120, 100%, {}%)".format(random.randint(20, 35)) #return different shades of green colour
#function to plot the word cloud for any airline option
def plot_word_cloud(airline_name, color, colorx):
    text_data = airline_adjectives(airline_name) #generate the adjectives for the specified airline using the airline_adjectives function already created
    x = WordCloud(background_color = "white", mask=image, collocations=False) #instantiate a word cloud object
    x = x.generate(text_data) #generate the word cloud
    x.recolor(color_func = color) #specify the colour(s)
    fig = plt.figure(figsize=(9, 13)) #set up the word cloud figure
    #set and design the title
    plt.text(0.30, 1.05, 'Word Cloud for most common', color='black', fontsize=12, transform=plt.gca().transAxes)
    plt.text(0.66, 1.05, 'words', color=colorx, fontsize=12, transform=plt.gca().transAxes)
    plt.text(0.74, 1.05, 'used to describe' + ' ' + airline_name, color='black', fontsize=12, transform=plt.gca().transAxes)
    #display the word cloud
    plt.imshow(x, interpolation='bilinear')
    plt.axis('off')
    plt.show()
#using the plot_word_cloud function to plot word cloud of the most common words used to describe Air Peace and Ibom Air
plot_word_cloud('Air Peace', blue_color_func, 'blue')
plot_word_cloud('Ibom Air', green_color_func, 'green')
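The two colour functions differ only in hue and lightness range, so they could be produced by one factory. The sketch below is an alternative, not the code used in this notebook. make_hsl_color_func is a hypothetical name, and the sketch assumes that any random_state handed to the colour function is a random.Random instance, which it uses so that colours can be reproduced with a seed.

```python
import random

def make_hsl_color_func(hue, min_lightness, max_lightness):
    """Build a WordCloud-style colour function for a single hue.
    Hypothetical helper, shown for illustration only."""
    def color_func(word, font_size, position, orientation, random_state=None, **kwargs):
        # Use the supplied random_state when there is one; otherwise
        # fall back to the module-level generator.
        rng = random_state if random_state is not None else random
        lightness = rng.randint(min_lightness, max_lightness)
        return "hsl({}, 100%, {}%)".format(hue, lightness)
    return color_func

blue_shades = make_hsl_color_func(240, 35, 50)   # same range as blue_color_func
green_shades = make_hsl_color_func(120, 20, 35)  # same range as green_color_func

# Calling the function directly with a seeded generator gives a repeatable colour:
sample = blue_shades("good", 12, (0, 0), None, random_state=random.Random(42))
```

With this in place, plot_word_cloud('Air Peace', blue_shades, 'blue') would be the equivalent call.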
Most Common Words¶
The word cloud for Air Peace shows that some of the most common words used to describe the airline are: last, high, expensive, direct, good. For Ibom Air, we have: new, good, great, nice, top.
To validate this, I created a function top_5 to return (in descending order) the top 5 words used to describe any airline option.
#function to return the top 5 words (in descending order) used to describe a selected airline
def top_5(airline):
    text = airline_adjectives(airline) #use the airline_adjectives function previously created to return the adjectives
    dist = FreqDist(word_tokenize(text)) # returns a dictionary-like object containing all the words in 'text' and their
                                         # frequencies. Note that starting from NLTK version 3.0.0, iterating over a
                                         # FreqDist yields the words in descending order of frequency.
    top_5_words = list(dist)[:5] #returns the 5 most common words
    return ', '.join(top_5_words) #returns each item in the top_5_words list separated by a comma and then a space
#using the top_5 function to return the 5 most common words used to describe Air Peace and Ibom Air
print('The top 5 words used to describe Air Peace are: {}.'.format(top_5('Air Peace')))
print('The top 5 words used to describe Ibom Air are: {}.'.format(top_5('Ibom Air')))
The top 5 words used to describe Air Peace are: last, high, expensive, next, bad.
The top 5 words used to describe Ibom Air are: new, good, great, nice, happy.
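As an independent cross-check on the ranking, the same count can be done with collections.Counter from the standard library. This is a minimal sketch rather than the method used above. top_n_words is a hypothetical helper, and it assumes the adjectives arrive as a single space-separated string, so it splits on whitespace instead of calling word_tokenize; tokens containing punctuation may therefore be counted slightly differently.

```python
from collections import Counter

def top_n_words(text, n=5):
    """Return the n most frequent whitespace-separated words in text.
    Hypothetical helper; ties keep the order in which words first appear."""
    counts = Counter(text.split())
    return [word for word, _ in counts.most_common(n)]

# Small made-up string standing in for the output of airline_adjectives:
demo_text = "good bad good late good bad new"
print(top_n_words(demo_text, 3))  # ['good', 'bad', 'late']
```

Running top_n_words(airline_adjectives('Air Peace')) and comparing the result with top_5('Air Peace') would show whether the two counting approaches agree.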
At this point, I was done with my analysis, and it was time to store the resulting dataset as a CSV file for use in creating a Tableau dashboard. Below is a preview of the head of my final dataset before export.
df1.head()
| | Time_Created | Tweet_ID | Text | Likes | Retweets | Location | Airline | Sentiment | Sentiment_Score | Words | Adjectives |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2023-12-01 08:05:17 | 1730498287172493610 | @flyaero Please refund my money since August I... | 0 | 0 | --- | Aero Contractors | Negative | -0.420625 | please refund my money since august i applied ... | |
1 | 2023-12-01 09:23:34 | 1730517986950127815 | @flyaero Good morning \n\nI booked a flight ye... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Neutral | 0.000000 | good morning i booked a flight yesterday from ... | good email spam necessary |
2 | 2023-12-01 09:24:44 | 1730518283634274364 | @flyaero \nI would greatly appreciate it if yo... | 0 | 0 | Abuja, Nigeria | Aero Contractors | Positive | 0.266667 | i would greatly appreciate it if you could pro... | necessary possible advise rectify |
3 | 2023-12-01 14:59:37 | 1730602557116985433 | @flyaero I booked for a flight and ticket was ... | 0 | 0 | Nigeria | Aero Contractors | Negative | -0.325000 | i booked for a flight and ticket was not issue... | issued seek mean |
4 | 2023-12-01 15:45:11 | 1730614023979147744 | @TimsyMera @flyaero Did they sort this ? | 0 | 0 | On The Wheels!! | Aero Contractors | Neutral | 0.000000 | did they sort this ? |
#save the cleaned dataset
df1.to_csv("Nigerian_Airline_Passengers'_Tweets.csv", index = False)