Something Fun - Python -> Twitter - Confessions of a Data Guy

I recently did a little project to find out what makes a company tick, using Python and the Twitter API. It had to be done quickly, in about a day, and didn’t need to be overly complicated.

Below is what I came up with. The idea is simple: use Python to read tweets about something or someone, store them in a text file, then build a word cloud from them to see what pops up. Nothing fancy, and nothing really production-ready.

import twitter
import json
import csv
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
 
 
csv_file = 'C:\\Users\\Daniel.Beach\\AppData\\Local\\Programs\\Python\\Python35-32\\tweets.csv'
key_file = 'C:\\Users\\Daniel.Beach\\AppData\\Local\\Programs\\Python\\Python35-32\\keys.json'
 
def loadSuperSecretKeys(key_file):
	file = key_file
	with open(file, 'r') as f:
		superSecrets = json.load(f)
	return superSecrets
 
def loadTwit(superSecrets):
	api = twitter.Api(consumer_key = superSecrets["consumer_key"],
		consumer_secret = superSecrets["consumer_secret"],
		access_token_key = superSecrets["access_token_key"],
		access_token_secret = superSecrets["access_token_secret"])
	return api
 
def searchTwit(api): #get tweets from past 7 days. You get what you pay for -> nothing.
	search = api.GetSearch("GitHub")
	for tweet in search:
		print(tweet.id)
		# Yield the text as a str; encoding to bytes here would store "b'...'" literals in the CSV.
		yield {tweet.id : tweet.text}
 
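def saveTwits(tweet,csv_file,seenIds):
	# Parameters and file handle renamed so they no longer shadow dict, list and the imported csv module.
	with open(csv_file, 'a', encoding='utf8') as out:
		for k,v in tweet.items():
			if k not in seenIds: # skip tweets that were stored on an earlier run
				out.write('"' + str(k) + '","' + str(v).replace(',','') + '"\n')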
 
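def readTwitIDs(csv_file):
	# read_csv treats the first line as column names, so the file needs a header row;
	# without one the first stored tweet is swallowed as the header.
	df = pd.read_csv(csv_file)
	return df.iloc[:, 0].tolist() # first column holds the tweet IDs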
 
def analyzeTwits(csv_file):
	df = pd.read_csv(csv_file)
	# Second column holds the tweet text; lowercase it so the filters below match "RT" as well as "rt".
	text = ' '.join(str(t) for t in df.iloc[:, 1]).lower()
	for junk in ('.', ',', '#', '"', '@', '!', ':', '?'):
		text = text.replace(junk, '')
	# Swap filler words for a space so the neighbouring words don't get glued together.
	for filler in ("b'rt", ' to ', ' a ', ' and ', ' as ', ' you ', ' for '):
		text = text.replace(filler, ' ')
	return text
 
def countWords(text):
	wordCounts = {}
	text = text.lower()
	words = text.split()
	for word in words:
		if word in wordCounts:
			wordCounts[word] += 1
		else:
			wordCounts[word] = 1
	frame = pd.DataFrame(list(wordCounts.items()),columns=['word','count'])
	frame = frame.sort_values('count', ascending=False).head(15) # DataFrame.sort() was removed from pandas
	print(frame)
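	stopwords = set(STOPWORDS)
	# Join the top words into one string; str(frame['word']) would also feed the
	# index numbers and the "Name: word, dtype: object" footer into the cloud.
	wordcloud = WordCloud(
		background_color='white',
		stopwords=stopwords,
		max_words=10,
		max_font_size=40,
		random_state=42
	).generate(' '.join(frame['word']))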
 
	print(wordcloud)
	fig = plt.figure(1)
	plt.imshow(wordcloud)
	plt.axis('off')
	plt.show()
 
def main():
	idList = readTwitIDs(csv_file)
	superSecrets = loadSuperSecretKeys(key_file)
	api = loadTwit(superSecrets)
	tweeties = searchTwit(api)
	for t in tweeties:
		saveTwits(t,csv_file,idList)
	text = analyzeTwits(csv_file)
	countWords(text)
 
if __name__ == '__main__': #<-allows import of file in other projects without executing code.
	main()
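
The hand-built CSV line in saveTwits is the most fragile part of the script, since a quote or line break inside a tweet can throw off the row. A rough sketch of one alternative, using only the standard library's csv module, is below. The function names read_tweet_ids and save_tweets are made up for this illustration and are not part of the script above; the sketch also assumes the same two-column layout (ID, then text).

```python
import csv
import os


def read_tweet_ids(csv_path):
    """Return the set of tweet IDs already stored (empty if the file is missing)."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline='', encoding='utf8') as f:
        # Rows whose first field is not numeric (e.g. a header row) are skipped.
        return {int(row[0]) for row in csv.reader(f) if row and row[0].isdigit()}


def save_tweets(tweets, csv_path, seen_ids):
    """Append unseen {id: text} pairs; csv.writer takes care of quotes, commas and newlines."""
    with open(csv_path, 'a', newline='', encoding='utf8') as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        for tweet_id, text in tweets.items():
            if tweet_id not in seen_ids:
                writer.writerow([tweet_id, text])
                seen_ids.add(tweet_id)  # remember it so repeats within one run are skipped too
```

Using a set for the seen IDs keeps the membership check cheap, and letting csv.writer do the quoting means commas no longer have to be stripped out of the tweet text before saving.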

I was surprised by how easy it was to work with the pip-installed twitter wrapper; it made things a breeze. Pandas also makes a great no-SQL-type option, so you don’t have to spend time setting up DDL and DML. There are obviously some things that need improving: some error handling, a better way to remove words, and so on. But it’s a fun little project and a great way to find something out quickly about whatever you want. Below is the output from searching GitHub. Kinda funny that Microsoft shows up; we all know why.
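
On the "better way to remove words" point, here is a minimal sketch of one option: tokenize with a regular expression, filter against a stopword set, and count with collections.Counter instead of chaining replace calls. The top_words function and the tiny STOP set are hypothetical, written just for this example; the STOPWORDS set from the wordcloud package could be swapped in.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOP = {'rt', 'to', 'a', 'and', 'as', 'you', 'for', 'the', 'is', 'of'}


def top_words(text, n=15, stopwords=STOP):
    """Lowercase the text, pull out word tokens, drop stopwords, and count the rest."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Single-character tokens are dropped along with the stopwords.
    counts = Counter(t for t in tokens if t not in stopwords and len(t) > 1)
    return counts.most_common(n)


print(top_words("RT @dev: GitHub and Microsoft! GitHub for you, GitHub for me.", n=1))
# prints [('github', 3)]
```

Matching whole tokens avoids the problem of filler words hiding at the start or end of a tweet, where a replace on ' to ' with surrounding spaces would miss them.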

All the code is out on GitHub. Something fun!
