Step 0: Introduction

In this article, we will work with Twitter, which is a popular platform with a large user base. Using the Twitter API and writing our code in Python, we will collect data from Twitter and process it according to the story I have in mind. Let’s tell the story and move on to the implementation section 🙌🏽

Step 1: Collect data from the Twitter API

We will use the Twitter API (Application Programming Interface) to get data from Twitter. First of all, we need to go to the new app creation page and obtain our own key and secret values. We will use these values when we connect to Twitter through the API.

import requests
from requests_oauthlib import OAuth1
import oauth2

# Write the key and secret values Twitter gave you in place of the XXX values
consumer = oauth2.Consumer(key='XXX', secret='XXX')
token = oauth2.Token(key='XXX', secret='XXX')
client = oauth2.Client(consumer, token)  # client.request() is used in the code below

In the code block above, write the values Twitter gave you in the places marked XXX. Here we make use of Python’s requests and oauth2 libraries to connect to Twitter.

import pymongo
from pymongo import MongoClient

mongo_client = MongoClient('localhost', 27017)  # default local MongoDB settings
db = mongo_client['twitter']                    # a new database (name is illustrative)
followers_collection = db['followers']          # a new collection for the follower ids

In this work we used MongoDB to store the data (you can use any database you want). Above we see the settings we need to connect to MongoDB, where we create ourselves a new database and a new collection. (I will not go into more technical details, but if you have any questions you can write them in the comments section and I will try to help as much as possible.)

import json
import time

First, we pull the follower ids of a given screen_name. The code for this step is written as a recursive method, but you can also write it as a different (non-recursive) function. Perhaps the most challenging part is Twitter’s rate limits: unfortunately, the API does not let you take as much data as you want, and you should check which restrictions apply to which request. For example, you are allowed to make 15 requests for follower ids in each 15-minute window.
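The recursive cursor walk behind this step can be sketched independently of the network layer. In the sketch below, `fetch_page` is a hypothetical callable standing in for the actual request to Twitter’s cursored followers/ids endpoint: it takes a cursor and returns a parsed page with `ids` and `next_cursor`, the shape Twitter returns. The function name and the in-memory stand-in are ours, not the article’s original code.

```python
def collect_follower_ids(fetch_page, cursor=-1):
    # Recursively walk cursored pages: the first request uses cursor == -1,
    # and Twitter signals the last page with next_cursor == 0.
    page = fetch_page(cursor)
    ids = list(page['ids'])
    next_cursor = page.get('next_cursor', 0)
    if next_cursor == 0:
        return ids
    return ids + collect_follower_ids(fetch_page, next_cursor)

# Tiny in-memory stand-in for the API, paging a few ids at a time
pages = {
    -1: {'ids': [1, 2, 3], 'next_cursor': 42},
    42: {'ids': [4, 5], 'next_cursor': 0},
}
all_ids = collect_follower_ids(pages.get)  # walks both pages in order
```

In the real version, `fetch_page` would issue the authenticated request and parse the JSON response; the recursion itself stays the same.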

ids = followers_collection.aggregate([{'$unwind': '$ids'}])
b = 0
for doc in ids:
    b += 1
    del doc['_id']  # drop the old _id so MongoDB assigns a fresh one
    followers_ids_collection.insert_one(doc)  # save each single-id document into a new collection

The data we send to MongoDB may not always be saved in the format we want. In the code above, we perform MongoDB’s unwind operation: if we have one bag containing 50 objects, we turn it into 50 bags of one object each and save them in a new collection.
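To make the unwind step concrete, here is a pure-Python equivalent of what `$unwind` does to a single document (the function and field names are illustrative): one document holding a list becomes one document per list element.

```python
def unwind(doc, field):
    # MongoDB's $unwind in plain Python: one output document per array element
    return [{**doc, field: value} for value in doc[field]]

# One "bag" holding three ids...
bag = {'screen_name': 'alice', 'ids': [10, 20, 30]}
# ...becomes three documents, each carrying a single id
unwound = unwind(bag, 'ids')
```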

def get_profile(a):
    # rate limit: at most 300 requests per 15-minute window
    for i in range(a, a + 299):
        # users/lookup returns up to 100 profiles per request, so we read the
        # ids in batches of 100 (user_id must be a comma-separated list)
        userid = followers_ids_collection.find().skip(i * 100).limit(100)

        str_user = ','.join([str(int_id['ids']) for int_id in userid])

        # standard v1.1 users/lookup endpoint (the original URL was lost in formatting)
        url = 'https://api.twitter.com/1.1/users/lookup.json?user_id={user_id}'
        request = url.format(user_id=str_user)
        response, data = client.request(request)
        profile = json.loads(data)  # a list of up to 100 user objects

We then send a new request to Twitter to get the profiles behind these ids. Here again, due to the Twitter API’s rate limits, we can fetch at most 30,000 profiles (300 requests × 100 users) every 15 minutes, and we continue pulling data this way.

def get_friends():
    for i in followers_profile_collection.find()[147:228]:
        next_cursor = -1
        friends = []
        userid = i['id']

        while True:
            # standard v1.1 friends/ids endpoint (the original URL was lost in formatting)
            url = ('https://api.twitter.com/1.1/friends/ids.json?cursor='
                   + str(next_cursor) + '&user_id=' + str(userid))
            try:
                response, data = client.request(url)
            except TimeoutError:
                print('TimeoutError, waiting 5 seconds to retry...')
                time.sleep(5)
                continue
            except Exception as e:
                print('Some other exception happened.', e)
                print('Waiting 30 seconds to retry...')
                time.sleep(30)
                continue

            if response.status == 200:
                parsed_data = json.loads(data)
                friends = friends + list(parsed_data['ids'])
                next_cursor = parsed_data.get('next_cursor')
                if next_cursor == 0:  # no more pages for this user
                    break
            elif response.status == 429:
                # rate limited: wait until the limit window resets
                print(float(response['x-rate-limit-reset']) - time.time())
                time.sleep(max(float(response['x-rate-limit-reset']) - time.time(), 0))
            elif 400 <= response.status < 500:
                print('User %s is skipped because of status %d' % (str(userid), response.status))
                break
            else:
                print('Got status: %d trying again...' % response.status)

        # store this user's friend list (the document shape here is our assumption)
        friend_of_users_collection.insert_one({'id': userid, 'friends': friends})

Once we have the followers’ profiles, we pull the friend lists of those followers. You can see many lines of code for collecting friend ids, but do not be afraid 😊 In fact, most of the code exists to handle possible errors: your internet connection may drop, you may encounter a protected user whose friends you cannot fetch, and so on. If you do not add control mechanisms for such cases, your code will unfortunately stop working and you will face long error messages.
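The catch-wait-retry pattern used above can also be factored into a generic helper, which keeps the request logic readable. This is a sketch of the idea under our own names (`with_retries`, the wait schedule), not the article’s original code:

```python
import time

def with_retries(call, waits=(5, 30, 30), sleep=time.sleep):
    # Try `call` until it succeeds, sleeping the next value from `waits`
    # after each failure; a final attempt re-raises any remaining error.
    for wait in waits:
        try:
            return call()
        except Exception as e:
            print('Request failed (%s), waiting %s seconds to retry...' % (e, wait))
            sleep(wait)
    return call()  # last attempt: let the exception propagate

# Demo: a call that fails twice before succeeding
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError('simulated timeout')
    return 'ok'

result = with_retries(flaky, sleep=lambda s: None)  # skip real sleeping in the demo
```

The real `get_friends` would wrap only the `client.request(url)` call this way, keeping the status-code handling separate.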

This is the end of our data extraction process. Let’s continue with the analysis part 🎈

Step 2: Processing data to analyze relationships between Twitter accounts

To process our data we first extract it from MongoDB.

import pandas as pd
total_data = pd.DataFrame(list(friend_of_users_collection.find()))

As you can see above, find() gives us the full contents of the collection. Thanks to Python’s pandas library, we can view this recorded data in a much more readable form.

[Image: user-friend list]

Now it is our turn to do the analysis. We will keep only the friends that also appear in our user list and leave the rest out of this analysis.

# the set of ids in our user list (the column names here are our assumption)
total_data_user_ids = set(total_data['id'])
number_of_user_ids = len(total_data_user_ids)

# keep only the friends that are themselves in the user list
total_data['friends'] = total_data['friends'].apply(
    lambda f: [x for x in f if x in total_data_user_ids])

If we run the code above, we still have a user-friend relationship, but friends that are not in our user list are no longer included. Now we can prepare the input for the user-friend relationship we want to visualize. We will use Gephi to visualize our results.
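Turning the user-friend table into Gephi’s Source/Target edge-list format can be sketched with pandas; the column names and the toy data below are illustrative, mirroring the shape described in the article rather than reproducing its exact code:

```python
import pandas as pd

# Hypothetical user -> friends table, with friends already filtered to the user list
total_data = pd.DataFrame({
    'id': [1, 2, 3],
    'friends': [[2, 3], [3], []],
})

# Explode each friend list into one (Source, Target) row per edge,
# which is the edge-list format Gephi imports directly
edges = (total_data.rename(columns={'id': 'Source'})
         .explode('friends')
         .dropna(subset=['friends'])       # users with no friends contribute no edges
         .rename(columns={'friends': 'Target'})
         [['Source', 'Target']])
```

`edges.to_csv('edges.csv', index=False)` would then produce a file Gephi can read as a directed edge list.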

[Image: source-target data]

By performing operations similar to those in the data extraction phase, we can use pandas to join the user and friend ids with their profile information and view them in a table.

[Image: source-target relation]

The table above shows that there is a relationship (an edge) between each pair of ids we have created.

In the next tutorial we will discuss how to organize this output in accordance with Gephi’s format and visualize it. I think the most crucial part will be the visualization phase 🎉

All the best 😊

Written by

Ph.D. Cand. in CmpE @Boğaziçi University. #ai #privacy #uncertainty #ml #dl #running #cycling #she/her
