Step 0: Introduction

In this article, we will work with Twitter, a platform that is quite popular and has a large user base. Using the Twitter API and writing our code in Python, we will collect data from Twitter and process it according to our story. Let's tell the story and move on to the implementation 🙌🏽

Step 1: Collect data from the Twitter API

We will use the Twitter API (Application Programming Interface) to get data from Twitter. First of all, we need to go to Twitter's app creation page and obtain our own key and secret values. We will use these values when connecting to Twitter through the API.

import oauth2

CONSUMER_KEY = 'XXX'
CONSUMER_SECRET = 'XXX'
ACCESS_TOKEN = 'XXX'
ACCESS_TOKEN_SECRET = 'XXX'

consumer = oauth2.Consumer(key=CONSUMER_KEY, secret=CONSUMER_SECRET)
access_token = oauth2.Token(key=ACCESS_TOKEN, secret=ACCESS_TOKEN_SECRET)

In the code block above, you should put the values Twitter gave you in the places specified as XXX. Here we make use of Python's requests and oauth2 libraries to connect to Twitter.

from pymongo import MongoClient

client = oauth2.Client(consumer, access_token)

# create a connection with MongoClient
mongo_client = MongoClient('localhost', 27017)
db = mongo_client['new_db']       # create a database
new_collection = db.new_collection  # create a collection

In this work we used MongoDB to store the data (you can use any database you want). Above we see the settings we need to connect to MongoDB: we created a new database and a new collection for ourselves. (I will not go into more technical detail, but if you have any questions you can write them in the comment section and I will try to help as much as possible.)

import time
import requests

QUERY = 'openmaker'
ENDPOINTS = {
    'followers': 'https://api.twitter.com/1.1/followers/ids.json',
}

# `auth` is the requests-compatible OAuth credential built from the keys above;
# `followers_collection` is the MongoDB collection created earlier.
def get_followers(username, cursor=-1, nested_count=0):
    if nested_count > 14:  # rate limit: max 15 requests in 15 mins
        return []
    params = {
        'screen_name': username,
        'cursor': cursor,
    }
    response = requests.get(ENDPOINTS['followers'], auth=auth, params=params)

    data = response.json()
    followers_collection.insert_one(data)
    next_cursor = data['next_cursor']
    if next_cursor == 0:  # Twitter signals the last page with cursor 0
        return data['ids']

    return data['ids'] + get_followers(username, next_cursor, nested_count + 1)

if __name__ == '__main__':
    get_followers(QUERY)
    then = time.time()
    diff = 0
    while diff < (16 * 60):  # wait out the 15-minute rate-limit window
        diff = time.time() - then  # in seconds
        sleep_time = 16 * 60 - diff
        if sleep_time > 0:
            time.sleep(sleep_time)
First we pull out the follower IDs of a screen_name. The code above is written as a recursive function, but you could also write it iteratively. Perhaps the most challenging part is Twitter's rate limits: unfortunately, it does not let you take as much data as you want. You can check which restrictions apply to which request in the API documentation. For example, you are allowed to make 15 requests for follower IDs every 15 minutes.
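As a side note, the waiting logic above can be isolated into a small helper that computes how long is left in the current window before sleeping. This is only a sketch; the function names and the 16-minute window (15 minutes plus a safety margin) are my own choices, not part of the Twitter API:

```python
import time

WINDOW_SECONDS = 16 * 60  # 15-minute rate-limit window plus a 1-minute margin

def seconds_until_next_window(window_start, now=None):
    """Return how many seconds remain before a new request window opens."""
    if now is None:
        now = time.time()
    elapsed = now - window_start
    return max(0.0, WINDOW_SECONDS - elapsed)

def wait_for_next_window(window_start):
    """Sleep until the current rate-limit window has passed."""
    remaining = seconds_until_next_window(window_start)
    if remaining > 0:
        time.sleep(remaining)
```

Calling `wait_for_next_window(then)` right after a batch of requests keeps the main loop free of time arithmetic.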

The data we send to MongoDB may not always be saved in the format we want, so we also perform an unwind operation: if we have 50 objects in one bag, we rearrange them into 50 bags with one object each and save them in a new collection.
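To make the bag analogy concrete, here is what unwinding looks like. With pymongo the same reshaping would be an aggregation with MongoDB's `$unwind` stage; the collection and field names below (`followers`, `ids`) are assumptions, and the pure-Python version just illustrates the transformation:

```python
# With pymongo, an equivalent aggregation would look like:
#   db.followers.aggregate([{"$unwind": "$ids"}, {"$out": "followers_unwound"}])
# (collection/field names here are assumptions, not from the original code)

def unwind(documents, field):
    """Turn each document holding a list under `field` into one document per element."""
    result = []
    for doc in documents:
        for value in doc.get(field, []):
            new_doc = dict(doc)     # copy the other fields
            new_doc[field] = value  # replace the list with a single element
            result.append(new_doc)
    return result

bag = [{"user": "openmaker", "ids": [111, 222, 333]}]
print(unwind(bag, "ids"))
# [{'user': 'openmaker', 'ids': 111}, {'user': 'openmaker', 'ids': 222}, {'user': 'openmaker', 'ids': 333}]
```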

for j in profile:
    ...  # loop body omitted in the original snippet

if __name__ == '__main__':
    get_profile(0 * 300)  # 0*300 gets the first 30,000 profiles; then you should
                          # continue with 1*300, 2*300, ... respectively.

We then send a new request to Twitter to get the profiles behind these IDs. Here again, due to the Twitter API's limitations, we can get at most 30,000 profiles every 15 minutes, and we continue the process by pulling data window by window.
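Since `get_profile` itself is not shown in full above, here is a sketch of the batching idea. It assumes Twitter's v1.1 `users/lookup` endpoint, which accepts up to 100 IDs per request (300 requests per window gives the 30,000 figure); the helper names and parameters are my own:

```python
import requests

LOOKUP_URL = 'https://api.twitter.com/1.1/users/lookup.json'

def chunks(ids, size=100):
    """Split a list of user IDs into batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def get_profiles(follower_ids, auth, offset=0, batches_per_window=300):
    """Fetch one 15-minute window's worth of profiles, starting at batch `offset`."""
    profiles = []
    for batch in chunks(follower_ids)[offset:offset + batches_per_window]:
        params = {'user_id': ','.join(str(i) for i in batch)}
        response = requests.get(LOOKUP_URL, auth=auth, params=params)
        profiles.extend(response.json())
    return profiles
```

Calling it with `offset=0`, then `offset=300`, and so on mirrors the 0\*300, 1\*300, ... progression in the snippet above.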

friend_string = ','.join([str(friend) for friend in friends])

d = {"user_id": userid, "friend_ids": friend_string}

Once we have reached the followers' profiles, we will collect the friends of those followers. You will see many lines of code for collecting friend IDs, but do not be afraid 😊 In fact, most of that code exists to handle possible errors: your internet connection may drop, you may encounter a protected user whose friends cannot be retrieved, and so on. If you do not add control mechanisms for such cases, your code will unfortunately stop working and you will face long error messages.
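As a sketch of the kind of control mechanism meant here: wrap each per-user request in a try/except, retry transient failures, and skip users that keep failing (for example protected accounts, which the API rejects). The `fetch_friends` callable and all names below are hypothetical stand-ins, not the article's actual code:

```python
import time

def safe_collect(user_ids, fetch_friends, retries=3, pause=1.0):
    """Collect friends per user, skipping failing accounts instead of crashing.

    `fetch_friends` is a hypothetical callable that returns a list of friend IDs
    or raises an exception (network error, protected user, ...).
    """
    collected = {}
    failed = []
    for user_id in user_ids:
        for attempt in range(retries):
            try:
                collected[user_id] = fetch_friends(user_id)
                break
            except Exception:  # e.g. ConnectionError, or a 401 for protected users
                if attempt == retries - 1:
                    failed.append(user_id)  # give up on this user, keep going
                else:
                    time.sleep(pause)  # brief pause before retrying
    return collected, failed
```

The `failed` list lets you re-run just the problematic IDs later instead of restarting the whole crawl.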

This is the end of our data extraction process. Let’s continue with the analysis part 🎈

Step 2: Processing data to analyze relationships between Twitter accounts

To process our data we first extract it from MongoDB.

As you can see above, we can get the full contents of the collection with find(). Thanks to Python's excellent pandas library, we can view this recorded data in a much more readable form.
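A minimal sketch of that step, assuming the database and collection names from earlier; the sample records stand in for whatever documents find() actually returns:

```python
import pandas as pd

# With MongoDB running, the records would come from the collection:
#   from pymongo import MongoClient
#   records = list(MongoClient('localhost', 27017)['new_db'].new_collection.find())

# Illustrative stand-in for the documents find() returns:
records = [
    {'user_id': 111, 'friend_ids': '222,333'},
    {'user_id': 444, 'friend_ids': '555'},
]

# pandas turns a list of dictionaries straight into a readable table
df = pd.DataFrame(records)
print(df)
```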

user-friend list

Now it's our turn to do the analysis. For each user, we will keep only the friends who are also in our user list and ignore the rest.

temp_list = []
data_list = {}
for user_id in range(0, number_of_user_ids):
    if str(total_data_friend_ids[user_id]) != '':
        total_subset_friend_ids = str(total_data_friend_ids[user_id]).split(',')
        temp_list = set(total_subset_friend_ids).intersection(list(total_data_user_ids))
        if list(temp_list) != []:
            data_list[total_data_user_ids[user_id]] = list(temp_list)

If we run the code above, we still have a user-friend relationship, but friends that are not in our user list are discarded. Now we can prepare the input for the user-friend relationship we want to visualize; we will use Gephi for the visualization.
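Gephi can import an edge list as a CSV with `Source` and `Target` columns, so one way to turn the `data_list` mapping into that input is a sketch like the following (the file name and helper names are my own choices):

```python
import csv

def to_edge_list(data_list):
    """Flatten the {user: [friends, ...]} mapping into (source, target) pairs."""
    return [(user, friend)
            for user, friends in data_list.items()
            for friend in friends]

def write_gephi_csv(data_list, path='edges.csv'):
    """Write the edges in the Source,Target format Gephi's spreadsheet importer expects."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Source', 'Target'])
        writer.writerows(to_edge_list(data_list))
```

Each row of the resulting file is one directed edge from a user to a friend, which is exactly the relation we built above.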

source-target data

By performing operations similar to those in the data extraction phase, we can use pandas to view the user and friend lists, together with the profile information, in a table.

source-target relation

The table above shows that there is a relationship (an edge) between each pair of IDs we created.

In the next tutorial we will discuss how to organize this output in accordance with Gephi’s format and visualize it. I think the most crucial part will be the visualization phase 🎉

All the best 😊

Ph.D. Cand. in CmpE @Boğaziçi University. #ai #privacy #uncertainty #ml #dl #running #cycling #she/her
