
Scraping Twitter Information With Python [With 2 APIs]


Introduction

Social media platforms like Twitter are great repositories for gathering datasets. A new data science project requires a fair amount of data, and gathering that dataset is not an easy task.

And Twitter provides a diversified genre of data because it is a collection of tweets from people with different mindsets and different sentiments. Such a dataset without bias is a much-needed prerequisite for training a new machine learning model.

Let’s get started!

We are going to walk through 2 APIs for Twitter data scraping.

  1. Tweepy
  2. Twint

Tweepy

Before we walk through the Python code for scraping data with the Tweepy API, there's a point you need to know: we need the credentials of a Twitter developer account, and it's a piece of cake if you already have them.

If you don't have a developer account, you can apply for one here. Before applying for a developer account, you need a regular Twitter account. Applying is an easy process; the application asks a few basic questions, such as the reason for applying, and approval generally takes 2-3 days.

Once you receive approval for the developer account, make a note of your consumer API keys, access token, and access token secret from the “keys and tokens” section.

Also, note that Tweepy has a few constraints: you can only scrape tweets that are no more than a week old, and there is a rate limit of up to 18,000 tweets per 15-minute window.

Great, now that we have keys and tokens from the developer account let’s authorize them.

import tweepy

consumer_key = "your consumer key"
consumer_secret = "your consumer secret"
access_token = "your access token"
access_token_secret = "your token secret"

authorization = tweepy.OAuthHandler(consumer_key, consumer_secret)
authorization.set_access_token(access_token, access_token_secret)
api = tweepy.API(authorization, wait_on_rate_limit=True)

Now that we have authorized with our credentials, let's scrape the tweets of a particular account. For now, let's scrape the tweets of Mr. Sundar Pichai.

import pandas as pd

username = "sundarpichai"
count = 100

try:
    # line1: create an iterable cursor over the user's timeline
    tweets_obj = tweepy.Cursor(api.user_timeline, id=username).items(count)
    # line2: extract the attributes we need from each tweet
    tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets_obj]
    # line3: convert the 2D list into a data frame
    tweets_df = pd.DataFrame(tweets_list)
except BaseException as e:
    print("something went wrong,", str(e))

In the above snippet, line1 creates an iterable object over all the tweets and assigns it to the variable "tweets_obj". Once we have the iterable object, we iterate over it and extract the data.

We extract only a few attributes, "created_at", "id", and "text", and append them as entries in a 2D list, where each entry holds the data of one scraped tweet. Now that we have a 2D list with one entry per tweet, we can convert it to a data frame using "pd.DataFrame()".

The reason for converting the list to a data frame is flexibility: the availability of predefined methods and easy access make the data frame stand out among data structures for data science projects.
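As a small illustration of this step (using made-up sample rows rather than scraped data), naming the columns when building the data frame makes later filtering and sorting much more readable; the column names below simply mirror the attributes extracted above:

```python
import datetime

import pandas as pd

# Hypothetical sample rows in the same [created_at, id, text] shape
# that the scraping loop above produces.
tweets_list = [
    [datetime.datetime(2021, 1, 1), 1001, "Hello world"],
    [datetime.datetime(2021, 1, 2), 1002, "Another tweet"],
]

# Passing column names up front gives readable access like tweets_df["text"].
tweets_df = pd.DataFrame(tweets_list, columns=["created_at", "id", "text"])
print(tweets_df.shape)
```

With named columns, selecting all tweet texts is as simple as `tweets_df["text"]`.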

Similarly, let’s walk through a code for scraping data that has a particular text query.

text_query = "vocal for local"
count = 100

try:
    # line1: search for tweets matching the text query
    tweets_obj = tweepy.Cursor(api.search, q=text_query).items(count)
    # line2: extract the attributes we need from each tweet
    tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets_obj]
    # line3: convert the 2D list into a data frame
    df = pd.DataFrame(tweets_list)
except BaseException as e:
    print("something went wrong,", str(e))

 

In the above snippet, everything is the same as in the previous one except line1, which searches by query instead of username. At the end, we have a data frame with all the tweets containing the text query "vocal for local".

If you are looking for more specific or customized scraping, such as including additional attributes like retweet count, favorite count, etc., we can adjust the syntax and extract the other attributes Tweepy provides. For further reading on the attributes offered by Tweepy, have a look at the documentation.
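For example, a small helper like the one below could replace the list comprehension in line2 to capture those extra fields. This is a sketch: the stand-in object is hypothetical and only used for illustration, while retweet_count and favorite_count are real attributes on Tweepy's tweet (Status) objects:

```python
from types import SimpleNamespace

def tweet_to_row(tweet):
    # Pick out the attributes we want; retweet_count and favorite_count
    # are additional fields exposed on each tweet object.
    return [tweet.created_at, tweet.id, tweet.text,
            tweet.retweet_count, tweet.favorite_count]

# Stand-in object with the same attribute names, for illustration only.
sample = SimpleNamespace(created_at="2021-01-01", id=1, text="hello",
                         retweet_count=5, favorite_count=9)
print(tweet_to_row(sample))
```

In the real scraping loop, you would call `tweet_to_row(tweet)` on each tweet returned by the cursor.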

Twint

The Twint API doesn't require any developer account credentials; you can scrape tweets easily without any authorization keys. Twint also doesn't have restrictions on the number of tweets, time frames, scraping limits, etc. Twint provides seamless data scraping and an easy-to-use API.

We can print the list of followers of a person using their username with the Twint API.

import twint

t_obj = twint.Config()
t_obj.Username = "sundarpichai"
twint.run.Followers(t_obj)

In the above snippet, twint.Config() creates a configuration object for the Twint API to get things started. Once we have the object, we use its reference for our work: "t_obj.Username" sets the username we entered, and twint.run.Followers performs a search for all followers of that username.

We can also store the scraped data into a data frame similar to the tweepy API.

t_obj.Limit = 100
t_obj.Username = "sundarpichai"
t_obj.Pandas = True
twint.run.Followers(t_obj)
result_df = twint.storage.panda.User_df

Everything in this snippet is almost the same as in the previous one, with the extra line "twint.storage.panda.User_df", which exposes the scraped data as a data frame. The resulting data frame contains the list of followers of the given username.

Now that we have seen how to scrape the follower data of a particular username, let's walk through the code for scraping the tweets of a particular account.

t_obj.Search = "from:@sundarpichai"
t_obj.Store_object = True
t_obj.Limit = 20
twint.run.Search(t_obj)
tweets = twint.output.tweets_list
print(tweets)

In the above snippet, we configure the object to search for the tweets of a particular person; we can also set a limit on the number of tweets scraped using "t_obj.Limit". After running the search, Twint builds a list of all the tweets, which we can assign to a local variable as per our need.

After seeing the snippets for scraping info from a particular account, you may have a quick question: how do we scrape tweets containing a particular keyword? Not an issue, Twint has a solution for this too.

t_obj.Search = "data science"
t_obj.Store_object = True
t_obj.Limit = 100
twint.run.Search(t_obj)
tweets = twint.output.tweets_list
print(tweets)

The above snippet is the same as the one for scraping tweets from a particular account, with a single difference: the search query. We can also convert the result to a data frame as per our convenience.
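One way to sketch that conversion is with a small helper like the one below. This is an illustration on stand-in objects, not live Twint output; the .id, .username, and .tweet attribute names are based on Twint's tweet objects, so verify them against your Twint version:

```python
from types import SimpleNamespace

import pandas as pd

def tweets_to_df(tweets):
    # Build one row per tweet; .id, .username, and .tweet are attribute
    # names on Twint's tweet objects (check against your Twint version).
    rows = [[t.id, t.username, t.tweet] for t in tweets]
    return pd.DataFrame(rows, columns=["id", "username", "tweet"])

# Stand-in tweet objects, for illustration only.
sample_tweets = [
    SimpleNamespace(id=1, username="user_a", tweet="data science rocks"),
    SimpleNamespace(id=2, username="user_b", tweet="learning data science"),
]
df = tweets_to_df(sample_tweets)
print(df.shape)
```

In a real run, you would pass the list collected by `twint.run.Search` to `tweets_to_df` instead of the sample objects.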

For further reading on twint API have a look at their repository and documentation.

Conclusion

We have understood the importance of scraping data, walked through two APIs and their features for scraping Twitter data, and seen a few methods for converting the scraped data into our required file format. Now that you are aware of these APIs, start scraping data for your data science projects!

We at upGrad are happy to help you and would also like to let you know about the opportunities that you can have by learning python. Python has been used extensively for Machine Learning and Data Science, two of the most popular and emerging technologies. Learning Python and also having knowledge of these skills will make you excel in your field and get better career opportunities.

We have a lot of courses developed along with industry experts and top academic institutes to provide you with all the skills required to excel in this field. Here are some of the courses that can help you make use of your knowledge of Python and improve your career prospects:

Data Science:

PG Diploma in Data Science: Developed with IIIT-B, it is a full-fledged data science course to enter into this field and make a mark in the industries with your knowledge. 

Masters of Science in Data Science: Developed in coordination with Liverpool John Moores University and IIIT-B, get a master's degree in Data Science from one of the top universities in the world.

Machine Learning:

Advanced Certification in Machine Learning and AI: IIT Madras, one of the best educational institutions in India, has partnered with upGrad to create an advanced course on Machine Learning that gives individuals complete knowledge of the field.

Masters of Science in Machine Learning and AI: Liverpool John Moores University and IIIT-B have partnered with upGrad to provide a complete Master of Science degree for individuals who want to learn the technology in detail and earn a formal degree to pave a successful path in this field.

PG Diploma in Machine Learning and AI: IIIT-B and upGrad came together to offer individuals a 12-month course on Machine Learning and AI as a way into this technology.

