Working with unstructured text data: Part 1

Retrieve tweets by search term using the Twitter API in Python.

This article exposes some of the underlying challenges of working with unstructured data, particularly text from social media channels, which varies widely in language use, style and vocabulary. The references at the end of the article cover the Twitter API, its setup and authorisation, and the use of the Tweepy module for extracting tweets. The article offers a snapshot of one working approach to “get the job done”; it is by no means the only right way of doing this. It assumes the reader has some working knowledge of pandas and Python programming.
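As a rough illustration of retrieving tweets by search term, the sketch below assumes Tweepy 4.x, its `Client.search_recent_tweets` call, and a bearer token obtained through the setup described in the references. The helper names `tweet_to_row` and `search_tweets`, the query filters, and the output keys `likes` and `retweets` are hypothetical choices made for this example, not necessarily what the article itself used.

```python
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Flatten one tweet-like object into a plain dict (one future DataFrame row)."""
    # public_metrics is only present when it was requested via tweet_fields.
    metrics = getattr(tweet, "public_metrics", None) or {}
    return {
        "id": tweet.id,
        "text": tweet.text,
        "likes": metrics.get("like_count", 0),
        "retweets": metrics.get("retweet_count", 0),
    }


def search_tweets(bearer_token, term, limit=100):
    """Fetch recent tweets matching `term` (assumes Tweepy 4.x is installed)."""
    import tweepy  # imported lazily so tweet_to_row works without Tweepy

    client = tweepy.Client(bearer_token=bearer_token)
    response = client.search_recent_tweets(
        query=f"{term} -is:retweet lang:en",  # assumed filters: no retweets, English only
        max_results=limit,                     # the endpoint accepts 10 to 100 per request
        tweet_fields=["public_metrics"],
    )
    # response.data is None when nothing matched the query.
    return [tweet_to_row(t) for t in (response.data or [])]


# Offline check with a stand-in object, so no credentials are needed here.
sample = SimpleNamespace(
    id=1, text="hello", public_metrics={"like_count": 5, "retweet_count": 2}
)
row = tweet_to_row(sample)
```

The list of dicts returned by `search_tweets` can be passed straight to `pandas.DataFrame(...)` for further cleaning.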

Dataset description: schema
100 most liked tweets
100 most retweeted tweets
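One possible way to produce the two rankings named above is pandas' `DataFrame.nlargest`. This is a minimal sketch: the helper `top_tweets`, the column names `likes` and `retweets`, and the toy data are assumptions for illustration and do not come from the article's dataset.

```python
import pandas as pd


def top_tweets(df, column, n=100):
    """Return the n rows with the largest values in `column`, highest first."""
    return df.nlargest(n, column).reset_index(drop=True)


# Toy data standing in for the retrieved tweets (hypothetical values).
tweets = pd.DataFrame(
    {
        "text": ["a", "b", "c", "d"],
        "likes": [10, 250, 3, 42],
        "retweets": [7, 1, 90, 15],
    }
)

most_liked = top_tweets(tweets, "likes", n=2)
most_retweeted = top_tweets(tweets, "retweets", n=2)
```

With the real data, leaving `n` at its default of 100 would give the 100 most liked and 100 most retweeted tweets.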
References
  1. Twitter. 2021. Twitter API: Tap into what’s happening. [ONLINE] Available at: [Accessed 16 February 2021].
  2. Twitter. 2021. API reference index. [ONLINE] Available at: [Accessed 16 February 2021].
  3. Twitter. 2021. Getting started. [ONLINE] Available at: [Accessed 16 February 2021].



Lubna Khan

Data Scientist/ Analyst, Language Tutor, AI enthusiast, Polyglot, Artist and lifelong learner.