
How to crawl Instagram data using its public API and Python?

As of 2020, the official Instagram API allows you to access only your own posts; it does not expose even public posts and comments. The restriction follows rising privacy concerns among users and frequent accusations of data breaches at many big companies, including Facebook. This has made it difficult for programmers to crawl Instagram data.


So, how to crawl Instagram data?

There’s still a workaround. Instagram’s website does expose an endpoint that is publicly accessible.
Let’s try to hit the URL given below.

Eureka, it’s a JSON response:


URL & JSON response

URL: https://www.instagram.com/explore/tags/travel/?__a=1

Here, travel is the hashtag, as we can also see in the JSON response. The response contains the posts tagged with travel, and its structure is easy to follow: edges is the list that holds each post’s data. So all we need to do now is parse this JSON to get the data.
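To make the nesting concrete, here is a minimal sketch that walks a hand-written dictionary shaped like the response. The key names are the ones used by the parser in this post; the sample captions and the `get_edges` helper name are made up for illustration, and the real response carries many more fields than shown here.

```python
# Hand-written sample mimicking the layout of the hashtag JSON response.
# The captions are invented for illustration only.
sample_response = {
    "graphql": {
        "hashtag": {
            "edge_hashtag_to_media": {
                "edges": [
                    # A post with one caption.
                    {"node": {"edge_media_to_caption": {
                        "edges": [{"node": {"text": "Sunset by the sea #travel"}}]
                    }}},
                    # A post without any caption.
                    {"node": {"edge_media_to_caption": {"edges": []}}},
                ]
            }
        }
    }
}


def get_edges(data):
    """Return the list of post entries ("edges") from a hashtag response."""
    return data["graphql"]["hashtag"]["edge_hashtag_to_media"]["edges"]


edges = get_edges(sample_response)
print(len(edges))  # number of posts in the sample
```

Each entry in edges wraps one post under the node key, and the caption, when there is one, sits in a second edges list under edge_media_to_caption.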

Programmatically parsing response using Python

Libraries required: requests

Here’s a quick Python snippet that gets the captions from the posts; you can modify it for your own use:

import requests


class Parser:
    # Keys used to walk the nested JSON response.
    HASH_KEY = "graphql"
    HASHTAG_KEY = "hashtag"
    MEDIA_KEY = "edge_hashtag_to_media"
    LIST_KEY = "edges"
    NODE_KEY = "node"
    CAPTION_LIST_KEY = "edge_media_to_caption"
    TEXT_KEY = "text"

    def __init__(self, tag):
        self.tag = tag

    def get_url(self):
        # Build the hashtag URL with the "__a=1" query parameter.
        return "https://www.instagram.com/explore/tags/" + self.tag + "/?__a=1"

    def get_request_response(self):
        # Fetch the URL and decode the JSON body into a dictionary.
        r = requests.get(url=self.get_url())
        r.raise_for_status()  # fail early on an HTTP error status
        return r.json()

    def get_captions(self):
        captions = []
        data = self.get_request_response()
        nodes_list = data[Parser.HASH_KEY][Parser.HASHTAG_KEY][Parser.MEDIA_KEY][Parser.LIST_KEY]
        for obj in nodes_list:
            caption_list = obj[Parser.NODE_KEY][Parser.CAPTION_LIST_KEY][Parser.LIST_KEY]
            # Some posts have no caption, so skip empty lists.
            if len(caption_list) > 0:
                caption = caption_list[0][Parser.NODE_KEY][Parser.TEXT_KEY]
                captions.append(caption)
                print(caption)
        return captions


def main():
    parser = Parser("travel")
    parser.get_captions()


if __name__ == "__main__":
    main()
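
The parser above assumes that every key is present in the response. As a hedged sketch, the hypothetical helper `safe_captions` below performs the same traversal on an already decoded dictionary, but returns whatever it can find instead of raising `KeyError` when part of the structure is missing, for example if the endpoint answers with a different payload. It is an illustration under that assumption, not part of the original parser.

```python
def safe_captions(data):
    """Collect caption texts, skipping anything that lacks the expected keys."""
    media = (
        data.get("graphql", {})
        .get("hashtag", {})
        .get("edge_hashtag_to_media", {})
    )
    captions = []
    for edge in media.get("edges", []):
        caption_edges = (
            edge.get("node", {})
            .get("edge_media_to_caption", {})
            .get("edges", [])
        )
        # Posts without a caption have an empty list here.
        if caption_edges:
            text = caption_edges[0].get("node", {}).get("text")
            if text is not None:
                captions.append(text)
    return captions


# An empty dictionary has none of the expected keys.
print(safe_captions({}))  # []
```

You could pass it the dictionary returned by get_request_response() in place of the direct indexing in get_captions().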

Later, we will be posting more programming tutorials to get you started with Python. We will advance from beginner level to intermediate and then to machine learning. If you like our posts, please like, comment and share. Also, don’t forget to subscribe to our awesome newsletter.

Happy Programming!
