Web Scraping vs. APIs: Advantages and Disadvantages

Joseph Husney
2 min read · Jan 13, 2021

As a data scientist, you need accurate data before you can analyze it. There are many different ways of getting that data. The two that I will discuss today are web scraping and APIs (Application Programming Interfaces).

Web scraping is the process of retrieving data from web pages by parsing their HTML tags. Basically, one filters through the information on a webpage or webpages to collect the data that they need. Using APIs is quite different. As the name suggests, an API is an interface that a programmer can use to retrieve data. More specifically, each API exposes its own set of methods that let you interact with or retrieve the information.
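To make the idea of filtering a page by its HTML tags concrete, here is a minimal sketch using only Python's standard library. The page content, the `product` class name, and the `ProductParser` and `scrape_products` names are all made up for illustration. A real scraper would first download the page and would often use a dedicated parsing library instead.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this HTML first.
SAMPLE_HTML = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="product">Laptop</li>
    <li class="product">Phone</li>
    <li class="ad">Buy now!</li>
  </ul>
</body></html>
"""


class ProductParser(HTMLParser):
    """Collects the text of every <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "li" and dict(attrs).get("class") == "product":
            self.in_product = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_product = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a product item.
        if self.in_product and data.strip():
            self.products.append(data.strip())


def scrape_products(html):
    """Return the product names found in the given HTML string."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products


print(scrape_products(SAMPLE_HTML))  # ['Laptop', 'Phone']
```

Notice how much of the code is about the page's layout rather than the data itself. If the site renames the `product` class, the scraper silently returns nothing.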

The question becomes: how does one decide when to use each method? The simple answer is that it depends. If an API is readily available online, it is typically a lot easier to get what you need, since it provides methods built for that purpose. With web scraping, figuring out how to get exactly what you need from a site can be grueling. The data from APIs is also typically cleaner than scraped data.
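As a rough illustration of why API data tends to be cleaner, here is a small sketch that reads a JSON response. The response body, its field names, and the `product_names` helper are assumptions made up for this example, not the output of any real API. In practice the body would come from an HTTP request to the provider's endpoint.

```python
import json

# Hypothetical JSON body, standing in for what an API endpoint might return.
SAMPLE_RESPONSE = """
{
  "products": [
    {"name": "Laptop", "price": 999.0},
    {"name": "Phone", "price": 599.0}
  ]
}
"""


def product_names(response_body):
    """Return the product names from a JSON response body."""
    payload = json.loads(response_body)
    # The data arrives already structured, so no tag filtering is needed.
    return [item["name"] for item in payload["products"]]


print(product_names(SAMPLE_RESPONSE))  # ['Laptop', 'Phone']
```

The fields are already named and typed, so there is no page layout to reverse-engineer and no stray text to filter out.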

Reasons to consider web scraping over an API include the owner of the API charging for its usage, or the data not being available through any API. Here's something interesting to think about: consider a scenario where a data scientist wants to make a novel discovery. It is often difficult to come up with something people haven't seen yet if you are using APIs. This is because when data is easy to get, the possible uses of that data have probably already been explored. Through web scraping, however, you can look at data that is not easily available to most people, which may allow you to discover something that hasn't been thought of before.

Summary

Advantages of APIs:

  1. Data is easily accessible through the methods provided
  2. Data is typically cleaner

Advantages of web scraping:

  1. No usage fees, unlike some APIs
  2. Works when no API is available
  3. Gives you data that most people have not accessed
