
Scraping Tables from HTML Pages Without Using Python Libraries

A useful technique for data science and machine learning

Amit Chauhan
3 min read · Dec 29, 2022

Updated on: 14 August 2023

Introduction

Python offers many libraries for web scraping, which is widely used in business analysis. Data in tabular form is convenient for quick analysis, and extracting a table from an HTML page is now a simple task.

Python libraries for scraping data from HTML include Beautiful Soup, urllib, Selenium, Scrapy, MechanicalSoup, and lxml. Managing XPath expressions, building URLs and paths, and other time-consuming steps can become tedious.

An easier way to extract tables is a single pandas function, read_html. pandas provides similar read functions for other formats: read_csv for CSV files, read_json for JSON files, read_excel for Excel files, and so on. read_html depends on an HTML parser such as lxml, so if an lxml-related error is thrown, install lxml first (for example, with pip install lxml).
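To illustrate how the pandas read functions share one pattern, here is a minimal sketch that loads the same small dataset from CSV text and from JSON text. The in-memory strings stand in for real files, and the names and values in them are made up for illustration.

```python
import io

import pandas as pd

# Made-up sample data; the strings stand in for files on disk.
csv_text = "name,value\nAlpha,10\nBeta,20\n"
json_text = '[{"name": "Alpha", "value": 10}, {"name": "Beta", "value": 20}]'

# Each read_* function takes a path, URL, or file-like object
# and returns a DataFrame.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```

With real files, a path such as a local CSV file name would replace the io.StringIO wrapper.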

Python examples:

# import the pandas library
import pandas as pd

# scrape the first table from the web page
df_population = pd.read_html('https://www.worldometers.info/world-population/population-by-country')[0]

# display the first rows of the extracted table
df_population.head()
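The [0] index in the example is needed because read_html returns a list with one DataFrame for each table it finds on the page. The sketch below shows this on a small HTML string instead of a live page, so it runs without a network connection. The HTML and its values are hypothetical, and it assumes a parser such as lxml is installed.

```python
import io

import pandas as pd

# Hypothetical page with two tables; the values are made up for illustration.
html = """
<html><body>
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Alpha</td><td>10</td></tr>
  <tr><td>Beta</td><td>20</td></tr>
</table>
<table>
  <tr><th>City</th><th>Area</th></tr>
  <tr><td>Gamma</td><td>5</td></tr>
</table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> element.
tables = pd.read_html(io.StringIO(html))
print(len(tables))

# Pick a table by position, as the example above does with [0] ...
df_first = tables[0]
print(df_first)

# ... or keep only the tables whose text matches a string or regex.
df_city = pd.read_html(io.StringIO(html), match="City")[0]
print(df_city)
```

Selecting with match can be more robust than a fixed index when a page changes the order of its tables.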


Written by Amit Chauhan

Data Scientist, AI/ML/DL, Azure Cloud
