
Data Science Enthusiastic | Electronics R&D | Data Visualization | BI | NLP |

Visualization of healthcare data with two python libraries

In this article, we will discuss some basic visualizations with the matplotlib and seaborn libraries. Both are well known in the data science and analytics community.

  • Matplotlib: It provides basic plotting functionality with a highly customizable approach. It works well with pandas and NumPy, and it can draw multiple figures.
  • Seaborn: It is also a powerful visualization tool and works even more comfortably with pandas data frames. It provides attractive themes for plots. It can also draw multiple figures, but this sometimes leads to OOM (Out Of Memory) problems.

Some examples of visualization with matplotlib…

Handy concepts for all NLP analysis techniques

This article aims to change how beginners approach learning natural language processing (NLP). When I first started learning NLP, I always wondered how I would actually use all these concepts.

This article assumes basic knowledge of natural language processing concepts. You can read the article below to brush up on them.

Topics to be covered:

  1. Reading sentiment text file
  2. Data Exploration and Text Processing
  3. Data Cleaning — Stopwords, Stemming, and Lemmatization
  4. Model Building — Naive Bayes
  5. Saving and reloading the model
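
The cleaning and saving steps in this outline can be sketched with the standard library alone. This is only an illustration, not the article's code: the tiny stopword set, the `clean` helper, and the stand-in "model" (a plain word-count table) are all assumptions made for the example. A real project would more likely rely on a dedicated NLP library.

```python
import pickle
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set (real lists are much longer).
STOPWORDS = {"the", "is", "a", "an", "and", "this", "to", "of"}

def clean(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]

# Stand-in for a trained model: word counts over one cleaned sentence.
model = Counter(clean("This movie is a great movie and the cast is great"))

# Save the object to bytes and load it back (a file would work the same way).
restored = pickle.loads(pickle.dumps(model))
print(restored.most_common(2))
```

The same `pickle` round trip works for most Python objects, which is why it is a common way to store a fitted model for later use.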

Reading sentiment text file

Import all the necessary libraries.

import pandas as pd import numpy…


The list is a part of python data structures

This article covers the list in depth as part of Python's data structures. The concepts go from basic to advanced so that you know the list inside out.


The first question that comes to mind is: what is a list?

The following points describe the list:

  • A list may or may not be a collection of heterogeneous data types.
  • It is useful for small sequences.
  • The list is mutable in Python, which means we can modify its items in place. …
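
A short sketch of these points, using made-up values: a list can hold mixed types, and it is mutable, so its items can be changed in place. A tuple is shown only for contrast, as the immutable counterpart.

```python
# A list can mix data types (heterogeneous) or hold a single type.
mixed = [1, "two", 3.0, None]
numbers = [10, 20, 30]

# Lists are mutable: items can be changed, added, and removed in place.
numbers[0] = 5          # replace an item
numbers.append(40)      # add to the end
numbers.remove(20)      # delete by value
print(numbers)          # [5, 30, 40]

# A tuple, by contrast, is immutable and rejects item assignment.
point = (1, 2)
try:
    point[0] = 9
except TypeError as exc:
    print("tuple is immutable:", exc)
```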

Data Visualization

Basic concepts for forecasting models in machine learning with example

In this article, we will discuss time series concepts with machine learning examples that deal with the time component in the data.

Forecasting is very important in banking, weather, population prediction, and many other areas that deal directly with real-life problems.

Time series models are based on a function of time. The measurements are taken at regular intervals, and time is the independent variable for modeling.

Z = f(t)

Here Z takes the values Z1, Z2, …, Zn, and “t” takes the times T1, T2, …, Tn at which they are measured.
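
As a small illustration of Z = f(t), the sketch below samples a function at regular time steps and then smooths the result with a rolling mean. The function `f`, the eight time points, and the window size are all assumptions invented for this example, not taken from the article.

```python
from statistics import mean

def f(t):
    """Hypothetical series: a linear trend plus a repeating 4-step pattern."""
    return 2 * t + [0, 3, 0, -3][t % 4]

# Regularly spaced time points T1..Tn and the values Z1..Zn observed at them.
times = list(range(1, 9))
z = [f(t) for t in times]

def rolling_mean(values, window):
    """Mean of each run of `window` consecutive values."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]

print(z)                    # [5, 4, 3, 8, 13, 12, 11, 16]
print(rolling_mean(z, 4))   # the repeating pattern averages out, the trend remains
```

Because the window length matches the length of the repeating pattern, the rolling mean rises steadily, which is one simple way to see the trend component on its own.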

Topics to be covered:

  1. Components of Time Series
  2. White Noise
  3. Stationary and Non-Stationary
  4. Rolling Statistics and Dickey-Fuller…

Deep Learning

Predicting diabetes for binary classification

This article is about predicting whether a person is diabetic or not based on the given data. We will use two machine learning approaches and measure the accuracy of their predictions.

Topics to be covered:

  • Introduction about the data
  • Exploratory Data Analysis (EDA) with matplotlib and sweetviz
  • Prediction with SVM and KNN classifier

Introduction about the data

The data contains 8 independent variables and 1 dependent variable. The motivation for building the prediction model is to reduce the work involved and to make fast predictions that can guide further medication.

The independent variables are: pregnancies, glucose, BMI, insulin, blood pressure, skin thickness, pedigree function, age

The dependent variable: outcome


Components and types of arguments in functions

This article covers the main concepts related to functions and aims to make you comfortable using them in programming. The topic is easy to understand, yet it often feels difficult for lack of practice.

Topics to be covered:

  • Introduction
  • Function arguments and their types
  • Global and Local variable
  • Passing data sequence to function
  • Anonymous function — Lambda
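
The topics in this list can be previewed in one short sketch. All function and variable names here (`area`, `increment`, `total`, `counter`) are made up for illustration.

```python
counter = 0  # global variable

def area(width, height=1, *extra, unit="cm", **options):
    """Show positional, default, variable-length, and keyword arguments."""
    return f"{width * height} {unit}^2", extra, options

def increment():
    global counter      # without this line, the assignment would raise UnboundLocalError
    counter += 1

def total(values):
    """A whole sequence (list, tuple, ...) can be passed as one argument."""
    result = 0          # local variable, visible only inside this function
    for v in values:
        result += v
    return result

# An anonymous (lambda) function, used here without ever being given a name.
squares = list(map(lambda x: x * x, [1, 2, 3]))

print(area(2, 3))
print(area(2, 3, 4, 5, unit="m", note="demo"))
increment()
print(counter, total([1, 2, 3]), squares)
```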


The value of a function becomes clear when you write the same formula more than once in a program or algorithm, which costs time.

It is better to wrap that formula in a single function and call that function as many times as needed.

The benefits of using…


Error Handling with a try, except, and finally keywords

In this article, we will discuss error handling in Python with the try, except, and finally keywords, applied to file and data management.
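
A minimal sketch of the three keywords working together on a file. The helper name `read_number` and the file names are assumptions made for this example.

```python
import os
import tempfile

def read_number(path):
    """Read an integer from a file, handling the errors that can occur."""
    handle = None
    try:
        handle = open(path)
        return int(handle.read().strip())
    except FileNotFoundError:
        print(f"{path} does not exist")
    except ValueError:
        print(f"{path} does not contain a whole number")
    finally:
        # Runs whether or not an exception was raised, so the file always closes.
        if handle is not None:
            handle.close()
    return None

# Demonstration with a temporary file (the file names are arbitrary).
with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "value.txt")
    with open(good, "w") as out:
        out.write("42\n")
    print(read_number(good))                                  # 42
    print(read_number(os.path.join(folder, "missing.txt")))   # None, after a message
```

In everyday code a `with open(...)` block closes the file automatically; the explicit `finally` is used here only to show what that keyword does.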

In general, errors fall into the three categories shown below:

  • Compile-time error: Syntactical mistakes, such as a missing bracket or colon or a misspelled keyword, are reported before the program runs. (In Python, using an undefined variable is reported only when that line executes, as a NameError.)
  • Logical error: The program compiles and runs properly but gives wrong or undesired output.
  • Runtime error: This error occurs in…
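
The three categories can be told apart in a few lines. This is a sketch with made-up snippets; the `average` function contains a deliberate bug.

```python
# 1. Syntax error: caught when Python parses the code, before any of it runs.
try:
    compile("print('hello'", "<example>", "exec")   # missing closing parenthesis
except SyntaxError as exc:
    syntax_error = type(exc).__name__

# 2. Logical error: the code runs without complaint but the answer is wrong.
def average(values):
    return sum(values) / 2          # bug: should divide by len(values)

wrong = average([2, 4, 6])          # 6.0 instead of the intended 4.0

# 3. Runtime error: raised while the program is running.
try:
    1 / 0
except ZeroDivisionError as exc:
    runtime_error = type(exc).__name__

print(syntax_error, wrong, runtime_error)
```

Only the first and third kinds raise an exception, so only they can be caught with try and except. A logical error has to be found by testing the output.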

Machine Learning

Unsupervised Learning in clustering to build tree for data

This is another article in the Fully Explained series on machine learning algorithms, this time on BIRCH clustering in unsupervised learning.


This algorithm performs hierarchical clustering based on trees. These trees are called CFTs, i.e. Clustering Feature Trees. BIRCH stands for Balanced Iterative Reducing and Clustering using Hierarchies. BIRCH clustering is used in the following scenarios:

  • Large dataset
  • Outlier detection
  • Data reduction.

The metric used to measure distance in this clustering method is the Euclidean distance.
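
To make the tree idea more concrete: each entry in a BIRCH tree summarises a group of points as a clustering feature, the triple (N, LS, SS) of point count, linear sum, and sum of squared norms. The sketch below is a simplified plain-Python illustration of that summary and of the Euclidean distance between centroids. It is not a full BIRCH implementation, and the class and method names are assumptions chosen for this example.

```python
import math

class ClusteringFeature:
    """Summary of a set of points: count N, linear sum LS, squared sum SS."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, point):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def merge(self, other):
        """Clustering features are additive, so merging needs no raw points."""
        merged = ClusteringFeature(len(self.ls))
        merged.n = self.n + other.n
        merged.ls = [a + b for a, b in zip(self.ls, other.ls)]
        merged.ss = self.ss + other.ss
        return merged

    def centroid(self):
        return [x / self.n for x in self.ls]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two small groups of made-up 2-D points.
left, right = ClusteringFeature(2), ClusteringFeature(2)
for p in [(0, 0), (2, 0)]:
    left.add(p)
for p in [(4, 4), (6, 4)]:
    right.add(p)

print(left.centroid(), right.centroid())
print(euclidean(left.centroid(), right.centroid()))
print(left.merge(right).centroid())
```

Because only these three quantities are stored per entry, the raw points can be discarded after they are added, which is what makes the approach suitable for large datasets.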


BIRCH is very useful among clustering algorithms for the reasons shown below:

  • It is very useful to handle…

Data Science

Estimator checking in supervised learning

In this article, we will discuss how to create a dummy estimator purely as a point of comparison for a real model. We will discuss two types of dummy estimators in supervised learning, i.e. for regression and for classification.

This concept is covered in the metrics and scoring part of the scikit-learn documentation.


A dummy regressor makes predictions using a simple rule. It provides a baseline for comparing real regressors and should not be used for real problems.
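
The idea of a simple-rule baseline can be sketched without any library. `MeanBaseline` is a hypothetical class written for this example, and the toy numbers are made up. In scikit-learn the corresponding estimator is `sklearn.dummy.DummyRegressor` with `strategy="mean"`.

```python
from statistics import mean

class MeanBaseline:
    """Ignores the features and always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.constant_ = mean(y)
        return self

    def predict(self, X):
        return [self.constant_ for _ in X]

def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

# Made-up toy data, for illustration only.
X_train, y_train = [[1], [2], [3], [4]], [10, 20, 30, 40]
X_test, y_test = [[5], [6]], [50, 60]

baseline = MeanBaseline().fit(X_train, y_train)
predictions = baseline.predict(X_test)
print(predictions, mean_absolute_error(y_test, predictions))
```

A real regressor is worth keeping only if its error on the same test data is clearly lower than this baseline's.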

Parameters in DummyRegressor

The main parameters are shown below:

  1. Strategy: It selects the rule used to generate predictions. The possible values include:
  • Mean: It is used to predict the mean of the…


A concept in statistics for data-driven decisions

This article will be fun for all readers

Hypothesis testing is a way to test an idea in statistics against observed data points. It is about making an educated guess as to whether something works or not, and then checking that guess to reach meaningful results.

A good hypothesis contains the words “if” and “then”. For example, if the temperature is increased, then the solid will melt.

Pre-requisite for this article:

  • Knowledge of Descriptive Statistics
  • Knowledge of Inferential Statistics

The steps involved in decision making are shown below:

  • Formulate a hypothesis
  • Find the right test
  • Execute the test
  • Make a decision
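
The four steps can be followed in a short worked example. This is a sketch under stated assumptions: the sample values are made up, the population standard deviation is assumed to be known (which is what justifies a z-test), and the 0.05 significance level is a conventional choice rather than something taken from the article.

```python
from math import sqrt
from statistics import NormalDist, mean

# 1. Formulate the hypotheses.
#    H0: the population mean is 100.   H1: the population mean is not 100.
mu_0 = 100
sigma = 15          # assumed known population standard deviation
alpha = 0.05        # assumed significance level

# Made-up sample, for illustration only.
sample = [112, 105, 98, 120, 109, 101, 117, 108, 111]

# 2. Find the right test: with a known sigma, a two-sided one-sample z-test.
# 3. Execute the test.
z = (mean(sample) - mu_0) / (sigma / sqrt(len(sample)))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 4. Make a decision.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 2), round(p_value, 4), decision)
```

With these numbers the p-value is above 0.05, so the null hypothesis is not rejected even though the sample mean is higher than 100.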

Whenever we test a hypothesis, we have to know what our null hypothesis is. For example, if we say the…
