Data Science Enthusiast | Electronics R&D | Data Visualization | BI | NLP

Machine Learning

Unsupervised learning in machine learning with density-based clustering

Clustering in unsupervised learning. A photo by Author

In this article, we will discuss a machine learning clustering algorithm: DBSCAN. The approach of this algorithm is density-based rather than distance-based. Distance-based clustering looks for closeness between data points, but it can misclassify a point that actually belongs to another class. Density-based clustering is better suited to this kind of scenario. Clustering algorithms fall under unsupervised learning, in which we do not rely on a target variable to form the clusters.

In a cluster, the main concern is the maximally dense, connected set of points. …
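For readers who want to try this, here is a minimal sketch of density-based clustering with scikit-learn's DBSCAN on synthetic two-moons data; the eps and min_samples values are illustrative assumptions, not settings from this article.

  # DBSCAN on non-convex synthetic data; eps/min_samples are illustrative.
  from sklearn.datasets import make_moons
  from sklearn.cluster import DBSCAN

  X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

  db = DBSCAN(eps=0.3, min_samples=5)   # density parameters
  labels = db.fit_predict(X)            # label -1 marks noise points

  print("clusters found:", len(set(labels) - {-1}))
  print("noise points:", (labels == -1).sum())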


Data Science, Machine Learning

A regression and classification approach in machine learning

Gradient Boosting in machine learning. A photo by Author

Hello everyone, in this article we will discuss an ensemble boosting technique: gradient boosting. In an earlier article on ensembles, we discussed random forest, which is a bagging technique. In boosting, the weak learners predict on the training set, and the remaining errors/residuals are re-weighted so that the heavily weighted points are forwarded to the next weak learner.

We saw Gini impurity and entropy in the bagging techniques, but in the case of boosting we will deal with loss functions, because the highly weighted loss is passed to the next base learner. …
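As a rough illustration of the idea, here is a small gradient boosting sketch with scikit-learn; the dataset and the hyperparameters (n_estimators, learning_rate, max_depth) are illustrative assumptions rather than the article's own example.

  # Gradient boosting: each new tree fits the residuals (negative gradient
  # of the loss) left by the previous trees.
  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import GradientBoostingClassifier
  from sklearn.model_selection import train_test_split

  X, y = load_breast_cancer(return_X_y=True)
  X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

  gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
  gbc.fit(X_tr, y_tr)
  print("test accuracy:", gbc.score(X_te, y_te))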


Statistics

Statistics helps to understand behavior in machine learning

Types of correlation. A photo by Author

In this article, we will discuss the correlation between variables to observe the dispersion of the data. A broad graphical view of the data gives insight into which machine learning algorithm will give the best fit. Machine learning algorithms are differentiated as linear, non-linear, density-based, and cluster-based.

The study of correlation (co-variation) is divided into the parts shown below:

  • Observing if there is a relationship between variables or not.
  • If a correlation exists, how strongly the variables relate to each other.
  • Whether there is a cause-and-effect relation behind it.

Types of correlation in variables

  • Positive: This type of correlation is…
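As a quick computational sketch of measuring correlation between two variables, the snippet below uses synthetic data invented purely for illustration; it is not from the article itself.

  # Pearson (linear) and Spearman (rank) correlation on synthetic data.
  import numpy as np
  import pandas as pd

  rng = np.random.default_rng(0)
  x = rng.normal(size=200)
  y = 0.8 * x + rng.normal(scale=0.5, size=200)   # positively related to x

  df = pd.DataFrame({"x": x, "y": y})
  print(df.corr(method="pearson"))
  print(df.corr(method="spearman"))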


Machine Learning

A study of a machine learning classification algorithm to solve real cases in data science.

Example graph of KNN. A photo by Author

Hello everyone, this is another article in the fully explained machine learning algorithms series. In this article, we will discuss the k-nearest neighbors classification algorithm. A good article flows like a story, and readers get as much information as possible in a small amount of time.

Let’s clarify some points

  • In the case of unsupervised learning, nearest neighbors is the basis for many techniques, such as clustering.
  • In the case of supervised learning, it comes in two categories: classification and regression.

So, we will discuss the supervised classification learning technique.

The main goal is to predict the new data point based…
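A minimal k-nearest neighbors sketch in scikit-learn is shown below; the iris dataset and k=5 are illustrative choices, not necessarily those used in the article.

  # KNN: a new point is labeled by a majority vote of its k nearest neighbors.
  from sklearn.datasets import load_iris
  from sklearn.neighbors import KNeighborsClassifier
  from sklearn.model_selection import train_test_split

  X, y = load_iris(return_X_y=True)
  X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

  knn = KNeighborsClassifier(n_neighbors=5)
  knn.fit(X_tr, y_tr)
  print("test accuracy:", knn.score(X_te, y_te))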


Machine Learning

Machine learning methods based on several decision trees

Ensemble Models. A photo by Author

In this article, the ensemble techniques are based on multiple decision trees. A single decision tree gives high variance after modeling, which leads to over-fitting. The benefit of using ensemble methods is good prediction with reduced variance, achieved by averaging (bagging) or by boosting techniques.

There are various types of ensemble techniques for classification and regression, as shown below:

  • Bagging, random forest, and extra trees are averaging methods.
  • AdaBoost, gradient boosting, CatBoost, etc. are boosting methods.

Averaging methods are usually used to reduce the variance, while the final prediction is made from the average…
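To make the averaging-versus-boosting distinction concrete, here is a small comparison sketch; the dataset and estimator counts are illustrative assumptions only.

  # Averaging (random forest) vs boosting (AdaBoost) on the same data.
  from sklearn.datasets import load_breast_cancer
  from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
  from sklearn.model_selection import cross_val_score

  X, y = load_breast_cancer(return_X_y=True)

  rf = RandomForestClassifier(n_estimators=200, random_state=0)   # averaging
  ada = AdaBoostClassifier(n_estimators=200, random_state=0)      # boosting

  print("random forest CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
  print("adaboost CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())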


Machine Learning

In-depth study of decision tree for classification problem

Decision Tree. A photo by Author

In this article, we cover the decision tree algorithm, the base model for all other tree-based models. The decision tree in sklearn is an optimized version of the CART (classification and regression tree) algorithm. Decision trees are non-parametric supervised learning methods; non-parametric means they are distribution-free, i.e., the variables can be nominal or ordinal.

The decision tree decides by choosing a root node and splitting further into child nodes. The splitting is based on the metrics used in the decision tree. The earlier article was on metrics for regression and classification. …
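A small sketch of sklearn's CART-based decision tree is below; the iris data, the Gini criterion, and max_depth=3 are illustrative assumptions.

  # Fit a shallow decision tree and print the root node and splits as text.
  from sklearn.datasets import load_iris
  from sklearn.tree import DecisionTreeClassifier, export_text

  data = load_iris()
  tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
  tree.fit(data.data, data.target)

  print(export_text(tree, feature_names=list(data.feature_names)))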


Machine Learning

Model evaluation with metric API for regression and classification

Photo by Sheri Hooley on Unsplash

In this article, we will discuss various metrics for regression and classification in machine learning. We always think about the steps involved in modeling a good machine learning algorithm. One of these steps is choosing metrics to evaluate the goodness of the model. When we fit our model and make a prediction, we always want to know the error and the accuracy. This article will explain various error measurement methods for regression and classification.

There are criteria to evaluate the prediction quality of the model, as shown below:

  • Metric functions: the ones we will study in this article.
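As a quick illustration of such metric functions in scikit-learn, the toy predictions below are made up purely for demonstration and are not from the article.

  # Regression and classification metric functions from sklearn.metrics.
  from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, f1_score

  # Regression metrics
  y_true_reg = [3.0, -0.5, 2.0, 7.0]
  y_pred_reg = [2.5,  0.0, 2.0, 8.0]
  print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))
  print("R^2:", r2_score(y_true_reg, y_pred_reg))

  # Classification metrics
  y_true_clf = [0, 1, 1, 0, 1]
  y_pred_clf = [0, 1, 0, 0, 1]
  print("accuracy:", accuracy_score(y_true_clf, y_pred_clf))
  print("F1:", f1_score(y_true_clf, y_pred_clf))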


Classification metrics in supervised learning

Confusion Matrix. A photo by Author

Why is this metric named the confusion matrix? From my point of view, the term matrix refers to the rows and columns, and the term confusion refers to the machine's classifications not being 100% accurate. Let's learn about the confusion matrix in a little more depth in this article. It is a combined classification metric used to visualize the performance of the model.

The topics we will cover in this article are shown below:

  • Confusion matrix
  • Type 1 and Type 2 Error
  • Accuracy
  • Precision
  • Recall
  • False omission rate
  • F1-score
  • MCC or phi coefficient

Confusion Matrix

The confusion matrix gives very fruitful information about the…
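For a hands-on feel, here is a minimal confusion matrix sketch with toy labels invented for illustration only.

  # Rows are actual classes, columns are predicted classes.
  from sklearn.metrics import confusion_matrix, precision_score, recall_score

  y_true = [1, 0, 1, 1, 0, 1, 0, 0]
  y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

  print(confusion_matrix(y_true, y_pred))
  print("precision:", precision_score(y_true, y_pred))
  print("recall:", recall_score(y_true, y_pred))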


Data Science

How the classification problem is solved with a real-life example.

SVM Classification. A photo by Author

In this article, we will discuss one of the most used machine learning algorithms for classification problems. The support vector machine (SVM) algorithm is used for regression, classification, and also for outlier detection.

The hyperplane separates the classes and is positioned using the decision points, or support vectors. The support vectors are the sample points of different classes that lie closest to the separating plane, and the distance they define around it is called the margin. With a larger margin, the error is smaller and the rate of misclassification is also lower.
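Here is a small SVM classification sketch; the linear kernel, C=1.0, and the dataset are illustrative assumptions rather than the article's own example. Scaling the features first helps the margin-maximizing optimization.

  # Linear SVM with feature scaling.
  from sklearn.datasets import load_breast_cancer
  from sklearn.svm import SVC
  from sklearn.model_selection import train_test_split
  from sklearn.preprocessing import StandardScaler
  from sklearn.pipeline import make_pipeline

  X, y = load_breast_cancer(return_X_y=True)
  X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

  svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
  svm.fit(X_tr, y_tr)
  print("test accuracy:", svm.score(X_te, y_te))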


Data Visualization

Visualization with Plotly

Photo by Markus Winkler on Unsplash

To know the insights of the data, we need good visualization tools. Python is a very useful programming language for visualizing data with its pre-built libraries. There are many useful visualization libraries, such as:

  • Seaborn
  • Matplotlib
  • Plotly

Among these, matplotlib is the base library, and the others are built on top of it. The idea for matplotlib comes from MATLAB's plotting visuals. It gives us static images, and it contains almost every type of plot.

Seaborn is designed to create attractive statistical plots. It uses matplotlib as a backend. …
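As a minimal Plotly sketch, the example below uses the gapminder sample data that ships with Plotly Express; the column choices are just for illustration.

  # An interactive scatter plot with Plotly Express.
  import plotly.express as px

  df = px.data.gapminder().query("year == 2007")
  fig = px.scatter(df, x="gdpPercap", y="lifeExp", size="pop",
                   color="continent", hover_name="country", log_x=True)
  fig.show()   # opens an interactive chart in the notebook or browser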
