The reason we switch from a linear regression model to a non-linear model such as logistic regression is the output variable. While studying linear regression last week, I came across a data set in which the dependent variable is categorical.
In this article, we will discuss the basic concepts of logistic regression and learn about maximum likelihood estimation and log(odds). A good understanding of these basics is important and saves a lot of time.
First, we need to know why linear regression is not suitable for categorical data. Of the two graphs below, the first shows an ordinary linear regression, and the second shows a linear fit to a dependent variable with binary category values. In the first graph the values follow a linear trend, i.e. as the independent variable increases, the dependent variable increases as well. The second graph does not show this behavior; instead, the dependent variable values are concentrated on two values, i.e. …
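A minimal sketch of why a straight line fits binary labels poorly, using made-up toy data. The data, the `sigmoid` and `log_odds` helpers, and the hand-picked coefficients are all assumptions for illustration, not a fitted logistic regression model.

```python
import numpy as np

# Hypothetical toy data: one feature and a binary (0/1) label.
x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# Ordinary least-squares straight line fitted to the binary labels.
slope, intercept = np.polyfit(x, y, 1)
linear_pred = slope * x + intercept


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


def log_odds(p):
    """log(odds) of a probability p; the inverse of the sigmoid."""
    return np.log(p / (1.0 - p))


# Hand-picked coefficients (an assumption, not estimated by maximum likelihood).
z = 1.5 * (x - 4.5)
prob = sigmoid(z)

# The straight line leaves the [0, 1] range at the edges of the data.
print("linear range :", linear_pred.min(), linear_pred.max())
# The sigmoid output always stays inside (0, 1).
print("sigmoid range:", prob.min(), prob.max())
```

The straight line predicts values below 0 and above 1, which cannot be read as probabilities, while the sigmoid output can. Taking `log_odds(prob)` recovers the linear term `z`, which is the quantity logistic regression models as a linear function of the inputs.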
Learning pandas is important for analyzing datasets and drawing insights from them. It is a Python library for data wrangling, useful for manipulating series and tables and for operating on numerical and categorical data.
In our daily work, we unknowingly generate lots of data in the form of numbers, strings, sentences, characters, etc.
In this article, we will discuss the basic terms used in pandas data wrangling to understand how they work. This article will cover the following terms:
1. Series and DataFrame
2. Rows and columns
3. Selecting and…
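The terms in the list above can be sketched with a small, made-up table. The column names and values here are hypothetical and chosen only for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([25, 32, 47], name="age")

# A DataFrame is a two-dimensional table made of rows and columns.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],
        "age": [25, 32, 47],
        "city": ["Pune", "Leeds", "Taipei"],
    }
)

# shape is (number of rows, number of columns).
print(df.shape)

# Selecting one column returns a Series.
age_column = df["age"]

# Selecting one row by position also returns a Series.
first_row = df.iloc[0]

# Selecting rows with a boolean condition returns a DataFrame.
older = df[df["age"] > 30]
print(older)
```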
A Big Data analysis project in the banking domain is very interesting to do in the Scala language. Many banks call people around the country to convince them to invest in bank schemes. Well, I get many such calls too. The callers try to explain the benefits of the scheme they want to sell.
After completing the campaign, the bank has gained good customers who are ready to invest in the bank. The data set contains the records of all customers, both those who subscribed to the term deposit and those who did not.
The data set variables are age, job, marital, default, housing, loan, contact, month, day of the week, duration, campaign, pdays, previous, outcome, and y. …
Time series analysis deals with the daily activities happening around us with respect to time. As days, months, and years pass, the observations around us leave behind information. To extract this information, we use statistical analysis to put the data into a suitable format and analyze it. Now that more and more data is generated everywhere, it is difficult to use simple low-level tools for analysis. So, new tools and algorithms have been developed to bring large amounts of data into a suitable format and serve our purpose of getting information.
Time-series data are collected and stored in such a manner that we can make future predictions and stabilize business growth with revenue increasing every year. …
Data scientist is now everybody’s dream job. First, ask yourself: do I want to become a data scientist? When you feel the urge to learn new things, start on a learning path. The transition from other fields to data science is difficult because it requires learning new tools and languages. But don’t worry, I will make this journey of acquiring skills a little easier for you. This article will give you an overview of the topics to learn on the path to becoming a data scientist.
Why does everyone want to learn TensorFlow for deep learning? As we dive into machine learning projects, we use the “sklearn” library, and when we talk about deep neural networks, TensorFlow comes into the picture. The Anaconda distribution is well suited to data science and machine learning thanks to its pre-installed library packages, but we have to install TensorFlow explicitly because it does not come with the Anaconda distribution. To install TensorFlow, write the following command in the Anaconda prompt and press Enter.
#command to install TensorFlow
pip install tensorflow
The chi-square test is a non-parametric hypothesis test used to check the association between two categorical features in bivariate data. Non-parametric tests are called distribution-free tests because they rely on very few assumptions, so the data are not required to be normally distributed. A target variable typically departs from a normal distribution when it is ordinal or nominal, or when outliers are present. A chi-square statistic is also used to test whether the variance of a sample is consistent with the variance of the population from which the sample was taken, which is why it is also called a hypothesis test for population variance.
To test whether one categorical variable is associated with, or has an effect on, another categorical variable, we check the hypothesis on these two conditions shown…
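As a rough sketch of the test of association between two categorical variables, the chi-square statistic can be computed by hand from a contingency table. The counts below are invented for illustration, and the variable names are assumptions rather than anything from the original data.

```python
import numpy as np

# Hypothetical 2x2 contingency table of observed counts:
# rows are one categorical variable, columns are the other.
observed = np.array([[30, 10],
                     [20, 40]], dtype=float)

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()

# Expected counts if the two variables were independent.
expected = np.outer(row_totals, col_totals) / grand_total

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi_square = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Critical value of the chi-square distribution for 1 degree of freedom
# at the 0.05 significance level.
critical_value = 3.841

print("chi-square:", chi_square, "dof:", dof)
print("reject independence:", chi_square > critical_value)
```

With these made-up counts the statistic exceeds the critical value, so the hypothesis that the two variables are independent would be rejected at the 0.05 level.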
NumPy is a Python library used for multidimensional array computing. NumPy arrays are faster to process than lists because their elements share a single type, are packed densely in memory, and support vectorized operations. Vectorization means an operation is applied to a whole array at once in optimized compiled code, which can take advantage of parallel hardware, rather than element by element in a Python loop. NumPy is general purpose, so it is used in various fields like statistics, algebra, matrix operations, etc. The types of arrays we work with are shown below:
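A minimal sketch of these points, using a one-dimensional and a two-dimensional array. Both arrays and their values are chosen here only for illustration.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Plain Python list: the squaring runs element by element in a loop.
squares_list = [v * v for v in values]

# One-dimensional NumPy array: all elements share one dtype.
arr = np.array(values)
print(arr.dtype, arr.ndim, arr.shape)

# Vectorized operation: applied to the whole array at once, no Python loop.
squares_arr = arr * arr

# Two-dimensional array (a matrix with 2 rows and 3 columns).
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix.ndim, matrix.shape)

# The same vectorized style works on every element of the matrix.
doubled = matrix * 2
print(doubled)
```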
Time series analysis deals with date and time index points in a data frame. Time series are used most frequently in the finance field. This article will help people who regularly analyze data with respect to date and time. For time series analysis, we need a skilled analyst who knows how to evaluate forecasts. Financial analysts are now also skilled in programming rather than analyzing only in the classical way. Statistics and machine learning help the finance sector grow with the advanced technologies around us.
This article deals only with basic analysis with respect to date and time. Bigger analyses need foundational skills first. …
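As a small sketch of basic analysis on a date index, assuming pandas is used, the example below builds a daily series, selects a date range, and computes a rolling mean. The dates and prices are invented for illustration.

```python
import pandas as pd

# Hypothetical daily closing prices indexed by date.
dates = pd.date_range("2021-01-01", periods=6, freq="D")
prices = pd.Series([100.0, 101.5, 99.0, 102.0, 103.5, 104.0], index=dates)

# Select a range of dates by label; both end points are included.
subset = prices.loc["2021-01-02":"2021-01-04"]
print(subset)

# 3-day rolling mean; the first two entries are NaN because
# fewer than three observations are available there.
rolling_mean = prices.rolling(window=3).mean()
print(rolling_mean)

# Day-to-day change in price.
daily_change = prices.diff()
print(daily_change)
```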
Understanding statistics can look like a side road running parallel to data science and machine learning. But learning statistics is worth it for drawing inferences and solutions from data. Those who have so far avoided this road should accept statistics as part of their daily dose. In this article we will discuss the Z, T, and P statistics and try to learn why we use them in data science. Before diving into these concepts we will go over some basic definitions and terms, as shown below:
Section 1: Types of Data, Histogram and Scatter plot
Section 2: Central Measures values and Measures of…
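To give a feel for the Z and P statistics before the sections above, here is a minimal sketch using only the Python standard library. The data values are made up, and the variable names are assumptions for illustration.

```python
import math
import statistics

# Hypothetical data set, treated here as the whole population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
std_dev = statistics.pstdev(data)  # population standard deviation

# Z-score: how many standard deviations an observation lies from the mean.
observation = 9
z_score = (observation - mean) / std_dev

# Two-sided p-value under the standard normal distribution:
# the probability of a value at least this far from the mean.
p_value = math.erfc(abs(z_score) / math.sqrt(2))

print("mean:", mean, "std dev:", std_dev)
print("z-score:", z_score, "p-value:", p_value)
```

A small p-value indicates that an observation this far from the mean would be unusual if the data followed a normal distribution.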