Selva Prabhakaran

Selva is an experienced Data Scientist and leader, specializing in executing AI projects for large companies. Selva started machinelearningplus to make Data Science / ML / AI accessible to everyone. The website enjoys 4 Million+ readership. His courses, lessons, and videos are loved by hundreds of thousands of students and practitioners.

SQL Window Functions

SQL Window Functions – Made simple and intuitive

SQL window functions are an advanced concept; understanding them lets you do complex data wrangling and transformations in SQL. In this guide, we will build an intuitive understanding of how window functions work, in a way you will not forget. Don’t memorize anything; just read through and you will clearly understand how …

SQL Tutorial

SQL Tutorial – A Simple and Intuitive Guide to the Structured Query Language

SQL, short for Structured Query Language, is a programming language used to communicate with databases and perform various data wrangling operations. It is an essential skill for any data-related job. In this tutorial, let’s get started with the basics of SQL. 1. Why should you learn SQL? First, let’s understand …

Cook’s Distance for Detecting Influential Observations

Cook’s distance measures the influence exerted by each observation on the trained model. It is computed by building a regression model and is therefore affected only by the X variables included in that model. What is Cook’s Distance? Cook’s distance measures the influence exerted by each data point (row / …

How to detect outliers using IQR and Boxplots?

Let’s understand what outliers are, how to identify them using the IQR and boxplots, and how to treat them where appropriate. 1. What are outliers? In statistics, outliers are data points that differ significantly from the other data points in the dataset. There can be various reasons behind outliers. It can be because of …

Install pip mac

install pip mac – How to install pip in MacOS?: A Comprehensive Guide

pip is a widely used package manager for Python that lets you install and manage Python packages easily. In this blog post, we’ll explore various methods to install pip on macOS. I’ll provide clear, reproducible code examples for each method, making it easy for you to get started with pip on your macOS system. Using …

Scrapy vs. Beautiful Soup: Which is better for web scraping?

Web scraping is the technique of extracting data from a specific website or web page. It has wide applications, including research and publication, competitor and market studies, and creating data for machine learning models. The extracted data can be stored in any format, be it CSV, TXT, JSON, an API, etc., so that it can …

add Python to PATH – How to add Python to the PATH environment variable in Windows?

1. What is the purpose of adding Python to the PATH environment variable? Adding Python to the PATH environment variable in Windows allows you to run Python commands from any directory within the command prompt. Here are the steps to add Python to the PATH variable: 2. What is the PATH environment variable in Windows? …

MICE imputation

MICE imputation – How to predict missing values using machine learning in Python

MICE imputation, short for ‘Multiple Imputation by Chained Equations’, is an advanced missing data imputation technique that trains machine learning models over multiple iterations to predict the missing values, using the known values of other features in the data as predictors. What is MICE Imputation? You can impute missing values by predicting them using other …

conda list environments – How to view all the virtual environments present in conda?

conda is a popular package management system that allows you to create isolated environments with different versions of packages and dependencies. In this post, let’s see how to view a list of all virtual environments in conda. How to list all the virtual environments? Earlier you saw everything about creating conda environments. If you want …

conda delete environment

conda delete environment – How to remove a conda environment and all the associated packages?

conda is a popular package management system that allows you to create isolated environments with different versions of packages and dependencies. Earlier we saw how to create a new conda environment and the steps to manage it. However, if you no longer need an environment, you can delete it to free up disk space and simplify …

Spline Interpolation

Spline Interpolation – How to find the polynomial curve to interpolate missing values

Spline interpolation is a special type of interpolation in which a piecewise, lower-order polynomial called a spline is fitted to the data points. That is, instead of fitting one higher-order polynomial (as in polynomial interpolation), multiple lower-order polynomials are fitted on smaller segments. This can be implemented in Python. You can do non-linear spline interpolation …

Interpolation in Python

Interpolation in Python – How to interpolate missing data, formula and approaches

Interpolation can be used to impute missing data. Let’s see the formula and how to implement it in Python. But you need to be careful with this technique and understand whether it is a valid choice for your data. Often, interpolation is applicable when the data is in a sequence or …

Missing Data Imputation Approaches

Missing Data Imputation Approaches | How to handle missing values in Python

Machine learning works on the principle of garbage in, garbage out. If you feed useless junk data to a machine learning algorithm, the results will also be, well, ‘junk’. The quality and consistency of the results depend on the data provided. Missing values degrade the quality of the data. Why clean the data before training …

EDA

Exploratory Data Analysis (EDA) – How to do EDA for Machine Learning Problems using Python

Exploratory Data Analysis, or simply EDA, is the step where you understand the data in detail. You understand each variable individually by calculating frequency counts, visualizing the distributions, etc. You also examine the relationships between various combinations of the predictor and response variables by creating scatterplots, computing correlations, etc. EDA is typically part of every …

How to reduce the memory size of Pandas Data frame

After importing with pandas read_csv(), dataframes tend to occupy more memory than needed. This is the default behavior in pandas, meant to ensure all data is read properly. It is worth optimizing, because the lighter the dataframe, the faster the operations you run on it later. So, let’s first check how much …

ML Modeling – Problem statement and Data description

ML modeling is the step where machine learning is used to find patterns in data and use that learned knowledge to predict an outcome. The type of ML modeling problem we are going to solve here is called ‘Churn Modeling’. Let’s first understand the churn modeling problem statement and then go over the data …

An Introduction to AdaBoost

AdaBoost – An Introduction to AdaBoost

AdaBoost is one of the earliest implementations of the boosting algorithm. It forms the base of other boosting algorithms, like gradient boosting and XGBoost. This tutorial will take you through the math behind the algorithm and a practical example using the scikit-learn AdaBoost API. Contents: What is boosting? What is AdaBoost? Algorithm …

Course Preview

Machine Learning A-Z™: Hands-On Python & R In Data Science
