If you have some background in machine learning and you’d like to learn how to quickly improve the quality of your models, you’re in the right place!
In this post, you will deepen your machine learning expertise by learning how to:
tackle data types often found in real-world datasets (missing values, categorical variables),
design pipelines to improve the quality of your machine learning code,
and use advanced techniques for model validation (cross-validation).
1. Handling missing values
A. Simple Option: Drop Columns with Missing Values
The simplest option is to drop columns with missing values.
Unless most values in the dropped columns are missing, the model loses access to a lot of (potentially useful!) information with this approach. As an extreme example, consider a dataset with 10,000 rows, where one important column is missing a single entry. …
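To make the drop-columns option concrete, here is a minimal sketch using pandas. The DataFrames and column names (rooms, land_size, year_built) are made up for illustration and are not taken from any particular dataset. The sketch assumes you have separate training and validation features, and it picks the columns to drop by looking at the training data only, so both sets end up with the same columns.

```python
import pandas as pd

nan = float("nan")

# Hypothetical toy data: the column names and values are illustrative only.
X_train = pd.DataFrame({
    "rooms": [3, 2, 4, 3],
    "land_size": [200.0, nan, 350.0, 120.0],
    "year_built": [1990.0, 2005.0, nan, 1978.0],
})
X_valid = pd.DataFrame({
    "rooms": [2, 5],
    "land_size": [150.0, 400.0],
    "year_built": [2010.0, nan],
})

# Find every column that has at least one missing value in the training data.
cols_with_missing = [col for col in X_train.columns
                     if X_train[col].isnull().any()]

# Drop the same columns from both sets so their features stay aligned.
reduced_X_train = X_train.drop(columns=cols_with_missing)
reduced_X_valid = X_valid.drop(columns=cols_with_missing)

print("Dropped columns:", cols_with_missing)
print("Remaining columns:", reduced_X_train.columns.tolist())
```

Before dropping anything, it can help to run X_train.isnull().mean() to see what fraction of each column is actually missing. A column with only a handful of gaps is usually too valuable to throw away.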