Handling Missing Data - Part 1
One of the big topics I have been speaking about is “Data Cleansing for Machine Learning”. A major part of the data cleansing process is filling in missing values. These values could be missing for a variety of reasons (manually entered data, improper data collection, incorrect exception handling, etc.).

Missing data falls into three categories:

MAR: Missing at Random. The missingness follows a pattern tied to other known variables (e.g., age, education), such as when someone did not want to answer a question.

MCAR: Missing Completely at Random. The missingness is unrelated to any variable, as with random glitches or accidental loss (e.g., sensor failure).

MNAR: Missing Not at Random. The data is missing because of the value of the missing data itself.

Several techniques are available to fill in missing data. In my video, I discuss three of them. Multiple Imputation by Chained Equations (MICE) handles numeric or...
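To make the three missingness categories concrete, here is a minimal sketch that simulates each one on synthetic data. Everything in it is an assumption chosen for illustration: the `age` and `income` fields, the probabilities, the thresholds, and the helper names `make_rows`, `mask_income`, and `missing_rate` are hypothetical and do not come from the video.

```python
import random


def make_rows(n, seed=0):
    """Build synthetic rows with hypothetical 'age' and 'income' fields."""
    rng = random.Random(seed)
    return [
        {"age": rng.randint(18, 80), "income": round(rng.gauss(50_000, 15_000))}
        for _ in range(n)
    ]


def mask_income(rows, mechanism, seed=1):
    """Return copies of rows with some income values replaced by None."""
    rng = random.Random(seed)
    masked = []
    for row in rows:
        if mechanism == "MCAR":
            # Same chance for every row: unrelated to any variable.
            p = 0.2
        elif mechanism == "MAR":
            # Chance depends on an observed variable (age), not on income.
            p = 0.4 if row["age"] >= 50 else 0.05
        elif mechanism == "MNAR":
            # Chance depends on the value that goes missing (income itself).
            p = 0.4 if row["income"] >= 60_000 else 0.05
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        new_row = dict(row)
        if rng.random() < p:
            new_row["income"] = None
        masked.append(new_row)
    return masked


def missing_rate(original, masked, predicate):
    """Share of missing incomes among rows whose ORIGINAL values match predicate."""
    pairs = [(o, m) for o, m in zip(original, masked) if predicate(o)]
    return sum(m["income"] is None for _, m in pairs) / len(pairs)


rows = make_rows(5000)
mar = mask_income(rows, "MAR")
mnar = mask_income(rows, "MNAR")

print("MAR, age >= 50:", missing_rate(rows, mar, lambda r: r["age"] >= 50))
print("MAR, age < 50: ", missing_rate(rows, mar, lambda r: r["age"] < 50))
print("MNAR, income >= 60k:", missing_rate(rows, mnar, lambda r: r["income"] >= 60_000))
print("MNAR, income < 60k: ", missing_rate(rows, mnar, lambda r: r["income"] < 60_000))
```

In the MAR case the gap in missing rates can be seen from a column you still have (age). In the MNAR case the gap is only visible here because the simulation keeps the original incomes; with real data those values are exactly the ones that are gone, which is why MNAR is the hardest category to detect.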