Monte Carlo simulation studies are used to examine how eight factors affect predictions of a binary target outcome in data science: (1) the choice among four DMMs [Logistic Regression (LR), Elastic Net Regression (GLMNET), Random Forest (RF), Extreme Gradient Boosting (XGBoost)], (2) the choice among three filter-based preprocessing techniques for feature selection [Correlation Attribute Evaluation (CAE), Fisher's Scoring Algorithm (FSA), Information Gain Attribute Evaluation (IG)], (3) the number of training observations, (4) the number of features, (5) measurement error, (6) the magnitude of class imbalance, (7) the missing data pattern, and (8) the feature selection cutoff. The findings are consistent with the literature on which data properties and algorithms yield the best performance. Measurement error negatively impacted pipeline performance across all factors, DMMs, and feature selection techniques.