library(tidymodels)
library(forested)

set.seed(123)
forested_split <- initial_split(forested, prop = 0.8)
forested_train <- training(forested_split)
forested_test <- testing(forested_split)

# decrease cost_complexity from its default 0.01 to make a more
# complex and performant tree. see `?decision_tree()` to learn more.
tree_spec <- decision_tree(cost_complexity = 0.0001, mode = "classification")
forested_wflow <- workflow(forested ~ ., tree_spec)
forested_fit <- fit(forested_wflow, forested_train)
Metrics for model performance
Metric sets combine multiple metric functions into a single function that computes them all at once.
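As a sketch of how the grouped metrics below could be computed (assuming, as the output shows, that the training data contain a tree_no_tree column), a metric set can be built and applied to the decision tree's training-set predictions:

# combine three class metrics into a single metric function
forested_metrics <- metric_set(accuracy, specificity, sensitivity)

# compute all three metrics, separately for plots with and without trees
augment(forested_fit, new_data = forested_train) |>
  group_by(tree_no_tree) |>
  forested_metrics(truth = forested, estimate = .pred_class)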
# A tibble: 6 × 4
  tree_no_tree .metric     .estimator .estimate
  <fct>        <chr>       <chr>          <dbl>
1 Tree         accuracy    binary         0.946
2 No tree      accuracy    binary         0.941
3 Tree         specificity binary         0.582
4 No tree      specificity binary         0.974
5 Tree         sensitivity binary         0.984
6 No tree      sensitivity binary         0.762
Add response here. Specificity and sensitivity differ substantially between the two groups. The model is much better at identifying forested land in plots where trees are present (high sensitivity) and much better at identifying non-forested land in plots where trees are absent (high specificity).
Your turn: Compute and plot an ROC curve for the decision tree model.
What data are being used for this ROC curve plot?
# Your code here!
augment(forested_fit, new_data = forested_train) |>
  roc_curve(truth = forested, .pred_Yes) |>
  autoplot()
Add response here. This ROC curve is constructed from the training set, since augment() was given forested_train.
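As a follow-up sketch, the curve can be summarized as a single number with roc_auc():

# area under the training-set ROC curve
augment(forested_fit, new_data = forested_train) |>
  roc_auc(truth = forested, .pred_Yes)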
Dangers of overfitting
Your turn: Use augment() and a metric function to compute a classification metric like brier_class().
Compute the metrics for both training and testing data to demonstrate overfitting!
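One way to demonstrate this (a sketch, not necessarily the intended solution) is to compute brier_class() on both sets and compare; a noticeably better score on the training data than on the testing data is the signature of overfitting.

# Brier score on the data the tree was fit to (optimistic)
augment(forested_fit, new_data = forested_train) |>
  brier_class(truth = forested, .pred_Yes)

# Brier score on held-out data (a more honest estimate)
augment(forested_fit, new_data = forested_test) |>
  brier_class(truth = forested, .pred_Yes)

The random forest specification printed below is not defined earlier in this section; a specification along these lines (reconstructed from the printed output, so treat it as an assumption) would produce it:

rf_spec <- rand_forest(trees = 1000, mode = "classification")
rf_spec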
Random Forest Model Specification (classification)

Main Arguments:
  trees = 1000

Computational engine: ranger
rf_wflow <- workflow(forested ~ ., rf_spec)
rf_wflow
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Formula
Model: rand_forest()
── Preprocessor ────────────────────────────────────────────────────────────────
forested ~ .
── Model ───────────────────────────────────────────────────────────────────────
Random Forest Model Specification (classification)

Main Arguments:
  trees = 1000

Computational engine: ranger
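The resampling object forested_folds used below is not created in this section. It is assumed here to be a set of cross-validation folds of the training data, for example:

# an assumed setup: 10-fold cross-validation of the training set
set.seed(123)
forested_folds <- vfold_cv(forested_train, v = 10)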
Your turn: Use fit_resamples() and rf_wflow to:
Keep predictions
Compute metrics
ctrl_forested <- control_resamples(save_pred = TRUE)

# Random forest uses random numbers so set the seed first
set.seed(234)
rf_res <- fit_resamples(rf_wflow, forested_folds, control = ctrl_forested)
collect_metrics(rf_res)
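Because save_pred = TRUE was set, the held-out predictions from every resample can also be collected:

# held-out predictions saved by control_resamples(save_pred = TRUE)
collect_predictions(rf_res)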