Draft

Project 01
Modified

September 26, 2025

Important

The purpose of the draft is to give you an opportunity to get early feedback on your analysis. Therefore, the draft will focus primarily on the exploratory analysis and initial drafts of the final product(s).

Write the draft write-up in the report.qmd file in your project repo. This should document your modeling strategies to date. At minimum, you are expected to include:

Tip: You do not need to fit all the models in the report.qmd file

You may fit all the models in a separate R/Python script or Quarto file and save/export any appropriate model objects so you can report relevant metrics or create visualizations/tables to report on the models’ performance. You should include the results of the models in the report.qmd file.
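For a Python workflow, the split described above might look like the following minimal sketch. It assumes the modeling script collects its metrics as a list of records and writes them to a JSON file, which report.qmd then reads and tabulates. The helper names (`save_metrics`, `load_metrics`, `format_table`), the file name, and the metric values are all hypothetical placeholders, not real results or a required layout.

```python
import json
import tempfile
from pathlib import Path

# Placeholder records standing in for metrics computed by a separate
# modeling script. Model names and values are illustrative only.
metrics = [
    {"model": "null", "metric": "rmse", "estimate": 1.0},
    {"model": "random_forest", "metric": "rmse", "estimate": 0.5},
]


def save_metrics(rows, path):
    """Write metric records to a JSON file, creating the folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows, indent=2))


def load_metrics(path):
    """Read metric records back from a JSON file."""
    return json.loads(Path(path).read_text())


def format_table(rows):
    """Render metric records as a simple fixed-width text table."""
    lines = [f"{'model':<15}{'metric':<8}{'estimate':>10}"]
    for row in rows:
        lines.append(
            f"{row['model']:<15}{row['metric']:<8}{row['estimate']:>10.3f}"
        )
    return "\n".join(lines)


# In the modeling script: export the metrics.
# A temporary folder keeps this sketch self-contained; a project would
# more likely write to its own "output/" folder.
metrics_path = Path(tempfile.mkdtemp()) / "output" / "metrics.json"
save_metrics(metrics, metrics_path)

# In report.qmd: load the saved metrics and report them.
loaded = load_metrics(metrics_path)
print(format_table(loaded))
```

Exporting plain metrics keeps report.qmd fast to render, since no model is refit when the report is built.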


Standard R objects can be saved to disk using readr::write_rds() or save().

library(tidymodels)
library(readr)

# tune some complex models
tune_rf_res <- tune_grid(...)
tune_lgbm_res <- tune_grid(...)

# save a single object
write_rds(tune_rf_res, file = "output/tune_rf_res.rds")
write_rds(tune_lgbm_res, file = "output/tune_lgbm_res.rds")

# save together
save(tune_rf_res, tune_lgbm_res, file = "output/tune_results.RData")

For model fit() objects, you will likely want to butcher() the object to reduce its size; otherwise the saved file may be hundreds of megabytes.

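library(butcher)

# fit the best rf model
# (fit_best() expects tuning to have been run with save_workflow = TRUE)
best_rf <- fit_best(tune_rf_res)

# strip components that are not needed to make predictions
best_rf_lite <- butcher(best_rf)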

Read the documentation for {butcher} for more information.
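In Python, a rough counterpart to write_rds() is to serialize the object with the standard-library pickle module, optionally compressed with gzip. The sketch below is a minimal illustration under stated assumptions: the dictionary stands in for a fitted model object, and the helper names (`save_model`, `load_model`) and the file path are hypothetical. For scikit-learn estimators, `joblib.dump()` and `joblib.load()` are commonly used in the same way. This sketch does not attempt an equivalent of butcher().

```python
import gzip
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted model object; in practice this would be the
# fitted estimator or pipeline produced by the modeling script.
best_model = {"model_type": "linear", "coef": [2.0, -1.0], "intercept": 0.5}


def save_model(model, path):
    """Pickle an object to a gzip-compressed file, creating the folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load an object from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# In the modeling script: save the fitted object.
# A temporary folder keeps this sketch self-contained; a project would
# more likely write to its own "output/" folder.
model_path = Path(tempfile.mkdtemp()) / "output" / "best_model.pkl.gz"
save_model(best_model, model_path)

# In report.qmd: restore the object without refitting.
restored = load_model(model_path)
print(restored["model_type"], model_path.stat().st_size, "bytes on disk")
```

Unpickling can execute arbitrary code, so only load pickle files that you or your teammates created.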

Evaluation criteria

| Category | Less developed projects | Typical projects | More developed projects |
|---|---|---|---|
| Objectives | Objective is not clearly stated or significantly limits potential analysis. | Clearly states the objective(s), which have moderate potential for relevant impact. | Clearly states complex objective(s) that lead to significant potential for relevant impact. |
| Data description | Simple description of some aspects of the dataset, little consideration for sources. The description is missing answers to applicable questions detailed in the “Datasheets for Datasets” paper. | Answers all relevant questions in the “Datasheets for Datasets” paper. | All expectations of typical projects + credits and values data sources. |
| Decisions based on EDA | Identifies minimal actions taken during the modeling stage based on the results of the EDA. Actions are unlikely to affect predictions. | Identifies concrete actions taken during the modeling stage based on the results of the EDA. | All expectations of typical projects + actions demonstrate deliberate and careful analysis of the exploratory analysis. |
| Resampling strategy | Does not use resampling methods (or uses an inappropriate method) to ensure robust model evaluation. | Selects an appropriate resampling strategy. | All expectations of typical projects + provides a thorough justification for the resampling strategy. |
| Modeling strategies | Includes only simplistic models. Does not demonstrate understanding of the types of models covered in the course. Feature engineering steps are non-existent. Does not select evaluation metrics or metrics are not appropriate to the objective(s) + models. | Identifies several modeling strategies which could generate a high-performance model. Documents relevant feature engineering steps to be evaluated for specific modeling strategies. Steps are selectively applied to appropriate models. Evaluation metrics are appropriate for the objective(s) + models. | All expectations of typical projects + provides thorough explanation for the models/feature engineering/metrics. Shows care in selecting their methods. |
| Initial results | Only reports results of the null model. Results are presented in a disjointed or uninterpretable manner. | Reports the results of some (but not all) of their modeling strategies. Results are presented in a clear and interpretable manner. | Reports the results of the majority of their modeling strategies. Results are effectively communicated through the use of tables and/or figures. |