Lecture 22
Cornell University
INFO 4940/5940 - Fall 2024
December 3, 2024
Model development data: data that you use while building a model, for training and testing.
Model monitoring data: new data that you predict on after your model is deployed.
👩🏼‍🔧 My model returns predictions quickly, doesn’t use too much memory or processing power, and doesn’t have outages.
Metrics
👩🏽‍🔬 My model returns predictions that are close to the true values for the predicted quantity.
Metrics
Model drift: degradation of ML model performance due to changes in the data or in the relationships between input and output variables.
Activity
Using our data, what could be an example of data drift? Concept drift?
05:00
Typically it is most useful to compare to your model development data.
Application exercise: ae-21
Go to the course GitHub organization and find your ae-21 repo (repo name will be suffixed with your GitHub name). Run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
Activity
Create a plot or table comparing the development vs. monitoring distributions of a model input/feature (one possible approach is sketched after this activity).
How might you make this comparison if you didn’t have all the model development data available when monitoring?
What summary statistics might you record during model development, to prepare for monitoring?
07:00
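As a minimal sketch of one approach: stack the two datasets and overlay their distributions with ggplot2. Here housing_dev is an assumed name for the model development data; housing_monitor is the monitoring data used later in this lecture, and price (on the log10 scale, as in the code below) stands in for whichever feature you choose.

library(tidyverse)

# Stack the two datasets with a label, then compare distributions
bind_rows(
  housing_dev |> mutate(dataset = "development"),
  housing_monitor |> mutate(dataset = "monitoring")
) |>
  ggplot(aes(x = price, fill = dataset)) +
  geom_density(alpha = 0.5) +
  labs(x = "Sale price (log10 dollars)", fill = NULL)

# For the follow-up questions: recording summary statistics like these at
# development time lets you monitor later without the full development data
housing_dev |>
  summarize(across(price, list(mean = mean, sd = sd, median = median)))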
library(vetiver)
library(tidymodels)

# Connect to the deployed model's prediction endpoint
url <- "http://appliedml.infosci.cornell.edu:2300/predict"
endpoint <- vetiver_endpoint(url)

# Predict on the monitoring data, then compute and plot metrics by month
augment(endpoint, new_data = housing_monitor) |>
  vetiver_compute_metrics(
    date_var = sold_date,
    period = "month",
    truth = price,
    estimate = .pred,
    metric_set = metric_set(rmse, rsq, mae)
  ) |>
  vetiver_plot_metrics()
Activity
Use the functions for metrics monitoring from {vetiver} to create a monitoring visualization (one variant is sketched after this activity).
Choose a different set of metrics or time aggregation.
Note that there are functions for using {pins} as a way to version and update monitoring results too!
05:00
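As a minimal sketch of one variant, continuing from the chunk above: weekly aggregation with a different metric set, followed by the {pins} versioning pattern. The board location and pin name here are assumptions, not part of the exercise setup.

# Weekly aggregation with a different set of regression metrics
weekly_metrics <- augment(endpoint, new_data = housing_monitor) |>
  vetiver_compute_metrics(
    date_var = sold_date,
    period = "week",
    truth = price,
    estimate = .pred,
    metric_set = metric_set(mae, mape, huber_loss)
  )
vetiver_plot_metrics(weekly_metrics)

# Version the results with {pins}: write once, then update over time
library(pins)
board <- board_temp()  # assumed board; use a shared board in practice
pin_write(board, weekly_metrics, "housing-metrics")
# Later, as new monitoring data arrives:
# vetiver_pin_metrics(board, new_metrics, "housing-metrics")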
Feedback loop: deployment of an ML model may alter the data that future models are trained on.
Activity
What is a possible feedback loop for the Tompkins County housing data?
Do you think your example would be harmful or helpful? To whom?
05:00
Activity
Let’s say that the most important organizational outcome for an Ithaca realtor is how accurate a pricing model is in percentage terms rather than in absolute dollars. (Think about being 20% wrong vs. $20,000 wrong.)
We can measure this with the mean absolute percentage error (MAPE).
Compute this quantity with the monitoring data, and aggregate by week/month, number of bedrooms/bathrooms, or town location.
If you have time, make a visualization showing your results.
07:00
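For reference, {yardstick}’s mape() computes the standard definition, expressed as a percentage:

$$\text{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

where $y_i$ are the true prices and $\hat{y}_i$ the predictions.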
# Convert back to raw dollars before computing percentage error,
# since price was modeled on the log10 scale
augment(endpoint, new_data = housing_monitor) |>
  mutate(
    .pred = 10^.pred,
    price = 10^price
  ) |>
  group_by(town) |>
  mape(price, .pred)
# A tibble: 12 × 4
   town          .metric .estimator .estimate
   <chr>         <chr>   <chr>          <dbl>
 1 Caroline      mape    standard       22.8
 2 Cortlandville mape    standard        5.88
 3 Danby         mape    standard       47.1
 4 Dryden        mape    standard       31.5
 5 Enfield       mape    standard       21.1
 6 Groton        mape    standard       47.2
 7 Harford       mape    standard      113.
 8 Hector        mape    standard       19.8
 9 Ithaca        mape    standard       20.3
10 Lansing       mape    standard       24.4
11 Newfield      mape    standard       24.3
12 Ulysses       mape    standard       27.4
Demonstration
Create a Quarto dashboard for model monitoring (a minimal skeleton follows).
10:00
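As a starting point, here is a minimal sketch of what such a dashboard source file might look like. The title, layout, and the availability of housing_monitor in the rendering session are assumptions, not the demonstrated solution.

---
title: "Housing model monitoring"
format: dashboard
---

```{r}
#| include: false
library(vetiver)
library(tidymodels)

# Connect to the deployed model and compute monthly metrics
endpoint <- vetiver_endpoint("http://appliedml.infosci.cornell.edu:2300/predict")

metrics <- augment(endpoint, new_data = housing_monitor) |>
  vetiver_compute_metrics(
    date_var = sold_date,
    period = "month",
    truth = price,
    estimate = .pred,
    metric_set = metric_set(rmse, rsq, mae)
  )
```

```{r}
#| title: "Metrics over time"
vetiver_plot_metrics(metrics)
```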