Versioning and deploying models

Lecture 20

Dr. Benjamin Soltoff

Cornell University
INFO 4940/5940 - Fall 2024

November 14, 2024

Announcements

  • Homework 04
  • Project exploration

Learning objectives

  • Create bundled model objects that can be saved to disk
  • Implement versioning for model objects
  • Review application programming interfaces
  • Generate a REST API for a model using {vetiver} and {plumber}

Application exercise

ae-19

  • Go to the course GitHub org and find your ae-19 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline (end of the day).

MLOps

MLOps with {vetiver}

Vetiver, the oil of tranquility, is used as a stabilizing ingredient in perfumery to preserve more volatile fragrances.

If you develop a model…

you can operationalize that model!

If you develop a model…

you likely should be the one to operationalize that model!

Tompkins County housing data

  • Sale prices of homes sold in Tompkins County, NY, in 2022 and 2023
  • Can certain measurements be used to predict the sale price?
  • Data collected from Redfin

Tompkins County housing data

  • N = 1,270
  • A numeric outcome, price
  • Other variables to use for prediction:
    • beds, baths, area, and year_built are numeric predictors
    • town and municipality could be nominal predictors
    • sold_date could be a date predictor

Home prices in Tompkins County, NY

sold_date price beds baths area lot_size year_built hoa_month town municipality long lat
2022-06-03 33000 2 1.0 980 NA 1979 NA Ithaca Ithaca city -76.51334 42.43245
2022-08-28 270000 3 2.0 1420 0.6030073 1955 NA Ithaca Unincorporated -76.45334 42.41719
2022-08-24 500000 3 1.5 2742 0.1399908 1900 NA Ulysses Trumansburg village -76.65999 42.54152
2022-09-20 400000 5 2.0 2066 0.3856749 1965 NA Ithaca Unincorporated -76.46803 42.47407
2022-08-08 469000 3 3.0 3015 0.7100092 1932 NA Ithaca Unincorporated -76.49558 42.45912
2023-04-19 205000 3 2.0 1200 0.8000000 1996 NA Ulysses Unincorporated -76.66433 42.49964
2023-03-22 350000 5 3.0 3080 2.5000000 1830 NA Enfield Unincorporated -76.59211 42.42251
2023-06-16 499000 3 2.5 2008 1.2000000 1935 NA Ithaca Unincorporated -76.46770 42.46063
2023-07-27 390000 4 3.0 2513 0.5699954 1987 NA Ithaca Unincorporated -76.48377 42.42092
2022-12-19 375000 3 2.5 1976 1.0000000 2004 NA Dryden Unincorporated -76.41183 42.43847

Time for building a model!

Spend your data budget

library(tidymodels)
set.seed(123)

housing_split <- housing |>
  mutate(price = log10(price)) |>
  initial_split(prop = 0.8)

housing_train <- training(housing_split)
housing_test <- testing(housing_split)
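
Because the split above puts price on the log10 scale, anything the model predicts later comes back in log10 dollars. A minimal base R sketch of the round trip, using sale prices copied from the first rows of the table above. The value 5.5 is a made-up prediction used only for illustration.

```r
# Sale prices copied from the first rows of the housing table
price <- c(33000, 270000, 500000, 400000, 469000)

# The transformation applied before splitting
log_price <- log10(price)

# Undo the transformation to report values in dollars again
price_back <- 10^log_price

# A hypothetical prediction of 5.5 on the log10 scale, converted to dollars
pred_dollars <- round(10^5.5)
pred_dollars
```

Error metrics computed on the transformed outcome are also in log10 units, so they describe relative rather than absolute dollar error.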

Fit a linear regression model 🚀

Or your model of choice!

housing_fit <-
  workflow(
    price ~ beds + baths + area + year_built,
    linear_reg()
  ) |>
  fit(data = housing_train)

⏱️ Your turn

Activity

Split your data into training and testing sets.

Fit a model to your training data.

05:00

Create a deployable bundle

Deploy preprocessors and models together

Create a deployable model object

library(vetiver)
v <- vetiver_model(housing_fit, "tompkins-housing")
v

── tompkins-housing ─ <bundled_workflow> model for deployment 
A lm regression modeling workflow using 4 features

{vetiver} butchers and bundles your model object with relevant information for publishing.
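
The default description can be replaced, and extra information can travel with the bundle. A minimal sketch, assuming a plain lm() fit on a few rows of the table above stands in for the workflow. The names housing_toy, toy_fit, v_toy, the model name, and the metadata field outcome_scale are hypothetical.

```r
library(vetiver)

# Stand-in data: five rows copied from the housing table
housing_toy <- data.frame(
  log_price = log10(c(33000, 270000, 500000, 400000, 469000)),
  beds = c(2, 3, 3, 5, 3),
  area = c(980, 1420, 2742, 2066, 3015)
)

# A plain lm() stands in for the fitted workflow
toy_fit <- lm(log_price ~ beds + area, data = housing_toy)

# Supply a custom description and user metadata when bundling
v_toy <- vetiver_model(
  toy_fit,
  "tompkins-housing-toy",
  description = "Linear model of log10 sale price (toy data)",
  metadata = list(outcome_scale = "log10 dollars")
)

v_toy$description
v_toy$metadata$user
```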

⏱️ Your turn

Activity

Create your {vetiver} model object.

Check out the default description that is created, and try out using a custom description.

Show your custom description to your neighbor.

05:00

Version your model

How could you share your resources?

Data, models, R objects, etc.

❌ Email
❌ GitHub

🫤 Shared network drive
🫤 Dropbox, Google Drive, Box.com, etc.

✅ Amazon S3
✅ Azure
✅ Google Cloud
✅ Microsoft 365

{pins} 📌

The {pins} package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues.

You can pin objects to a variety of pin boards, including:

  • a local folder (like a network drive or even a temporary directory)
  • Amazon S3
  • Azure Storage
  • Google Cloud

Pin your model

library(pins)

board <- board_temp()
board |> vetiver_pin_write(v)
Creating new version '20241114T182506Z-02bc5'
Writing to pin 'tompkins-housing'

Create a Model Card for your published model
• Model Cards provide a framework for transparent, responsible reporting
• Use the vetiver `.Rmd` template as a place to start
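
Once a model is pinned, its metadata and the model itself can be read back from the board. A minimal sketch, assuming a toy lm() fit stands in for the workflow. housing_toy, toy_fit, and the pin name tompkins-housing-toy are hypothetical.

```r
library(vetiver)
library(pins)

# Stand-in data: five rows copied from the housing table
housing_toy <- data.frame(
  log_price = log10(c(33000, 270000, 500000, 400000, 469000)),
  beds = c(2, 3, 3, 5, 3),
  area = c(980, 1420, 2742, 2066, 3015)
)
toy_fit <- lm(log_price ~ beds + area, data = housing_toy)

# Pin the bundled model to a temporary board
board <- board_temp()
vetiver_pin_write(board, vetiver_model(toy_fit, "tompkins-housing-toy"))

# Inspect the metadata stored alongside the pin
meta <- pin_meta(board, "tompkins-housing-toy")
meta$description
meta$created

# Read the bundled model back from the board
v_back <- vetiver_pin_read(board, "tompkins-housing-toy")
v_back$model_name
```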

⏱️ Your turn

Activity

Pin your {vetiver} model object to a temporary board.

Retrieve the model metadata with pin_meta().

05:00

Version your model

Fit a random forest

rf_rec <- recipe(price ~ beds + baths + area + year_built + town, data = housing_train) |>
  step_impute_mean(all_numeric_predictors()) |>
  step_impute_mode(all_nominal_predictors())

housing_fit <- workflow() |>
  add_recipe(rf_rec) |>
  add_model(rand_forest(trees = 200, mode = "regression")) |>
  fit(data = housing_train)

Version your model

library(pins)
library(vetiver)

board <- board_temp()
v <- vetiver_model(housing_fit, "tompkins-housing", versioned = TRUE)
board |> vetiver_pin_write(v)
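
Versions accumulate only when each write goes to the same board under the same pin name. A minimal sketch of writing two versions and reading the older one back, assuming two toy lm() fits stand in for the linear regression and the random forest. All object names and the pin name are hypothetical.

```r
library(vetiver)
library(pins)

# Stand-in data: five rows copied from the housing table
housing_toy <- data.frame(
  log_price = log10(c(33000, 270000, 500000, 400000, 469000)),
  beds = c(2, 3, 3, 5, 3),
  area = c(980, 1420, 2742, 2066, 3015)
)
fit_small <- lm(log_price ~ area, data = housing_toy)
fit_large <- lm(log_price ~ beds + area, data = housing_toy)

# One board, reused for both writes
board <- board_temp(versioned = TRUE)

vetiver_pin_write(
  board,
  vetiver_model(fit_small, "tompkins-housing-toy", versioned = TRUE)
)
Sys.sleep(1) # keep the two version timestamps distinct
vetiver_pin_write(
  board,
  vetiver_model(fit_large, "tompkins-housing-toy", versioned = TRUE)
)

# List the versions, then read the oldest one explicitly
versions <- pin_versions(board, "tompkins-housing-toy")
oldest <- versions$version[order(versions$created)[1]]
v_old <- vetiver_pin_read(board, "tompkins-housing-toy", version = oldest)
names(coef(v_old$model))
```

Reading without a version argument returns the most recent version, so pinning a version explicitly is what lets you roll back.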

⏱️ Your turn

Activity

Create a new {vetiver} model object from your linear regression model, explicitly setting versioned = TRUE, and pin it to your temporary board.

Then train a random forest model and create another {vetiver} model object with the same name, also setting versioned = TRUE.

Write this new version of your model to the same pin, and see what versions you have with pin_versions().

05:00

Make it easy to do the right thing

  • Robust and human-friendly checking of new data
  • Track and document software dependencies of models
  • Model cards for transparent, responsible reporting

⏱️ Your turn

Activity

Open the Model Card template, and spend a few minutes exploring how you might create a Model Card for this housing model.

Discuss something you notice about the Model Card with your neighbor.

05:00

You can deploy your model as a…

REST API

Application programming interface (API)

An interface that can connect applications in a standard way

  • Representational State Transfer (REST)
  • Uniform Resource Locator (URL)

RESTful queries

  1. Submit request to server via URL
  2. Return result in a structured format
  3. Parse results into a local format
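
The three steps above can be sketched in R with {httr2} and {jsonlite}. This is an illustration under assumptions: query_api() is a hypothetical helper that is defined but not called here, and example_json is placeholder text in the shape a prediction API might return, not real model output.

```r
library(httr2)
library(jsonlite)

# Steps 1 and 2: send new observations to a URL and get the response back.
# Attaching a JSON body makes httr2 send a POST request.
query_api <- function(url, new_data) {
  request(url) |>
    req_body_json(new_data) |>
    req_perform()
}

# Step 3: parse structured JSON text into a local R object.
# Placeholder JSON for illustration only, not real predictions.
example_json <- '[{".pred": 5.4}, {".pred": 5.6}]'
preds <- fromJSON(example_json)
preds
```

With a live API, resp_body_json() applied to the result of query_api() would perform step 3 directly on the response.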

Create a {vetiver} REST API

library(plumber)

pr() |>
  vetiver_api(v) |>
  pr_run()
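
The router can also be built and inspected before it is started, which is one way to see the endpoints that get added. A minimal sketch, assuming a toy lm() fit stands in for the workflow. housing_toy, v_toy, and port 8080 are hypothetical choices.

```r
library(vetiver)
library(plumber)

# Stand-in data: five rows copied from the housing table
housing_toy <- data.frame(
  log_price = log10(c(33000, 270000, 500000, 400000, 469000)),
  beds = c(2, 3, 3, 5, 3),
  area = c(980, 1420, 2742, 2066, 3015)
)
v_toy <- vetiver_model(
  lm(log_price ~ beds + area, data = housing_toy),
  "tompkins-housing-toy"
)

# Build the router without starting a server
api <- pr() |> vetiver_api(v_toy)

# Printing the router lists its endpoints
api

# pr_run() blocks the R session, so only start it interactively
if (interactive()) {
  pr_run(api, port = 8080)
}
```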

⏱️ Your turn

Activity

Create a {vetiver} API for your model and run it locally.

Explore the visual documentation.

How many endpoints are there?

Discuss what you notice with your neighbor.

05:00

What does “deploy” mean?

Where can {vetiver} deploy?

  • Posit Connect
  • AWS SageMaker
  • A public or private cloud, using Docker
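
Each of these targets ultimately runs a plumber file that reads the pinned model and serves it. A minimal sketch of generating that file with vetiver_write_plumber(), assuming a toy lm() fit pinned to a temporary board. The data, the pin name, and the output path are hypothetical, and a real deployment would use a board the server can reach.

```r
library(vetiver)
library(pins)

# Stand-in data: five rows copied from the housing table
housing_toy <- data.frame(
  log_price = log10(c(33000, 270000, 500000, 400000, 469000)),
  beds = c(2, 3, 3, 5, 3),
  area = c(980, 1420, 2742, 2066, 3015)
)

# Pin the model first; the generated file reads it from the board
board <- board_temp(versioned = TRUE)
v_toy <- vetiver_model(
  lm(log_price ~ beds + area, data = housing_toy),
  "tompkins-housing-toy"
)
vetiver_pin_write(board, v_toy)

# Write a plumber file that loads the pinned model and serves it
plumber_file <- tempfile(fileext = ".R")
vetiver_write_plumber(board, "tompkins-housing-toy", file = plumber_file)

# Show the generated file
writeLines(readLines(plumber_file))
```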

How do you make a request of your new API?

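library(httr2)
url <- "https://info4940.infosci.cornell.edu/tompkins-housing/predict"

# /predict accepts POST requests, so send the new observations as a JSON body
request(url) |>
  req_body_json(slice_sample(housing_test, n = 5)) |>
  req_perform() |>
  resp_body_json()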

How do you make a request of your new API?

  • R packages like {httr2}
  • curl
  • There is special support in {vetiver} for the /predict endpoint

Any tool that can make an HTTP request can be used to interact with your model API!

Create a {vetiver} endpoint

You can treat your model API much like it is a local model in memory!

library(vetiver)

url <- "https://info4940.infosci.cornell.edu/tompkins-housing/predict"
endpoint <- vetiver_endpoint(url)
predict(endpoint, slice_sample(housing_test, n = 5))
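
Other endpoints can be reached with a plain GET request. A minimal sketch, assuming the API is live and serves /ping and /metadata under the same base path as /predict. The derived URLs are assumptions, and the network call runs only in an interactive session.

```r
library(httr2)

predict_url <- "https://info4940.infosci.cornell.edu/tompkins-housing/predict"

# Derive sibling endpoint URLs from the prediction URL (assumed layout)
ping_url <- sub("/predict$", "/ping", predict_url)
metadata_url <- sub("/predict$", "/metadata", predict_url)

# A request without a body is sent as GET
if (interactive()) {
  request(ping_url) |>
    req_perform() |>
    resp_body_json()
}
```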

⏱️ Your turn

Activity

Create a {vetiver} endpoint object for your API.

Predict with your endpoint for new data.

Optional: call another endpoint like /ping or /metadata.

05:00

Wrap-up

Recap

  • ML models can be deployed as APIs
  • Use {pins} to share your models
  • {vetiver} can help you bundle, version, and deploy your models

Acknowledgments