Lecture 16
Cornell University
INFO 4940/5940 - Fall 2025
October 23, 2025
Containerized environments for your code
Image credit: What is a container?
Start with a trained and versioned model
requirements.txt or renv.lock
app.py or plumber.R
Start with a trained and versioned model
# Generated by the vetiver package; edit with care
FROM rocker/r-ver:4.4.0
ENV RENV_CONFIG_REPOS_OVERRIDE https://packagemanager.rstudio.com/cran/latest
RUN apt-get update -qq && apt-get install -y --no-install-recommends \
  libcurl4-openssl-dev \
  libicu-dev \
  libsodium-dev \
  libssl-dev \
  make \
  zlib1g-dev \
  && apt-get clean
COPY vetiver_renv.lock renv.lock
RUN Rscript -e "install.packages('renv')"
RUN Rscript -e "renv::restore()"
COPY plumber.R /opt/ml/plumber.R
EXPOSE 8080
ENTRYPOINT ["R", "-e", "pr <- plumber::plumb('/opt/ml/plumber.R'); pr$run(host = '0.0.0.0', port = 8080)"]
# Generated by the vetiver package; edit with care
# start with python base image
FROM python:3.13
# create directory in container for vetiver files
WORKDIR /vetiver
# copy and install requirements
COPY vetiver_requirements.txt /vetiver/requirements.txt
#
RUN pip install --no-cache-dir --upgrade -r /vetiver/requirements.txt
# copy app file
COPY app.py /vetiver/app/app.py
# expose port
EXPOSE 8080
# run vetiver API
CMD ["uvicorn", "app.app:api", "--host", "0.0.0.0", "--port", "8080"]
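Once the container is running, you make predictions by POSTing JSON to the API. A minimal sketch of building such a request, assuming vetiver's documented defaults (a POST /predict endpoint accepting a list of prediction records) and the port exposed in the Dockerfile above; the new_homes records are hypothetical, with column names matching the predictors used later in this lecture:

```python
import json

# One hypothetical home to score; keys match the model's predictors.
new_homes = [
    {"beds": 3, "baths": 2.0, "area": 1500, "year_built": 1995},
]

# vetiver APIs accept a JSON list of prediction records.
payload = json.dumps(new_homes)

# With the container running locally (docker run -p 8080:8080 ...),
# the request would look like:
#   import requests
#   response = requests.post("http://localhost:8080/predict", data=payload)
#   response.json()
print(payload)
```

Check the API's interactive /docs page (served by FastAPI) to confirm the exact endpoint and payload shape for your model.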
If you have an Apple Silicon Mac
Add the --platform linux/amd64 flag to your docker build command (e.g. docker build --platform linux/amd64 .) so that R packages are installed from compiled binaries rather than built from source.
ae-15
Instructions
Open your ae-15 repo (repo name will be suffixed with your GitHub name). Run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
Activity
TODO check that we get a persistent local board on Posit Workbench
TODO make sure students use random ports for the output
Create a Docker container for your model using a local board, board_local().
Build the Docker container and run it locally. Make predictions using the API.
07:00
The pins package publishes data, models, and other R and Python objects, making it easy to share them across projects and with your colleagues.
You can pin objects to a variety of pin boards, including:
board_gcs() to connect to Google Cloud Storage, authenticating with a service-auth.json service account file
Activity
TODO test this
Create a Docker container for your model using a Google Cloud Storage board, board_gcs().
Build the Docker container and run it locally. Make predictions using the API.
07:00
TODO is this the same for Python? I think not.
vetiver_prepare_docker() decomposes into two major functions:
vetiver_write_plumber() to create a Plumber file
vetiver_write_docker() to create a Dockerfile and {renv} lockfile
Requires additional tinkering with plumber.R and Dockerfile to work successfully
Activity
TODO see what needs to be done for Python
Create a Docker container for your model using a Google Cloud Storage board, board_gcs().
Ensure the Docker container is correctly configured to use {googleCloudStorageR}.
Build the Docker container and run it locally. Make predictions using the API.
07:00
library(tidyverse)
library(tidymodels)
housing <- read_csv("data/tompkins-home-sales.csv")
set.seed(123)
housing_split <- housing |>
mutate(price = log10(price)) |>
initial_split(prop = 0.8)
housing_train <- training(housing_split)
housing_test <- testing(housing_split)
rf_rec <- recipe(
price ~ beds + baths + area + year_built,
data = housing_train
) |>
step_impute_mean(all_numeric_predictors()) |>
step_impute_mode(all_nominal_predictors())
housing_fit <- workflow() |>
add_recipe(rf_rec) |>
add_model(rand_forest(trees = 200, mode = "regression")) |>
fit(data = housing_train)
import pandas as pd
import numpy as np
from sklearn import model_selection, ensemble
housing = pd.read_csv('data/tompkins-home-sales.csv')
np.random.seed(123)
X, y = housing[["beds", "baths", "area", "year_built"]], np.log10(housing["price"])
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y,
    test_size=0.2
)
housing_fit = ensemble.RandomForestRegressor(n_estimators=200).fit(X_train, y_train)
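Because both pipelines model log10(price), raw predictions come back on the log10 scale. A small sketch of back-transforming to dollars before reporting prices; the log10_pred values here are hypothetical stand-ins for output like housing_fit.predict(X_test):

```python
import numpy as np

# Hypothetical model output on the log10(price) scale.
log10_pred = np.array([5.3, 5.45, 5.6])

# Back-transform to dollars before reporting prices.
price_pred = 10 ** log10_pred
print(price_pred)
```

Note that the metrics computed below are left on the log10 scale on purpose, so they match the scale the model was trained on.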
from sklearn import metrics
metric_set = [metrics.root_mean_squared_error, metrics.r2_score, metrics.mean_absolute_error]
y_predictions = pd.Series(housing_fit.predict(X_test))
housing_metrics = pd.DataFrame()
for metric in metric_set:
    metric_name = str(metric.__name__)
    metric_output = metric(y_test, y_predictions)
    housing_metrics = pd.concat(
        (
            housing_metrics,
            pd.DataFrame({"name": [metric_name], "score": [metric_output]}),
        ),
        axis=0,
    )
housing_metrics.reset_index(inplace=True, drop=True)
housing_metrics
name score
0 root_mean_squared_error 0.195166
1 r2_score 0.493361
2 mean_absolute_error 0.142366
Activity
Compute metrics for your model using the testing data.
Store these metrics as metadata in a vetiver model object.
Write this new vetiver model object as a new version of your pin.
05:00
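One way to approach this activity is to flatten the metrics table into a plain dict before attaching it. A minimal sketch, using the metric values from the output above; the vetiver calls appear only as comments, with names taken from vetiver's documented Python API (verify against your version):

```python
import pandas as pd

# A metrics table shaped like housing_metrics above.
housing_metrics = pd.DataFrame(
    {
        "name": ["root_mean_squared_error", "r2_score", "mean_absolute_error"],
        "score": [0.195166, 0.493361, 0.142366],
    }
)

# Flatten to a plain dict so it can travel as pin metadata.
metrics_dict = housing_metrics.set_index("name")["score"].to_dict()

# With vetiver (calls assumed from its documentation):
#   v = vetiver.VetiverModel(
#       housing_fit, "housing_rf",
#       prototype_data=X_train, metadata=metrics_dict,
#   )
#   vetiver.vetiver_pin_write(board, v)
print(metrics_dict)
```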
How do we extract our metrics so we can use them?
Activity
Obtain the metrics metadata for your versioned model.
What else might you want to store as model metadata?
How or when might you use model metadata?
07:00
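For the extraction question above, one hedged sketch of what this might look like in Python: pins exposes user-supplied metadata via board.pin_meta(name).user (method and attribute names assumed from the pins documentation, so verify with your version). A stand-in dict is used here so the sketch runs without a real board:

```python
import pandas as pd

# Stand-in for what board.pin_meta("housing_rf").user (pins API,
# assumed) would return: the user-supplied metadata dict.
user_metadata = {
    "root_mean_squared_error": 0.195166,
    "r2_score": 0.493361,
    "mean_absolute_error": 0.142366,
}

# Back into a tidy table for monitoring or reporting.
extracted_metrics = pd.DataFrame(
    {"name": list(user_metadata), "score": list(user_metadata.values())}
)
print(extracted_metrics)
```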