Lecture 16
Cornell University
INFO 4940/5940 - Fall 2025
October 23, 2025
Containerized environments for your code
Image credit: What is a container?
Start with a trained and versioned model
Dockerfile
requirements.txt or renv.lock
app.py or plumber.R
# Generated by the vetiver package; edit with care
FROM rocker/r-ver:4.5.1
ENV RENV_CONFIG_REPOS_OVERRIDE https://packagemanager.rstudio.com/cran/latest
RUN apt-get update -qq && apt-get install -y --no-install-recommends \
libcurl4-openssl-dev \
libicu-dev \
libsodium-dev \
libssl-dev \
libx11-dev \
make \
zlib1g-dev \
&& apt-get clean
COPY vetiver_renv.lock renv.lock
RUN Rscript -e "install.packages('renv')"
RUN Rscript -e "renv::restore()"
COPY plumber.R /opt/ml/plumber.R
EXPOSE 8080
ENTRYPOINT ["R", "-e", "pr <- plumber::plumb('/opt/ml/plumber.R'); pr$run(host = '0.0.0.0', port = 8080)"]

# Generated by the vetiver package; edit with care
# start with python base image
FROM python:3.13
# create directory in container for vetiver files
WORKDIR /vetiver
# copy and install requirements
COPY vetiver_requirements.txt /vetiver/requirements.txt
#
RUN pip install --no-cache-dir --upgrade -r /vetiver/requirements.txt
# copy app file
COPY app.py /vetiver/app/app.py
# expose port
EXPOSE 8080
# run vetiver API
CMD ["uvicorn", "app.app:api", "--host", "0.0.0.0", "--port", "8080"]

Use the Terminal/Shell to build your Docker container
If you have an Apple Silicon Mac
Add the --platform linux/amd64 flag to install R packages from compiled binaries rather than building them from source.
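For example, the build and run commands on an Apple Silicon Mac might look like this (the image tag housing-api is a placeholder, not from the slides):

```shell
# Build for amd64 so R packages install as precompiled binaries
docker build --platform linux/amd64 -t housing-api .

# Run the container, mapping the API port to the host
docker run --rm -p 8080:8080 housing-api
```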
ae-15

Instructions
Clone ae-15 (repo name will be suffixed with your GitHub name), run renv::restore() (R) or uv sync (Python), open the Quarto document in the repo, and follow along to complete the exercises.

Instructions
Create a Docker container for your model using board_local().
Build the Docker container and run it locally. Make predictions using the API.
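Once the container is running, you can make predictions from another terminal; a sketch, assuming the vetiver /predict endpoint and the housing predictors used later in this lecture:

```shell
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '[{"beds": 3, "baths": 2, "area": 1500, "year_built": 1980}]'
```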
The pins package publishes data, models, and other R and Python objects, making it easy to share them across projects and with your colleagues.
You can pin objects to a variety of pin boards, including:
board_gcs() to connect to Google Cloud Storage

service-auth.json: define the file location via an environment variable
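One way to point each client at the key file is via environment variables: GCS_AUTH_FILE is read by googleCloudStorageR (R), and GOOGLE_APPLICATION_CREDENTIALS is the standard variable picked up by gcsfs (Python):

```shell
# Both variables point at the same service account key file
export GCS_AUTH_FILE="service-auth.json"
export GOOGLE_APPLICATION_CREDENTIALS="service-auth.json"
```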
Instructions
Create a Docker container for your model using board_gcs().
Build the Docker container and run it locally. Make predictions using the API.
vetiver_prepare_docker()/vetiver.prepare_docker() decomposes into two major functions
Requires additional tinkering (some automatic, some manual) to ensure correct authentication procedures
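In Python, those two pieces can also be called separately; a minimal sketch, assuming a board b that holds a pinned model named "housing":

```python
import vetiver

# Step 1: write the FastAPI app file for the pinned model
vetiver.write_app(board=b, pin_name="housing", file="app.py")

# Step 2: write a Dockerfile (plus vetiver_requirements.txt) for that app
vetiver.write_docker(app_file="app.py")
```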
Create the Docker artifacts, build the container, and run it locally. Make predictions using the API.
Modify plumber.R to load {googleCloudStorageR}
Copy service-auth.json to same directory as Dockerfile
Modify Dockerfile to correctly incorporate service-auth.json. After the RUN apt-get step, add the following lines
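The lines themselves did not survive extraction from the slides; a plausible sketch (the in-container path mirrors where plumber.R is copied, and is an assumption):

```dockerfile
COPY service-auth.json /opt/ml/service-auth.json
ENV GCS_AUTH_FILE=/opt/ml/service-auth.json
```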
Modify vetiver_requirements.txt to include gcsfs dependency
Copy service-auth.json to same directory as Dockerfile
Modify Dockerfile to correctly incorporate service-auth.json. After the COPY app.py step, add the following lines
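These lines also did not survive extraction; a plausible sketch (the in-container path mirrors the WORKDIR used above, and is an assumption):

```dockerfile
COPY service-auth.json /vetiver/service-auth.json
ENV GOOGLE_APPLICATION_CREDENTIALS=/vetiver/service-auth.json
```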
library(tidyverse)
library(tidymodels)
housing <- read_csv("data/tompkins-home-sales.csv")
set.seed(123)
housing_split <- housing |>
mutate(price = log10(price)) |>
initial_split(prop = 0.8)
housing_train <- training(housing_split)
housing_test <- testing(housing_split)
rf_rec <- recipe(
price ~ beds + baths + area + year_built,
data = housing_train
) |>
step_impute_mean(all_numeric_predictors()) |>
step_impute_mode(all_nominal_predictors())
housing_fit <- workflow() |>
add_recipe(rf_rec) |>
add_model(rand_forest(trees = 200, mode = "regression")) |>
fit(data = housing_train)

import pandas as pd
import numpy as np
from sklearn import model_selection, ensemble
housing = pd.read_csv('data/tompkins-home-sales.csv')
np.random.seed(123)
X, y = housing[["beds", "baths", "area", "year_built"]], np.log10(housing["price"])
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y,
    test_size = 0.2
)
housing_fit = ensemble.RandomForestRegressor(n_estimators=200).fit(X_train, y_train)

from sklearn import metrics
metric_set = [metrics.root_mean_squared_error, metrics.r2_score, metrics.mean_absolute_error]
y_predictions = pd.Series(housing_fit.predict(X_test))
housing_metrics = pd.DataFrame()
for metric in metric_set:
    metric_name = str(metric.__name__)
    metric_output = metric(y_test, y_predictions)
    housing_metrics = pd.concat(
        (
            housing_metrics,
            pd.DataFrame({"name": [metric_name], "score": [metric_output]}),
        ),
        axis=0,
    )
housing_metrics.reset_index(inplace=True, drop=True)
housing_metrics

                      name     score
0  root_mean_squared_error  0.195166
1                 r2_score  0.493361
2      mean_absolute_error  0.142366
Instructions
Compute metrics for your model using the testing data.
Store these metrics as metadata in a vetiver model object.
Write this new vetiver model object as a new version of your pin.
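vetiver metadata is easiest to store as a plain dict; one way to flatten the metrics data frame built above into that shape (values reproduced from the printed output):

```python
import pandas as pd

# Metrics computed on the test set, as shown earlier in the lecture
housing_metrics = pd.DataFrame({
    "name": ["root_mean_squared_error", "r2_score", "mean_absolute_error"],
    "score": [0.195166, 0.493361, 0.142366],
})

# One key per metric, ready to pass as VetiverModel(..., metadata=metadata)
metadata = dict(zip(housing_metrics["name"], housing_metrics["score"]))
```

Passing this dict as the metadata argument when constructing the vetiver model and then re-pinning it writes a new version with the metrics attached.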
How do we extract our metrics out to use them?
Instructions
Obtain the metrics metadata for your versioned model.
What else might you want to store as model metadata?
How or when might you use model metadata?
Use the path argument to vetiver_prepare_docker()/vetiver.prepare_docker() to create the Docker artifacts in a subdirectory (for easier file management)