AE 15: Deploying models to the cloud using Docker

Application exercise (Python)

Modified: October 23, 2025

Load the data

import pandas as pd
housing = pd.read_csv('data/tompkins-home-sales.csv')

Build a model

  • Log transform the price variable
  • Split into training/test set

from sklearn import model_selection
import numpy as np

# Seed NumPy's global RNG so the split is reproducible
np.random.seed(123)

# Log-transform the sale price and split into training/test sets
X, y = housing[["beds", "baths", "area", "year_built", "town"]], np.log10(housing["price"])
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y,
    test_size = 0.2
)

Train a random forest model:

from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Define feature columns
numeric_features = ["beds", "baths", "area", "year_built"]
categorical_features = ["town"]

# Create preprocessing steps
numeric_transformer = SimpleImputer(strategy="mean")
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))
])

# Combine preprocessors
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features)
    ]
)

# Create pipeline with preprocessor and model
housing_fit = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("regressor", RandomForestRegressor(n_estimators=200, random_state=123))
])

# Prepare training data with all features
X_train_full = housing.loc[X_train.index, numeric_features + categorical_features]
housing_fit.fit(X_train_full, y_train)
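
If you want a quick sanity check before deploying (a minimal sketch, not part of the exercise), the fitted pipeline can be scored on the held-out test set; Pipeline.score() reports R² on the log-transformed prices:

# R^2 of the random forest on the held-out test set (log10 price scale)
print(housing_fit.score(X_test, y_test))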

Create a Docker container using a board

Pin model to a board

from pins import board_local
from vetiver import vetiver_pin_write, VetiverModel

# Bundle model using Vetiver
v = VetiverModel(______, ______, prototype_data = X_train)
v.description

# Store model on a board
board = ______(versioned = True, allow_pickle_read = True)
______(board, v)
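
For reference, a possible completion of the blanks above, assuming the fitted pipeline housing_fit and the pin name "tompkins-housing" that appears later in this exercise:

# One possible completion: bundle the fitted pipeline and pin it locally
v = VetiverModel(housing_fit, "tompkins-housing", prototype_data = X_train)
v.description

board = board_local(versioned = True, allow_pickle_read = True)
vetiver_pin_write(board, v)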

Create Docker artifacts

Choosing a port number

Students using Posit Workbench are on a shared server where everyone is building and running containers from the same device. You need to ensure the port number where you serve the API is unique, so choose a random four-digit number as your port number and use it for the rest of the application exercise.

If you are working locally rather than on the shared server, use port 8080 for the rest of the application exercise.

from vetiver import prepare_docker

prepare_docker(
  board = ______,
  pin_name = ______,
  port = "____"
)
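
A possible completion, assuming the local board and pin name from above and port 8080 (substitute your own port if you chose a random one):

# Generates a Dockerfile and supporting files for serving the pinned model
prepare_docker(
  board = board,
  pin_name = "tompkins-housing",
  port = "8080"
)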

Build and test Docker container

Run these commands in the Terminal tab of Positron or your local terminal, replacing <NETID> with your actual NetID and <PORT> with your chosen port number:

docker build -t housing-<NETID> .
docker run -p <PORT>:<PORT> housing-<NETID>

Use your own NetID

Students using Posit Workbench are on a shared server where everyone is building and running containers from the same device. You need to ensure your container has a unique name to avoid conflicts with other users.

If you are working locally instead, run these commands in the Terminal tab of Positron or your local terminal:

docker build -t housing .
docker run -p 8080:8080 housing
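
Note that docker run keeps your terminal attached to the container. When you are done testing, you can stop it from another terminal with standard Docker commands:

docker ps                    # list running containers and their IDs
docker stop <CONTAINER_ID>   # stop the container serving your API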

Test the API

from vetiver.server import predict, vetiver_endpoint

url = ______
endpoint = ______(url)
predict(endpoint = endpoint, data = X_test.head(5))
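
A possible completion, assuming the container is listening on port 8080 on your machine; the vetiver API serves predictions at the /predict path:

# Point the endpoint at the containerized API (adjust the port if needed)
url = "http://127.0.0.1:8080/predict"
endpoint = vetiver_endpoint(url)
predict(endpoint = endpoint, data = X_test.head(5))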

Compute model metrics and store in pin

from sklearn import metrics

metric_set = [
    metrics.root_mean_squared_error,
    metrics.r2_score,
    metrics.mean_absolute_error,
]
# Predictions are on the log10(price) scale, matching y_test
y_predictions = pd.Series(housing_fit.predict(X_test))

housing_metrics = pd.DataFrame()

# Compute each metric on the test set and collect the results
for metric in metric_set:
    metric_name = str(metric.__name__)
    metric_output = metric(y_test, y_predictions)
    housing_metrics = pd.concat(
        (
            housing_metrics,
            pd.DataFrame({"name": [metric_name], "score": [metric_output]}),
        ),
        axis=0,
    )

housing_metrics.reset_index(inplace=True, drop=True)
housing_metrics

# generate vetiver model
v = VetiverModel(
    housing_fit,
    "tompkins-housing",
    prototype_data = X_train,
    metadata = ______
)

# write new version of pin with metrics metadata
vetiver_pin_write(board, v)
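
One way to fill in the metadata argument (a sketch): VetiverModel accepts a dictionary of extra metadata, so the metrics data frame can be passed with to_dict(), which is the shape the retrieval code below expects:

# Possible completion: attach the computed metrics as user metadata
v = VetiverModel(
    housing_fit,
    "tompkins-housing",
    prototype_data = X_train,
    metadata = housing_metrics.to_dict()
)
vetiver_pin_write(board, v)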

Retrieve model metrics

metadata = board.pin_meta("tompkins-housing")
extracted_metrics = pd.DataFrame(metadata.user.get("user"))
extracted_metrics

What else might you want to store as model metadata? How or when might you use model metadata?

Add response here.

Acknowledgments