AE 14: Version your housing model

Application exercise
Python

Modified: October 21, 2025

Load the data

import pandas as pd
housing = pd.read_csv('data/tompkins-home-sales.csv')

Build a model

  • Log transform the price variable
  • Split into training/test set
from sklearn import model_selection
import numpy as np
np.random.seed(123)
X, y = housing[["beds", "baths", "area", "year_built"]], np.log10(housing["price"])
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y,
    test_size = 0.2
)
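Because the model is trained on log10(price), its predictions come back on the log scale and need to be converted to dollars. A minimal sketch of the round trip, using made-up prices rather than values from the dataset:

```python
import math

# Hypothetical sale prices in dollars (made-up values, not from the dataset)
prices = [150_000, 316_228, 1_000_000]

# Forward transform: the model is trained on log10(price)
log_prices = [math.log10(p) for p in prices]

# Back-transform: raise 10 to the predicted value to recover dollars
recovered = [10 ** lp for lp in log_prices]

# A difference of 0.1 on the log10 scale is a multiplicative factor of about 1.26
factor = 10 ** 0.1

print(log_prices)
print(recovered)
print(round(factor, 2))
```

Errors on the log scale are therefore relative: being off by 0.1 means being off by roughly 26% in price, whether the home is cheap or expensive.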

Train a linear regression model:

from sklearn import linear_model
housing_fit = linear_model.LinearRegression().fit(X_train, y_train)

Create a deployable model object

from vetiver import VetiverModel
v = VetiverModel(model = ______, model_name = ______, prototype_data = X_train)
v.description
# create a vetiver model with a custom description

Pin your model

from pins import board_temp
from vetiver import vetiver_pin_write

board = ______
______(board, v)
# retrieve your model metadata
board.pin_meta(______)

Store a new version

Train your model with a new set of predictors (the same linear regression, now with town added):

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline

# Define feature columns
numeric_features = ["beds", "baths", "area", "year_built"]
categorical_features = ["town"]

# Preprocessing for numeric and categorical data
preprocessor = ColumnTransformer(
  transformers=[
    ("num", "passthrough", numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features)
  ]
)

# Create pipeline with preprocessor and linear regression
housing_fit = Pipeline(steps=[
  ("preprocessor", preprocessor),
  ("regressor", linear_model.LinearRegression())
])

# Prepare features and target
X = housing[numeric_features + categorical_features]
y = np.log10(housing["price"])

# Split data
X_train, X_test, y_train, y_test = model_selection.train_test_split(
  X, y,
  test_size=0.2
)

# Fit model
housing_fit.fit(X_train, y_train)
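The pipeline above sets handle_unknown="ignore" on the one-hot encoder. A small sketch of what that option does, using illustrative town names that are assumed for this example rather than read from the data:

```python
from sklearn.preprocessing import OneHotEncoder

# Illustrative town names (assumed for this sketch, not read from the data)
train_towns = [["Dryden"], ["Ithaca"], ["Ithaca"]]

encoder = OneHotEncoder(handle_unknown="ignore").fit(train_towns)

# A town seen during training gets a 1 in its own column
seen = encoder.transform([["Ithaca"]]).toarray()

# A town never seen during training becomes all zeros instead of raising an error
unseen = encoder.transform([["Lansing"]]).toarray()

print(encoder.categories_[0].tolist())
print(seen.tolist())
print(unseen.tolist())
```

This matters for a deployed model: a request containing a town that was absent from the training split still returns a prediction rather than failing.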

Store this new model as a new version of the same pin:

v = VetiverModel(model = ______, model_name = ______, prototype_data = X_train)
______(board, v)

What versions do you have?

board.pin_versions(______)

Create a new vetiver model

Fit a random forest model

from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Define feature columns
numeric_features = ["beds", "baths", "area", "year_built"]
categorical_features = ["town"]

# Create preprocessing steps
numeric_transformer = SimpleImputer(strategy="mean")
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))
])

# Combine preprocessors
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features)
    ]
)

# Create pipeline with preprocessor and model
housing_fit = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("regressor", RandomForestRegressor(n_estimators=200, random_state=123))
])

# Prepare training data with all features
X_train_full = housing.loc[X_train.index, numeric_features + categorical_features]
housing_fit.fit(X_train_full, y_train)
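The two imputation strategies in this pipeline can be checked on toy inputs. This sketch uses made-up values and is not drawn from the housing data:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up numeric column with one missing value
areas = np.array([[1000.0], [2000.0], [np.nan]])
filled_areas = SimpleImputer(strategy="mean").fit_transform(areas)

# Made-up categorical column with one missing value
towns = np.array([["Ithaca"], ["Ithaca"], ["Dryden"], [np.nan]], dtype=object)
filled_towns = SimpleImputer(strategy="most_frequent").fit_transform(towns)

print(filled_areas.ravel().tolist())  # missing area replaced by the column mean
print(filled_towns.ravel().tolist())  # missing town replaced by the most common town
```

Because the imputers sit inside the pipeline, the fill values are learned from the training data and reused at prediction time.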

Store your model:

from pins import board_temp
from vetiver import vetiver_pin_write

board = board_temp(versioned = True, allow_pickle_read = True)
v = VetiverModel(housing_fit, "tompkins-housing", prototype_data = X_train)
vetiver_pin_write(board, v)

Create a vetiver REST API

from vetiver import VetiverAPI

api = ______(v)
api.run()

Running FastAPI from a Quarto document

Quarto uses the Jupyter engine to run Python code blocks. A Jupyter kernel already has an asyncio event loop running, so a server that tries to start its own loop can fail there. To run a FastAPI server within a Quarto document, await the server from the existing loop instead:

import asyncio
import uvicorn

app = api.app

if __name__ == "__main__":
    config = uvicorn.Config(app)
    server = uvicorn.Server(config)
    await server.serve()
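The difference between starting a new event loop and awaiting inside an existing one can be reproduced with the standard library alone. In this sketch, `serve_once` is a hypothetical stand-in for a server coroutine, and `inside_running_loop` mimics a notebook cell, assuming the kernel already has a loop running:

```python
import asyncio

async def serve_once():
    """Hypothetical stand-in for an async server coroutine."""
    await asyncio.sleep(0)
    return "served"

async def inside_running_loop():
    # Mimics a notebook cell: an event loop is already running here
    coro = serve_once()
    try:
        asyncio.run(coro)  # tries to start a second loop
    except RuntimeError as err:
        coro.close()  # tidy up the coroutine that never ran
        blocked = str(err)
    else:
        blocked = None
    # Awaiting reuses the loop that is already running
    result = await serve_once()
    return blocked, result

blocked, result = asyncio.run(inside_running_loop())
print(blocked)
print(result)
```

The first attempt is rejected because a loop is already running, while the awaited call completes normally.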

Call your new API endpoints

Run your API in the background

We will write a standalone script to run the API in the background.

from vetiver import write_app
write_app(board = board, pin_name = "tompkins-housing", file = "app.py")

To run the Python script, switch to the Terminal tab and run this shell command:

uvicorn app:api --port <TODO> --host 127.0.0.1

Replace <TODO> with a random four-digit number. This runs the API in the background while you keep working in the notebook. Note the URL and port printed in the terminal; you will need them to send queries to the API.

Return predictions from your model API:

from vetiver.server import predict, vetiver_endpoint

url = ______
endpoint = ______(url)
predict(endpoint = endpoint, data = X_test.head(5))
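It can help to see roughly what travels over the wire when you request predictions. This sketch assumes the API exposes a /predict route that accepts a JSON array with one object per row. The port 1234 and the house values are made up for illustration:

```python
import json

# Hypothetical values: use the host and port printed by uvicorn in your terminal
host, port = "127.0.0.1", 1234
# "/predict" is assumed here to be the prediction route exposed by the API
url = f"http://{host}:{port}/predict"

# Made-up houses; the keys mirror the columns the model was trained on
new_homes = [
    {"beds": 3, "baths": 2.0, "area": 1500, "year_built": 1995, "town": "Ithaca"},
    {"beds": 4, "baths": 2.5, "area": 2200, "year_built": 2008, "town": "Dryden"},
]

# One JSON object per row, similar to DataFrame.to_json(orient="records")
payload = json.dumps(new_homes)

print(url)
print(payload)
```

The predictions that come back are on the log10 scale used for training, so raise 10 to each value to read them as prices.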

Optional: try /metadata or /ping here:

import requests

url = ______
print(requests.get(url).content)

Acknowledgments