AE 14: Version your housing model
Load the data
import pandas as pd
housing = pd.read_csv('data/tompkins-home-sales.csv')
Build a model
- Log transform the price variable
- Split into training and test sets
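The model is fit on log10(price), so its predictions come back on that scale. The minimal sketch below uses made-up prices, not the Tompkins data. It shows the transform, the inverse (10 ** value), and one reason a log scale suits prices: equal distances on it correspond to equal percentage changes in dollars.

```python
import numpy as np

# Made-up sale prices in dollars (not taken from the Tompkins data).
prices = np.array([95_000.0, 250_000.0, 1_200_000.0])

# The modeling target: base-10 log of price, like np.log10(housing["price"]).
log_prices = np.log10(prices)

# A log-scale value (for example, a prediction) maps back to dollars with 10 ** value.
recovered = 10 ** log_prices

# An error of 0.1 on the log10 scale is a multiplicative error of about 26% in dollars.
factor = 10 ** 0.1

print(np.round(log_prices, 3))
print(np.allclose(recovered, prices))
print(round(factor, 3))
```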
from sklearn import model_selection
import numpy as np
np.random.seed(123)
X, y = housing[["beds", "baths", "area", "year_built"]], np.log10(housing["price"])
X_train, X_test, y_train, y_test = model_selection.train_test_split(
X, y,
test_size = 0.2
)
Train a linear regression model:
from sklearn import linear_model
housing_fit = linear_model.LinearRegression().fit(X_train, y_train)
Create a deployable model object
from vetiver import VetiverModel
v = VetiverModel(model = ______, model_name = ______, prototype_data = X_train)
v.description
# create a vetiver model with a custom description
Pin your model
from pins import board_temp
from vetiver import vetiver_pin_write
board = ______
______(board, v)
# retrieve your model metadata
board.pin_meta(______)
Store a new version
Retrain the linear regression model, this time adding town as a categorical predictor:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
# Define feature columns
numeric_features = ["beds", "baths", "area", "year_built"]
categorical_features = ["town"]
# Preprocessing for numeric and categorical data
preprocessor = ColumnTransformer(
transformers=[
("num", "passthrough", numeric_features),
("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features)
]
)
# Create pipeline with preprocessor and linear regression
housing_fit = Pipeline(steps=[
("preprocessor", preprocessor),
("regressor", linear_model.LinearRegression())
])
# Prepare features and target
X = housing[numeric_features + categorical_features]
y = np.log10(housing["price"])
# Split data
X_train, X_test, y_train, y_test = model_selection.train_test_split(
X, y,
test_size=0.2
)
# Fit model
housing_fit.fit(X_train, y_train)
Store this new model as a new version of the same pin:
v = VetiverModel(model = ______, model_name = ______, prototype_data = X_train)
______(board, v)
What versions do you have?
board.pin_versions(______)
Create a new vetiver model
Fit a random forest model
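The pipeline below fills in missing values before one-hot encoding town. The minimal sketch here runs the same kind of preprocessor on made-up rows, not the Tompkins data. The toy names toy_train and toy_preprocessor are only for illustration, and sparse_threshold=0 is added so the output is a plain dense array. It shows three behaviours: mean imputation for a numeric column, most-frequent imputation for town, and all-zero encoding for a town never seen in training.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up rows with one missing area and one missing town.
toy_train = pd.DataFrame({
    "area": [1000.0, np.nan, 2000.0, 3000.0],
    "town": ["Ithaca", "Ithaca", "Dryden", np.nan],
})

# Same structure as the pipeline's preprocessor, with one column of each kind.
toy_preprocessor = ColumnTransformer(
    transformers=[
        ("num", SimpleImputer(strategy="mean"), ["area"]),
        ("cat", Pipeline(steps=[
            ("imputer", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), ["town"]),
    ],
    sparse_threshold=0,  # always return a dense array so it prints readably
)

# Missing area -> training mean; missing town -> most frequent training town.
# Output columns: area, town=Dryden, town=Ithaca (categories are sorted).
train_out = toy_preprocessor.fit_transform(toy_train)

# A town never seen in training is encoded as all zeros instead of raising an error.
new_out = toy_preprocessor.transform(pd.DataFrame({"area": [1500.0], "town": ["Lansing"]}))

print(train_out)
print(new_out)
```

The random forest itself does not need the imputers. They are there so that rows with gaps, at training time or when the API receives new data, do not make the pipeline fail.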
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
# Define feature columns
numeric_features = ["beds", "baths", "area", "year_built"]
categorical_features = ["town"]
# Create preprocessing steps
numeric_transformer = SimpleImputer(strategy="mean")
categorical_transformer = Pipeline(steps=[
("imputer", SimpleImputer(strategy="most_frequent")),
("onehot", OneHotEncoder(handle_unknown="ignore"))
])
# Combine preprocessors
preprocessor = ColumnTransformer(
transformers=[
("num", numeric_transformer, numeric_features),
("cat", categorical_transformer, categorical_features)
]
)
# Create pipeline with preprocessor and model
housing_fit = Pipeline(steps=[
("preprocessor", preprocessor),
("regressor", RandomForestRegressor(n_estimators=200, random_state=123))
])
# Prepare training data with all features
X_train_full = housing.loc[X_train.index, numeric_features + categorical_features]
housing_fit.fit(X_train_full, y_train)
Store your model:
from pins import board_temp
from vetiver import vetiver_pin_write
board = board_temp(versioned = True, allow_pickle_read = True)
v = VetiverModel(housing_fit, "tompkins-housing", prototype_data = X_train)
vetiver_pin_write(board, v)
Create a vetiver REST API
from vetiver import VetiverAPI
api = ______(v)
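api.run()
Quarto runs Python code blocks with the Jupyter engine, and a Jupyter kernel already has an asyncio event loop running. Because api.run() tries to start its own event loop, it fails inside a Quarto document. Instead, serve the underlying FastAPI app with uvicorn directly and await the server, which Jupyter allows at the top level of a cell.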
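The failure can be reproduced with plain asyncio, outside Jupyter and vetiver. In the sketch below, inside_running_loop stands in for a Jupyter cell, which already runs inside an event loop. The hypothetical coroutine serve_briefly stands in for uvicorn's server. It assumes that api.run() ends up calling asyncio.run() through uvicorn, which is why the error appears.

```python
import asyncio


async def serve_briefly() -> str:
    # Hypothetical stand-in for a server coroutine such as uvicorn's Server.serve().
    await asyncio.sleep(0.01)
    return "served"


async def inside_running_loop() -> tuple[str, str]:
    # Code here runs while an event loop is active, like code in a Jupyter cell.
    coro = serve_briefly()
    try:
        # asyncio.run() refuses to start a second loop from inside a running one.
        asyncio.run(coro)
        blocked = "no error"
    except RuntimeError as err:
        blocked = str(err)
    finally:
        coro.close()  # the coroutine never started; close it to avoid a warning
    # Awaiting the coroutine reuses the loop that is already running.
    result = await serve_briefly()
    return blocked, result


blocked, result = asyncio.run(inside_running_loop())
print(blocked)
print(result)
```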
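import uvicorn
# The FastAPI app wrapped by the VetiverAPI object
app = api.app
if __name__ == "__main__":
    # uvicorn.Config defaults to http://127.0.0.1:8000
    config = uvicorn.Config(app)
    server = uvicorn.Server(config)
    # Top-level await reuses Jupyter's running event loop; the cell runs until interrupted
    await server.serve()
Call your new API endpoints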
We will write a standalone script to run the API in the background.
from vetiver import write_app
write_app(board = board, pin_name = "tompkins-housing", file = "app.py")
To run the Python script, switch to the Terminal tab and run this shell command:
uvicorn app:api --port <TODO> --host 127.0.0.1
Replace <TODO> with a random four-digit port number between 1024 and 9999. Ports below 1024 require administrator rights. This runs the API in a separate process, which leaves your document free to send requests to it. Note the URL and port printed in the terminal, because you need them to query the API.
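Rather than guessing a port, you can pick one that is not already taken. The minimal sketch below uses two hypothetical helpers, port_is_free and pick_four_digit_port, written with only the standard library. It tests a port by trying to bind a socket to it and then prints the resulting shell command. Another program could still claim the port between this check and launching uvicorn, so treat the result as a good guess rather than a guarantee.

```python
import random
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Something else is already bound to (or listening on) this port.
            return False
    return True


def pick_four_digit_port(host: str = "127.0.0.1", tries: int = 50) -> int:
    """Pick a random free port from 1024-9999 (lower ports need admin rights)."""
    for _ in range(tries):
        port = random.randint(1024, 9999)
        if port_is_free(port, host):
            return port
    raise RuntimeError("no free four-digit port found")


port = pick_four_digit_port()
print(f"uvicorn app:api --port {port} --host 127.0.0.1")
```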
Return predictions from your model API:
from vetiver.server import predict, vetiver_endpoint
url = ______
endpoint = ______(url)
predict(endpoint = endpoint, data = X_test.head(5))
Optional: try the /metadata or /ping endpoint here:
import requests
url = ______
print(requests.get(url).content)
Acknowledgments
- Materials derived in part from Intro to MLOps with {vetiver} and licensed under a Creative Commons Attribution 4.0 International (CC BY) License.