Lecture 15
Cornell University
INFO 4940/5940 - Fall 2025
October 21, 2025
ae-14
Instructions
Find your ae-14 repo (the repo name will be suffixed with your GitHub name). Run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along to complete the exercises.
Vetiver, the oil of tranquility, is used as a stabilizing ingredient in perfumery to preserve more volatile fragrances.
You can operationalize that model!
You likely should be the one to operationalize that model!
• price is the outcome
• beds, baths, area, and year_built are numeric predictors
• town and municipality could be nominal predictors
• sold_date could be a date predictor

| sold_date | price | beds | baths | area | lot_size | year_built | hoa_month | town | municipality | long | lat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-08-16 | 335500 | 3 | 2.0 | 1957 | 4.50000000 | 1880 | NA | Ulysses | Unincorporated | -76.67680 | 42.53255 |
| 2022-11-14 | 331500 | 3 | 2.0 | 1416 | 0.58999082 | 1930 | NA | Lansing | Unincorporated | -76.50347 | 42.53340 |
| 2022-03-31 | 302385 | 3 | 1.5 | 1476 | 0.20000000 | 1900 | NA | Ithaca | Ithaca city | -76.50439 | 42.44250 |
| 2022-09-28 | 285000 | 3 | 2.0 | 1728 | 0.46999541 | 2002 | NA | Dryden | Dryden village | -76.29495 | 42.48415 |
| 2022-07-22 | 350000 | 4 | 1.0 | 1698 | 0.12396694 | 1925 | NA | Ithaca | Ithaca city | -76.50146 | 42.43264 |
| 2023-11-28 | 225000 | 2 | 1.5 | 1047 | 0.08000459 | 1939 | NA | Ithaca | Ithaca city | -76.50576 | 42.43373 |
| 2023-09-13 | 285000 | 3 | 2.0 | 2311 | 1.26999541 | 1965 | NA | Caroline | Unincorporated | -76.33375 | 42.39048 |
| 2023-06-23 | 145000 | 2 | 2.0 | 1215 | 0.03999082 | 1990 | NA | Danby | Unincorporated | -76.49228 | 42.38340 |
| 2023-11-27 | 90900 | 5 | 3.0 | 2238 | 0.38000459 | 1880 | NA | Groton | Groton village | -76.36311 | 42.58533 |
| 2022-11-09 | 467500 | 6 | 4.0 | 2304 | 0.13000459 | 2017 | NA | Ithaca | Ithaca city | -76.50205 | 42.43136 |
Or your model of choice!
Instructions
Split your data in training and testing.
Fit a model to your training data.
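A minimal Python sketch of these two steps, assuming pandas and scikit-learn are installed. The ten example rows from the table above stand in for the full dataset; the 80/20 split, `random_state=123`, and the choice of a plain `LinearRegression` on the four numeric predictors are illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# The ten example rows from the table above (a stand-in for the full data)
housing = pd.DataFrame({
    "price": [335500, 331500, 302385, 285000, 350000,
              225000, 285000, 145000, 90900, 467500],
    "beds": [3, 3, 3, 3, 4, 2, 3, 2, 5, 6],
    "baths": [2.0, 2.0, 1.5, 2.0, 1.0, 1.5, 2.0, 2.0, 3.0, 4.0],
    "area": [1957, 1416, 1476, 1728, 1698, 1047, 2311, 1215, 2238, 2304],
    "year_built": [1880, 1930, 1900, 2002, 1925, 1939, 1965, 1990, 1880, 2017],
    "town": ["Ulysses", "Lansing", "Ithaca", "Dryden", "Ithaca",
             "Ithaca", "Caroline", "Danby", "Groton", "Ithaca"],
})

numeric_features = ["beds", "baths", "area", "year_built"]

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    housing[numeric_features],
    housing["price"],
    test_size=0.2,
    random_state=123,
)

# Fit an ordinary least squares model on the training rows only
lm_fit = LinearRegression().fit(X_train, y_train)
print(lm_fit.predict(X_test))
```

Keeping `town` in `housing` means the training rows can later be matched back by index, e.g. with `housing.loc[X_train.index, ...]`, to build a richer feature set.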
── tompkins-housing ─ <bundled_workflow> model for deployment
A lm regression modeling workflow using 4 features
Instructions
Create your vetiver model object.
Check out the default description that is created, and try out using a custom description.
Show your custom description to your neighbor.
Data, models, R/Python objects, etc.
❌ Email
❌ GitHub
🫤 Shared network drive
🫤 Dropbox, Google Drive, Box.com, etc.
✅ Amazon S3
✅ Azure
✅ Google Cloud
✅ Microsoft 365
The pins package publishes data, models, and other R and Python objects, making it easy to share them across projects and with your colleagues.
Creating new version '20251022T155624Z-64a74'
Writing to pin 'tompkins-housing'
Create a Model Card for your published model
• Model Cards provide a framework for transparent, responsible reporting
• Use the vetiver `.Rmd` template as a place to start
Model Cards provide a framework for transparent, responsible reporting.
Use the vetiver `.qmd` Quarto template as a place to start, with `vetiver.model_card()`.
Writing pin:
Name: 'tompkins-housing'
Version: 20251022T115624Z-1ee78
Instructions
Pin your vetiver model object to a temporary board.
Retrieve the model metadata with pin_meta().
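library(tidymodels)

# Recipe: impute missing numeric predictors with the mean
# and missing nominal predictors (town) with the most frequent level
rf_rec <- recipe(
  price ~ beds + baths + area + year_built + town,
  data = housing_train
) |>
  step_impute_mean(all_numeric_predictors()) |>
  step_impute_mode(all_nominal_predictors())

# Bundle the recipe with a 200-tree random forest, then fit
housing_fit <- workflow() |>
  add_recipe(rf_rec) |>
  add_model(rand_forest(trees = 200, mode = "regression")) |>
  fit(data = housing_train)

# Python equivalent (assumes housing, X_train, and y_train from the earlier split)
from sklearn.ensemble import RandomForestRegressor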
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
# Define feature columns
numeric_features = ["beds", "baths", "area", "year_built"]
categorical_features = ["town"]
# Create preprocessing steps
numeric_transformer = SimpleImputer(strategy="mean")
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))
])
# Combine preprocessors
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features)
    ]
)
# Create pipeline with preprocessor and model
housing_fit = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("regressor", RandomForestRegressor(n_estimators=200, random_state=123))
])
# Prepare training data with all features
X_train_full = housing.loc[X_train.index, numeric_features + categorical_features]
housing_fit.fit(X_train_full, y_train)

Pipeline(steps=[('preprocessor',
ColumnTransformer(transformers=[('num', SimpleImputer(),
['beds', 'baths', 'area',
'year_built']),
('cat',
Pipeline(steps=[('imputer',
SimpleImputer(strategy='most_frequent')),
('onehot',
OneHotEncoder(handle_unknown='ignore'))]),
['town'])])),
('regressor',
RandomForestRegressor(n_estimators=200, random_state=123))])
Instructions
Create a new vetiver model object from your linear regression model, explicitly setting versioned = TRUE, and pin it to your temporary board.
Then train a random forest model and create a new vetiver model object from it, also with versioned = TRUE and the same name.
Write this new version of your model to the same pin, and see what versions you have with pin_versions().
REST API
An interface that can connect applications in a standard way
RESTful queries
Instructions
Create a vetiver API for your model and run it locally.
Explore the visual documentation.
How many endpoints are there?
Discuss what you notice with your neighbor.
Image credit: Isabel Zimmerman
Image credit: Isabel Zimmerman
Query the /predict endpoint with requests (Python) or {httr2} (R). Any tool that can make an HTTP request can be used to interact with your model API!
You can treat your model API much like it is a local model in memory!
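A standard-library-only sketch of the "any tool that can make an HTTP request" point. The server here is a hypothetical toy stand-in, not vetiver: it predicts `price = 150 * area` (a made-up rule) and, by assumption, returns JSON shaped like `{"predict": [...]}`. The client half uses only `urllib.request`, so the same `predict()` helper should work against other JSON endpoints that accept a list of records.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyModelHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a model API: predicts price as 150 * area."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        rows = json.loads(self.rfile.read(length))
        body = json.dumps({"predict": [150 * row["area"] for row in rows]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def predict(url, rows):
    """POST a list of records as JSON and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(rows).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Port 0 lets the OS pick any free port
server = HTTPServer(("127.0.0.1", 0), ToyModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/predict"
result = predict(url, [{"area": 1728}, {"area": 2311}])
print(result)

server.shutdown()
server.server_close()
```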
Instructions
Create a vetiver endpoint object for your API.
Predict with your endpoint for new data.
Optional: call another endpoint like /ping or /metadata.