HW 05 - Document and deploy models

Homework
Modified

October 23, 2025

Important

This homework is due October 29 at 11:59pm ET.

Learning objectives

  • Train and evaluate a machine learning model
  • Document a machine learning model using a model card
  • Publish a machine learning model as an Application Programming Interface (API)
  • Deploy an API using reproducible Docker containers

Getting started

  • Go to the info4940-fa25 organization on GitHub. Click on the repo with the prefix hw-05. It contains the starter documents you need to complete the homework.

  • Clone the repo and start a new workspace in Positron. See the Homework 0 instructions for details on cloning a repo and starting a new R project.

General guidance

Tip: Guidelines + tips

  • Set your random seed to ensure reproducible results.
  • Use caching to speed up the rendering process.
  • Use parallel processing to speed up rendering time. Note that this works differently on different systems and operating systems, and it also makes it harder to debug code and track progress in model fitting. Use at your own discretion.
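
As a minimal sketch of the first tip, the Python snippet below shows why a fixed seed matters: the same seed reproduces the same shuffled split of row indices. The function name `shuffled_split`, the seed value, and the row counts are hypothetical and chosen only for illustration; in R the equivalent step is a call to `set.seed()` before any resampling.

```python
import random


def shuffled_split(seed, n_rows=10, n_analysis=7):
    """Shuffle row indices with a fixed seed and split them into two groups."""
    # A local generator keeps this stream independent of other random calls.
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return indices[:n_analysis], indices[n_analysis:]


# Hypothetical seed value; any fixed integer works.
first = shuffled_split(seed=4940)
second = shuffled_split(seed=4940)

# The two runs agree because they share a seed.
print(first == second)
```

Without a fixed seed, each render could produce different resamples and therefore different performance estimates.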

Remember that continuing to develop a sound workflow for reproducible data analysis is important as you complete this homework and other assignments in this course. There will be periodic reminders in this assignment to render, commit, and push your changes to GitHub. You should have at least 3 commits with meaningful commit messages by the end of the assignment.

Tip: Workflow + formatting

Make sure to

  • Update author name on your document.
  • Label all code chunks informatively and concisely.
  • Follow consistent code style guidelines.
  • Make at least 3 commits.
  • Resize figures where needed; avoid tiny or huge plots.
  • Turn in an organized, well formatted document.

Warning: Presenting results of multiple models

For the love of all that is pure in this world, please consider how to present the results of your modeling efforts. Do not just rely on raw output from R to tell us what we need to know.

  • Your plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.
  • Maybe condense information into one or a handful of custom graphs.
  • You can create simple formatted tables using {gt} (R) or great_tables (Python).

Attitudes towards marijuana legalization (redux)

Recall that in homework 03 you trained and evaluated a machine learning model to predict whether an individual believes marijuana should be legal or not legal using the General Social Survey (GSS) dataset. For this assignment, you will take the next steps to document and deploy your model.

Exercise 0 - Train a model to predict marijuana attitudes

Train a machine learning model to predict whether an individual believes marijuana should be legal or not legal. The data has been partitioned into three distinct sets:

  • gss-train.feather - the training set
  • gss-val.feather - the validation set
  • gss-test.feather - the test set

Use the training set to train and compare models and the validation set to evaluate your final model’s performance. You do not have access to the test set. The course staff will evaluate your model’s performance using the test set.

Use best practices as taught throughout the semester. The end result should be a {vetiver} model object pinned to a board that you can use to make predictions and deploy the model. Feel free to use the shared Google Cloud Storage board we used for previous application exercises, or create your own.

Note: We are not evaluating your code/model training

Note that you are not submitting the code for training and evaluating your model as part of this homework assignment. You need to estimate a model in order to complete the assignment, but it is not the focus of the assignment. We are evaluating your ability to document and deploy a model, not your ability to train a model.

Feel free to reuse your final model from homework 03 if you wish, or train something new. Use whatever organizational system makes sense to you for training your models (e.g. standalone R scripts, Quarto documents).

Exercise 1 - Document your model

Document your model using the provided model card template, which is based on the Model Cards for Model Reporting paper and the Vetiver-provided template. Complete the model card with the required information from your model. You are encouraged to provide thorough documentation and go beyond the minimum requirements if you wish to earn full credit.

Tip: Generate a parameterized report

Quarto supports parameterized reports, a mechanism to create different variations of a document programmatically. For this assignment we leverage parameterization to automatically populate some content in the model card.

The syntax for specifying parameters differs depending on whether you are using the {knitr} (R) engine or the jupyter (Python) engine.

If you are using the {knitr} (R) engine, modify the YAML header

params:
  board: !expr library(googleCloudStorageR); pins::board_gcs(bucket = "info-4940-models", prefix = "bcs88/")
  name: grass-null-model
  version: 20241125T162318Z-c0109

to match the board location, pin name, and pin version of your model. The provided params are for a null model trained by the instructor.

If you are using the jupyter (Python) engine, modify the code in the first cells, which reference the specific board and model:

# second cell
board = pins.board_gcs(path = "info-4940-models/bcs88/", allow_pickle_read = True)

# third cell
v = vetiver.VetiverModel.from_pin(board, "grass-null-model-py", version = "20251022T155309Z-1e2bb")

to match the board location, pin name, and pin version of your model. The provided values are for a null model trained by the instructor.

Exercise 2 - Deploy the model

Deploy your model as an API using Vetiver. It should be deployed to our course-provided web server at http://appliedml.infosci.cornell.edu. You should build and test your API locally using Docker before deploying it to the server.

This tutorial provides detailed instructions on accessing the course-provided web server, cloning your repository, and building/running your Docker image on the server.

You have been given a specific port number to which your API must be deployed. If it is not hosted on that port number, we will not be able to access it and you will fail to earn credit for this portion of the assignment.

Warning: Run your Docker image in detached mode or it will end without warning

When you run your Docker image, make sure to run it in detached mode (-d) so that it runs in the background. If you do not run it in detached mode, it will run in the foreground and will end without warning when you close your terminal or disconnect from the server. For example,

docker run -p 2001:2001 -d grass-null

To prevent mistakes, once you have run your Docker image, close your Terminal session and then test your API from within Positron. Your API should continue to run in the background. It will be located at http://appliedml.infosci.cornell.edu:YOUR_PORT_NUMBER.

If you cannot generate predictions from the API, then it is likely we cannot generate predictions using it either.
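
One way to check this yourself is to send a small request to the deployed API. The sketch below uses only the Python standard library and assumes the API exposes the `POST /predict` endpoint that Vetiver generates by default and accepts a JSON array of records. The port `2001` is the example port from the `docker run` command above, and the column names in `example_rows` are hypothetical; substitute your own port and the predictors your model expects.

```python
import json
import urllib.request

# Host taken from the assignment; the port is supplied per student.
HOST = "http://appliedml.infosci.cornell.edu"


def endpoint(port, path):
    """Build the full URL for an endpoint on the course server."""
    return f"{HOST}:{port}/{path.lstrip('/')}"


def build_predict_request(port, records):
    """Create (but do not send) a POST request carrying records as JSON."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        endpoint(port, "predict"),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(port, records, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(port, records)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical predictor columns; replace with the columns your model uses.
example_rows = [{"age": 42, "degree": "Bachelor"}]

# Inspect the request without contacting the server.
req = build_predict_request(2001, example_rows)
print(req.get_method(), req.full_url)
```

Calling `predict(YOUR_PORT_NUMBER, example_rows)` after closing your Terminal session is a quick way to confirm the container is still running in the background.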

For this exercise, you will be evaluated on whether or not we can successfully use your API to generate predictions for the test set, as well as on the performance of your model on the test set.

Tip: How you will earn credit for this exercise

Credit will be earned based on two criteria:

  1. Successful deployment: Your API must be successfully deployed to the correct port number. We will evaluate this by sending a request to your API and checking that it returns a valid response.
  2. Model performance: Your model’s test set balanced accuracy relative to the null model. The larger the improvement in balanced accuracy over the null model, the higher you will score on this exercise.
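
To make the second criterion concrete, here is a minimal sketch of balanced accuracy (the mean of per-class recall) compared against a null model that always predicts the majority class. The labels and predictions are made up for illustration and are not GSS results; the course staff's actual scoring code is not shown in this assignment, so treat this only as an illustration of the metric.

```python
def balanced_accuracy(truth, predicted):
    """Return the mean of per-class recall across the classes in truth."""
    recalls = []
    for cls in sorted(set(truth)):
        n_cls = sum(1 for t in truth if t == cls)
        hits = sum(1 for t, p in zip(truth, predicted) if t == cls and p == cls)
        recalls.append(hits / n_cls)
    return sum(recalls) / len(recalls)


# Made-up outcomes for six hypothetical respondents.
truth = ["legal", "legal", "legal", "not legal", "legal", "not legal"]

# A null model predicts the majority class for everyone.
null_preds = ["legal"] * len(truth)

# Made-up predictions from a fitted model.
model_preds = ["legal", "legal", "not legal", "not legal", "legal", "not legal"]

null_score = balanced_accuracy(truth, null_preds)
model_score = balanced_accuracy(truth, model_preds)

print(null_score, model_score, model_score - null_score)
```

A null model that always predicts one class of a binary outcome scores 0.5 on balanced accuracy regardless of class imbalance, which is why the metric is a useful baseline for comparison.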

Generative AI (GAI) self-reflection

As stated in the syllabus, include a written reflection for this assignment on how you used GAI tools (e.g. what tools you used, how you used them to assist you with writing code), what skills you believe you acquired, and how you believe you demonstrated mastery of the learning objectives.

Render, commit, and push one last time.

Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.

Wrap up

Submission

  • Go to http://www.gradescope.com and click Log in in the top right corner.
  • Click School Credentials → Cornell University NetID and log in using your NetID credentials.
  • Click on your INFO 4940/5940 course.
  • Click on the assignment, and you’ll be prompted to submit it.
  • Mark all the pages associated with each exercise. All the pages of your homework should be associated with at least one question (i.e., should be “checked”).

Grading

  • Exercise 1: 25 points
  • Exercise 2: 25 points
  • GAI self-reflection: 0 points
  • Total: 50 points