Wrap-up: Where to go from here

Lecture 23

Dr. Benjamin Soltoff

Cornell University
INFO 4940/5940 - Fall 2024

December 5, 2024

End-of-semester logistics

Remaining assignments

  • Homework 05
  • Homework 06/extra credit
  • Group project

Build a simple machine learning stack

A digital cartoon with two illustrations: the top shows the R-logo with a scary face, and a small scared little fuzzy monster holding up a white flag in surrender while under a dark storm cloud. The text above says "at first I was like..." The lower cartoon is a friendly, smiling R-logo jumping up to give a happy fuzzy monster a high-five under a smiling sun and next to colorful flowers. The text above the bottom illustration reads "but now it's like..."

RStudio Workbench

  • Access to RStudio Workbench and Google Cloud Storage will end at some point after December 20th
  • All INFO 4940/5940 materials remain available in your repos on GitHub as long as you are an active student
  • Any work stored only on the server (and not pushed to GitHub) will not be accessible after the end of the semester
  • Where will you go from here?

Software installation

Programming language

Reproducibility

To renv or not to renv

Benefits

  • Isolated
  • Portable
  • Reproducible

Drawbacks

  • It's a pain to configure for every project
  • Some packages have issues installing via {renv}

Install some core R packages

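# install the major course packages published on CRAN
install.packages(c(
  "tidyverse", "tidymodels", "devtools", "usethis",
  "styler", "keras3", "ranger", "bonsai"
))

# {keras3} also needs a Python backend; run this once after installing
keras3::install_keras()

# install a package hosted on GitHub
# ({remotes} is installed as a dependency of {devtools})
remotes::install_github(repo = "cis-ds/rcis")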
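On a machine that already has some of these packages, you may only want to install what is missing. A minimal sketch: `missing_packages()` is a hypothetical helper (not part of any package), and the `interactive()` guard is an assumption so that sourcing the file from a script does not trigger downloads.

```r
# Packages from the install.packages() call above
course_pkgs <- c(
  "tidyverse", "tidymodels", "devtools", "usethis",
  "styler", "keras3", "ranger", "bonsai"
)

# Hypothetical helper: return the packages in `pkgs` that are not installed.
# system.file(package = p) returns "" when p is not found on .libPaths().
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!found]
}

to_install <- missing_packages(course_pkgs)
print(to_install)

# Only download in an interactive session (assumption: avoids surprise
# installs when this file is sourced by a script or CI job)
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking with `system.file()` avoids loading each package, which keeps the check fast even for large meta-packages like {tidyverse}.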

Create a GitHub account

Configure Git

usethis::use_git_config(
  user.name = "Your name",
  user.email = "Email associated with your GitHub account"
)
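To confirm that `use_git_config()` wrote your identity, you can read it back from the global Git configuration. A minimal sketch, assuming the `git` command-line tool is on your PATH; `git_global()` is a hypothetical helper, not part of {usethis}.

```r
# Hypothetical helper: read one key from the global Git config via the git CLI.
# Returns NA when the key is unset or git cannot be run (non-zero exit status).
git_global <- function(key) {
  out <- tryCatch(
    system2("git", c("config", "--global", "--get", key),
            stdout = TRUE, stderr = FALSE),
    warning = function(w) character(0),
    error = function(e) character(0)
  )
  if (length(out) == 0) NA_character_ else out[[1]]
}

git_global("user.name")   # should match the name passed to use_git_config()
git_global("user.email")  # should match your GitHub email
```

For a broader report that also covers GitHub authentication, `usethis::git_sitrep()` prints a situation summary.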

Painless authentication with PAT

Personal Access Token

Set up PAT authentication

Create PAT (github.com)

usethis::create_github_token(
  scopes = c("repo", "user", "gist", "workflow"),
  description = "<DESCRIBE YOUR DEVICE>"
)

Store PAT (github.com)

gitcreds::gitcreds_set()

#> ? Enter password or token: ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
#> -> Adding new credentials...
#> -> Removing credentials from cache...
#> -> Done.

Create PAT (github.coecis.cornell.edu)

usethis::create_github_token(
  scopes = c("repo", "user", "gist", "workflow"),
  description = "<DESCRIBE YOUR DEVICE>",
  host = "https://github.coecis.cornell.edu/"
)

Store PAT (github.coecis.cornell.edu)

gitcreds::gitcreds_set(url = "https://github.coecis.cornell.edu/")

#> ? Enter password or token: ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
#> -> Adding new credentials...
#> -> Removing credentials from cache...
#> -> Done.
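To check that R can find a stored token for each host set up above, you can ask {gitcreds} to retrieve it. A minimal sketch: `has_pat()` is a hypothetical helper that returns `TRUE` if a credential is found and `FALSE` otherwise, without printing the token. It assumes {gitcreds} is installed, which normally happens alongside {usethis}.

```r
# Hypothetical helper: does gitcreds find a stored credential for `url`?
# Returns TRUE/FALSE and never prints the token itself.
has_pat <- function(url) {
  if (!requireNamespace("gitcreds", quietly = TRUE)) {
    return(FALSE)
  }
  tryCatch({
    gitcreds::gitcreds_get(url = url)  # errors if no credential is stored
    TRUE
  }, error = function(e) FALSE)
}

has_pat("https://github.com")
has_pat("https://github.coecis.cornell.edu/")
```

If either call returns `FALSE`, rerun the matching `gitcreds::gitcreds_set()` step for that host.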

What have you learned?

Learning objectives for INFO 4940/5940

  • Train and evaluate machine learning models using a variety of algorithms.
  • Collect and wrangle data for machine learning.
  • Deploy machine learning models in a production environment.
  • Communicate results of machine learning analyses to a non-technical audience.
  • Implement reproducible machine learning workflows using version control and literate programming.

Where to go from here

Courses

Foundational (much more theory and math)

  • CS 3780: Machine Learning for Intelligent Systems
  • ECE 3200: Fundamentals of Machine Learning
  • ORIE 3741: Learning with Big Messy Data
  • STSCI 3740: Data Mining and Machine Learning
  • INFO 3950: Data Analytics for Information Science

Domain applications

  • INFO 3350/6350: Text Mining History and Literature
  • INFO 3370/5371: Studying Social Inequality Using Data Science
  • INFO 4100/5101: Learning Analytics
  • INFO 4300: Language and Information
  • INFO 4940: Advanced NLP for Humanities Research
  • INFO 4940/6940: How LLMs work, their potential and limitations

Find a community

Two fuzzy monsters standing side-by-side outside of a door frame through which is a magical wonderland of different R communities, with a "mind blown" rainbow coming out of the one closest to the door. A welcome mat says "Welcome."

Online communities

Keep your skills fresh