Welcome to INFO 4940/5940: Applied Machine Learning

Lecture 1

Dr. Benjamin Soltoff

Cornell University
INFO 4940/5940 - Fall 2025

August 26, 2025

Agenda

  • Introductions
  • What is machine learning?
  • Software
  • What is INFO 4940?
  • This week’s tasks

Learning objectives

  • Introduce the course staff
  • Define machine learning and how it will be taught in this course
  • Review course policies
  • Discuss hopes and dreams for the course

Staff intros

Meet the instructor

Dr. Benjamin Soltoff

Associate Teaching Professor in Information Science

284 CIS Building

Headshot of Dr. Benjamin Soltoff

Meet the course team

TAs

  • Afran A.
  • Menghan X.
  • Haocheng Z.

Meet each other!

Physically interact with at least 2 people sitting around you. Introduce yourselves to each other and share:

What is machine learning?

Your turn

What is machine learning to you? How do you seek to use it in this class? Discuss with a neighbor or two.

The first half of the class

A classic ML workflow

A better way to think about it

A flowchart diagramming the machine learning operations lifecycle, including collecting data, understanding and cleaning data, training and evaluating models, deploying model, and monitoring model.
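The first stages of that lifecycle can be sketched in a few lines of code. This is a minimal illustration rather than course material: it uses only the Python standard library, the data are simulated, and every function name (collect_data, clean_data, split_data, train_model, evaluate_model) is hypothetical. Deployment and monitoring appear only as comments because they depend on infrastructure not described here.

```python
import random
from statistics import mean


def collect_data(n=200, seed=42):
    """Simulate data collection: y = 3x + 2 plus noise, with some missing outcomes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3 * x + 2 + rng.gauss(0, 1)
        # Roughly 5% of outcomes are recorded as missing (None)
        rows.append((x, None if rng.random() < 0.05 else y))
    return rows


def clean_data(rows):
    """Understand and clean: drop rows whose outcome is missing."""
    return [(x, y) for x, y in rows if y is not None]


def split_data(rows, test_fraction=0.25, seed=42):
    """Hold out a random test set so evaluation uses unseen data."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_model(train):
    """Train: fit a one-predictor linear regression by least squares."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def evaluate_model(model, test):
    """Evaluate: root mean squared error on the held-out test set."""
    slope, intercept = model
    return mean((y - (slope * x + intercept)) ** 2 for x, y in test) ** 0.5


rows = clean_data(collect_data())
train, test = split_data(rows)
model = train_model(train)
rmse = evaluate_model(model, test)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, test RMSE={rmse:.2f}")

# Deploy: in practice the fitted model would be saved and served to users.
# Monitor: new predictions would be compared against observed outcomes over time.
```

The test set is held out before training so that the error estimate reflects data the model has not seen, which is the reason the diagram separates training from evaluation.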

How is machine learning used in practice?

The second half of the class

Large Language Models (LLMs)

Large language models (LLMs) are a category of foundation models trained on immense amounts of data, making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks.

  • Human language (i.e. text)
  • Code
  • Images
  • Audio
  • Video

How LLMs are being used…

How to use LLMs effectively

What will we learn about LLMs?

  • How they work
  • Use them programmatically
  • Use them effectively to solve real-world problems
  • Enhance their capabilities
  • Build things with them
  • Software development

How we will do this

R and Python

A cartoon illustration shows the R and Python programming language logos as cheerful characters. Both characters look happy and friendly against a bright yellow background. Generated using ChatGPT.

Major differences between R and Python

                     | R                                                   | Python
Syntax               | Functional language                                 | Object-oriented language
Statistical learning | Developed by statisticians for statistical analysis | Meh
Machine learning     |                                                     | {scikit-learn}
Deep learning        |                                                     |
Visualization        | {ggplot2}                                           | {matplotlib} + others
Package management   | CRAN                                                | pip/virtualenv/PyPI/Anaconda/uv
Speed                | Somewhat slower                                     | Somewhat faster
Community            | Academia and industry                               | Larger (general-purpose programming language)

Course philosophy on programming languages

  • R-preferred

    • Code examples will be in R
    • Readings tend to use R
  • Use Python as you wish

  • Feel free to use LLMs to translate between languages*

  • Many packages we use will have both R and Python implementations

Who is this class for?

Palmer

A professional headshot for Palmer with a neutral background. Palmer is biracial (Black and Indian descent) and 24 years old. Created using Microsoft Copilot.

Armando

A professional headshot for Armando with a neutral background. Armando is in his early 20s. Created using Microsoft Copilot.

  • Fourth-year undergraduate student in information science, concentrating in data science
  • Took INFO 3950 (Data Analytics for Information Science) last year
  • Wants to learn how to use machine learning models for production, and combine his theoretical knowledge with practical applications

Chen

A professional headshot for Chen with a neutral background. Chen is of Chinese descent, 22 years old, uses a wheelchair for mobility, and has a confident expression that reflects her strength. Created using Microsoft Copilot.

  • Born and raised in Shenzhen, China
  • Information science major, plans to apply for industry positions
  • Completed a summer internship at Dow Chemical and saw the analytics team was using R for predictive modeling
  • Wants to learn more about machine learning and how to apply it to real-world problems

Course overview

Homepage

https://info4940.infosci.cornell.edu/

  • All course materials
  • Links to Canvas, GitHub, Posit Workbench, etc.
  • Let’s take a tour!

Course toolkit

All linked from the course website:

Important

Make sure you can access Posit Workbench before class on Thursday.

Activities: Prepare, Participate, Practice, Perform

  • Prepare: Introduce new content and prepare for class by completing the readings

  • Participate: Attend and actively participate in class, office hours, and team meetings

  • Practice: Practice applying ML techniques and computing with application exercises during class, graded for completion

  • Perform: Put together what you’ve learned to analyze real-world data

    • Homework assignments x 6(-ish)
    • Two projects

Activities: Participate

Preparing for and participating in class

Not preparing for class, not actively participating

Cadence

  • Application exercises: Complete by the end of class
  • HWs: Posted Friday morning, due the following Wednesday at 11:59pm
  • Projects: Deadlines throughout the semester, with some class time dedicated to working on them, and most work done outside of class

Grading

Category              | Percentage
Projects              | 60%
Homework              | 30%
Application Exercises | 10%

See course syllabus for how the final letter grade will be determined.
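As a rough illustration of how these weights combine, the sketch below computes a weighted overall percentage. The category scores are made up, the names WEIGHTS and weighted_score are hypothetical, and the conversion to a letter grade is left out because it is defined in the syllabus, not here.

```python
# Category weights from the grading table, expressed as whole percentages
WEIGHTS = {"projects": 60, "homework": 30, "application_exercises": 10}


def weighted_score(scores):
    """Combine category scores (each on a 0-100 scale) into one weighted percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS) / 100


# Hypothetical student: 90 on projects, 80 on homework, 100 on application exercises
example = weighted_score(
    {"projects": 90, "homework": 80, "application_exercises": 100}
)
print(example)  # 88.0
```

Because projects carry 60% of the weight, a 10-point change in the project score moves the overall percentage by 6 points, twice the effect of the same change in homework.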

15 minute rule

Support

  • Attend office hours
  • Ask and answer questions on the discussion forum
  • Use Ezra for generative AI assistance with course content
  • Reserve email for questions on personal matters and/or grades
  • Read the course support page

Diversity + inclusion

  • I want you to feel like you belong in this class and are respected
  • We are committed to full inclusion in education for all persons
  • If you feel that we have failed to meet these goals, please either let us know or report it, and we will address the issue

Accessibility

I want this course to be accessible to students of all abilities. Please feel free to let me know if there are circumstances affecting your ability to participate in class.

Course policies

Late work, waivers, regrades policy

  • We have policies!
  • Read about them on the course syllabus and refer back to them when you need them

Collaboration policy

  • Only work that is clearly assigned as team work should be completed collaboratively.

  • Homework must be completed individually. You may not directly share answers or code with others; however, you are welcome to discuss the problems in general and ask for advice.

Sharing / reusing code policy

  • We are aware that a huge volume of code is available on the web, and many tasks may have solutions posted

  • Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source

  • All code must be written by you, the human being

Generative AI

Academic integrity

  1. A student shall in no way misrepresent his or her work.
  2. A student shall in no way fraudulently or unfairly advance his or her academic position.
  3. A student shall refuse to be a party to another student’s failure to maintain academic integrity.
  4. A student shall not in any other manner violate the principle of academic integrity.

Most importantly!

Ask if you’re not sure if something violates a policy!

Application exercise

  • What do you hope to learn from this class?
  • Based on the syllabus and current list of topics, what are you most excited about doing in this course?
  • What are you most concerned about doing in this course?
  • What do you think is currently missing from the class that should be added (e.g. topics, assignments, techniques)? Are there certain things you want reduced and/or eliminated to make additional space for other topics?

Discuss with your peers, then submit your individual responses.

Wrap-up

Before Thursday