Lecture 1
Cornell University
INFO 4940/5940 - Fall 2025
August 26, 2025
Dr. Benjamin Soltoff
Associate Teaching Professor in Information Science
284 CIS Building
Physically interact with at least 2 people sitting around you. Introduce yourselves to each other and share:
Image credit: xkcd
Illustration credit: https://vas3k.com/blog/machine_learning/
What is machine learning to you? How do you seek to use it in this class? Discuss with a neighbor or two.
Illustration credit: workshops.tidymodels.org
Illustration credit: Posit
Large language models (LLMs) are a category of foundation models trained on immense amounts of data making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks.
Source: IBM
Generated by ChatGPT
Feature | R | Python |
---|---|---|
Syntax | Functional language | Object-oriented language |
Statistical learning | Developed by statisticians for statistical analysis | Meh |
Machine learning | | {scikit-learn} |
Deep learning | | |
Visualization | {ggplot2} | {matplotlib} + others |
Package management | CRAN | pip/virtualenv/PyPI/Anaconda/uv |
Speed | Somewhat slower | Somewhat faster |
Community | Academia and industry | Larger (general-purpose programming language) |
Generated using Microsoft Copilot
https://info4940.infosci.cornell.edu/
All linked from the course website:
GitHub organization: github.coecis.cornell.edu/info4940-fa25
Positron
Use the Workbench: posit-workbench.infosci.cornell.edu
🤖 Ezra
Communication: GitHub Discussions
Assignment submission and feedback: Gradescope
Important
Make sure you can access Posit Workbench before class on Thursday.
Prepare: Introduce new content and prepare for class by completing the readings
Participate: Attend and actively participate in class, office hours, and team meetings
Practice: Practice applying ML techniques and computing with application exercises during class, graded for completion
Perform: Put together what you’ve learned to analyze real-world data
Category | Percentage |
---|---|
Projects | 60% |
Homework | 30% |
Application Exercises | 10% |
See course syllabus for how the final letter grade will be determined.
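As a rough illustration of how the category weights above combine, here is a minimal Python sketch. The scores in `example` are hypothetical, the function name `weighted_score` is made up for this example, and the mapping from an overall percentage to a letter grade is defined in the syllabus, not here.

```python
# Category weights taken from the grading table above (they sum to 1.0).
WEIGHTS = {"Projects": 0.60, "Homework": 0.30, "Application Exercises": 0.10}


def weighted_score(category_scores: dict[str, float]) -> float:
    """Combine per-category percentages (0-100) into one weighted percentage."""
    missing = set(WEIGHTS) - set(category_scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"Projects": 90.0, "Homework": 80.0, "Application Exercises": 100.0}
overall = weighted_score(example)
print(f"Overall: {overall:.1f}%")
```

Because projects carry 60% of the weight, a change in the project score moves the overall percentage six times as much as the same change in the application exercise score.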
I want this course to be accessible to students of all abilities. Please feel free to let me know if there are circumstances affecting your ability to participate in class.
Only work that is clearly assigned as team work should be completed collaboratively.
Homework must be completed individually. You may not directly share answers or code with others; however, you are welcome to discuss the problems in general and ask for advice.
We are aware that a huge volume of code is available on the web, and many tasks may have solutions posted
Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source
All code must be written by you, the human being
Use generative AI to facilitate, rather than hinder, learning
✅ GAI tools for reference purposes
🤔 GAI tools for writing my code/analysis
❌ GAI tools for narrative
You are ultimately responsible for the work you turn in; it should reflect your understanding of the course content
Source: Code of Academic Integrity
Ask if you’re not sure if something violates a policy!
Discuss with your peers, then submit your individual responses.