Case study in ML: Property assessment in Cook County

Lecture 2

Dr. Benjamin Soltoff

Cornell University
INFO 4940/5940 - Fall 2024

August 29, 2024

Announcements


Results of student questionnaire

What do you hope to learn from this class?

  1. Application of machine learning
  2. Understanding different models
  3. Practical skills development
  4. Industry standards and best practices
  5. End-to-end machine learning pipeline

What are you most excited about doing in this course?

  1. Deep learning
  2. Model deployment and APIs
  3. Dashboarding and model interpretation
  4. Hands-on exercises
  5. Learning new tools
  6. Curiosity about new topics

What do you think is currently missing?

  1. A/B testing
  2. APIs
  3. Fine-tuning and Large Language Models (LLMs)
  4. High-level overview of the fundamental theories of ML algorithms
  5. Unique data domains
  6. No major changes needed

Property assessment in the United States

Tax collections in the United States

Determining property tax rates

Assessment

Tax Rate

Tax Levy

Tax Bill $$$
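The four quantities above can be tied together with a little arithmetic. The sketch below assumes a common levy-based arrangement: the taxing body sets a total levy, the rate is the levy divided by the total assessed value, and each bill is the property's assessment times that rate. Real jurisdictions add exemptions, equalization factors, and overlapping districts, and the three-property district and dollar figures here are hypothetical.

```python
def tax_rate(levy: float, total_assessed_value: float) -> float:
    """Rate needed to raise the levy from the total taxable base."""
    return levy / total_assessed_value


def tax_bill(assessed_value: float, rate: float) -> float:
    """One property's share of the levy."""
    return assessed_value * rate


# Hypothetical district: three properties that must raise a $30,000 levy.
assessments = {"A": 100_000, "B": 200_000, "C": 300_000}
levy = 30_000

rate = tax_rate(levy, sum(assessments.values()))
bills = {name: tax_bill(value, rate) for name, value in assessments.items()}

# Over-assess property A by 50% while the levy stays fixed (illustrative only).
skewed = dict(assessments, A=150_000)
skewed_rate = tax_rate(levy, sum(skewed.values()))
skewed_bills = {name: tax_bill(value, skewed_rate) for name, value in skewed.items()}

for name in assessments:
    print(f"{name}: {bills[name]:,.0f} -> {skewed_bills[name]:,.0f}")
```

Because the levy is fixed in this sketch, the bills always sum to the levy: over-assessing one property raises its bill and lowers everyone else's, which is why assessment accuracy matters for fairness.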

Determining property assessments

Who decides?

How do they decide?

🤷‍♀️

  • Typically for residential properties, use a market approach
  • What would the property sell for on the open market?
  • Approaches to generating these predictions vary widely

Regressive tax system

  • Property tax system only works if properties are assessed accurately
  • Inaccurate assessments can lead to regressive tax systems

Sales ratio in Tompkins County, NY
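A sales ratio is a property's assessed value divided by its sale price. Two summary statistics often used in ratio studies are the price-related differential (PRD), the mean ratio divided by the sale-weighted mean ratio, and the coefficient of dispersion (COD), the average absolute deviation from the median ratio expressed as a percentage of the median. The sketch below computes both on made-up numbers. The data are hypothetical, and the 0.98 to 1.03 range mentioned in the comment is the commonly cited IAAO guideline, not something stated in these slides.

```python
from statistics import mean, median


def sales_ratios(assessed, sale_prices):
    """Assessed value divided by sale price, one ratio per property."""
    return [a / s for a, s in zip(assessed, sale_prices)]


def price_related_differential(assessed, sale_prices):
    """Mean ratio over the sale-weighted mean ratio.

    Values well above 1 suggest lower-priced homes are assessed at a
    higher share of their value (regressive); a commonly cited
    acceptable range is roughly 0.98 to 1.03.
    """
    ratios = sales_ratios(assessed, sale_prices)
    weighted_mean = sum(assessed) / sum(sale_prices)
    return mean(ratios) / weighted_mean


def coefficient_of_dispersion(assessed, sale_prices):
    """Average absolute deviation from the median ratio, as a percent of the median."""
    ratios = sales_ratios(assessed, sale_prices)
    med = median(ratios)
    return 100 * mean(abs(r - med) for r in ratios) / med


# Hypothetical sales: the cheapest home is over-assessed, the priciest under-assessed.
sale_prices = [100_000, 200_000, 400_000]
assessed = [110_000, 200_000, 360_000]

prd = price_related_differential(assessed, sale_prices)
cod = coefficient_of_dispersion(assessed, sale_prices)
print(f"PRD = {prd:.3f}, COD = {cod:.1f}")
```

In this toy example the ratios fall as price rises, so the PRD lands above 1.03, the pattern a regressive assessment system would produce.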

Property assessment in Cook County

  • Historically regressive and corrupt
  • Residential properties overvalued
  • Commercial properties undervalued
  • Shifts the tax burden onto homeowners (often lower-income)

Joseph Berrios, Cook County Assessor (2010-18)

Sales ratio over time in Cook County, IL

Predictive modeling for property assessment

100-day initiatives and objectives

Extremely brief recap of the model

What would the sale price of every Cook County home be if it had sold last year?

  • Estimates the sale price of unsold properties using the known sale prices of similar, nearby properties
  • Incorporates both property characteristics and geographic/environmental/market trend variables
  • Employs a LightGBM model to generate predictions
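To make the idea of "similar and nearby" concrete, here is a deliberately simplified comparable-sales sketch. It is not the office's LightGBM model: it swaps in a plain nearest-neighbors rule, taking the median price of the k most similar sold homes. The coordinates, square footage, prices, and the `sqft_scale` weighting are all hypothetical assumptions chosen for illustration.

```python
from math import hypot
from statistics import median

# Hypothetical sold homes: (x_km, y_km, square_feet, sale_price)
SOLD = [
    (0.0, 0.0, 1_500, 300_000),
    (0.5, 0.2, 1_600, 320_000),
    (0.3, 0.8, 1_400, 280_000),
    (9.0, 9.0, 1_500, 600_000),
    (9.5, 8.5, 1_550, 620_000),
]


def dissimilarity(home, sale, sqft_scale=500.0):
    """Distance in km plus a scaled size gap.

    The relative weighting (sqft_scale) is an arbitrary assumption.
    """
    x, y, sqft = home
    sx, sy, ssqft, _price = sale
    return hypot(x - sx, y - sy) + abs(sqft - ssqft) / sqft_scale


def estimate_price(home, sold=SOLD, k=3):
    """Median sale price of the k most similar sold homes."""
    nearest = sorted(sold, key=lambda sale: dissimilarity(home, sale))[:k]
    return median(sale[3] for sale in nearest)


# An unsold home described as (x_km, y_km, square_feet).
unsold_home = (0.2, 0.3, 1_500)
print(estimate_price(unsold_home))
```

A gradient-boosted model such as LightGBM learns how much each characteristic matters from the data instead of relying on a hand-picked distance function, which is one reason an office might prefer it over a rule like this.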

Initiating a machine learning project

Stakeholders

Role            | Responsibilities
Project sponsor | Represents the business interests; champions the project
Client          | Represents end users’ interests; domain expert
Data scientist  | Sets and executes analytic strategy; communicates with sponsor and client
Impacted users  | Those affected by the project; the people about whom decisions are made

⏱️ Your turn

Who is the sponsor for the Cook County Assessor’s Office project? Who is the client? Impacted users? What might be their competing goals/interests?

Note

Record your answers on the provided worksheet. We will collect them at the end of class for your application exercise credit.


Choices made

  • Model selection
  • Features used
  • Data sources
  • Selection of training/assessment data

⏱️ Your turn

Pick a choice that the office made in developing their assessment model. What other choices could they have made? What are the trade-offs of this choice? What might be the implications of this choice?


Evaluating potential bias in a model

  • Reporting bias
  • Historical bias
  • Selection bias
    • Coverage bias
    • Non-response bias
    • Sampling bias
  • Confirmation bias

⏱️ Your turn

Identify one type of bias that you think would impact the property assessment model. How might you evaluate this bias? What steps could you take to mitigate it?


Transparency and accountability: A study in contrasts

Cook County, IL

  • Publishes entire model on GitHub
  • Includes all data files + code to reproduce
  • Describes in detail key model choices

Tompkins County, NY

⏱️ Your turn

Who benefits from transparency in the Cook County model? If Tompkins County were to adopt a similar level of transparency, what might be the benefits and drawbacks? What issues would still remain?


Wrap-up

Recap

  • Property assessments are a critical part of the tax system in the United States
  • The Cook County Assessor’s Office has made strides in improving its assessment model
  • Transparency and accountability are key to building trust in the model

Summer camp

My three children sitting along the bank of Cayuga Lake after a long afternoon of swimming and outdoor fun.