AE 17: Programming with LLMs

Application exercise (R, Python)

Modified: October 30, 2025

07-models

library(ellmer)

# Step 1: List available models for OpenAI and Anthropic
# List models using the `models_*()` functions.
# Hint: try using the Positron data viewer by calling `View()` on the results.
models_____
models_____

prompt <- "Write a recipe for an easy weeknight dinner my kids would like."

# Step 2: Compare responses from different models
# Try sending the same prompt to different models to compare the responses.
chat("openai/____")$chat(prompt)
chat("anthropic/____")$chat(prompt)

# Bonus: Local models?
# If you have local models installed, you can use them too.
models_ollama()
chat("ollama/____")$chat(prompt)

# Bonus: Repeat your OpenAI and Anthropic requests using direct provider
# functions.
chat_____(____)$chat(prompt)
chat_____(____)$chat(prompt)
# %%
import dotenv
from chatlas import ChatAuto

dotenv.load_dotenv()

# %% [markdown]
# List models by calling the `list_models` method on a `Chat` instance.

# %%
ChatAuto(____).list_models()
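
# %% [markdown]
# For example, one possible fill (any provider/model string works for listing;
# this one assumes your OpenAI key is set in `.env`):

# %%
ChatAuto("openai/gpt-4.1-nano").list_models()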

# %% [markdown]
# You can also load the models into a Polars DataFrame for easier viewing.
# Use this block to list OpenAI and Anthropic models.

# %%
import polars as pl

models = ChatAuto("____").list_models()
models = pl.DataFrame(models)
models

# %% [markdown]
# Now try sending the same prompt to different models to compare the responses.

# %%
prompt = "Write a recipe for an easy weeknight dinner my kids would like."

ChatAuto("____").chat(prompt)
ChatAuto("____").chat(prompt)
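
# %% [markdown]
# One possible pairing, using models that also appear later in these exercises;
# any two models from the lists you made above will do:

# %%
ChatAuto("openai/gpt-4.1-nano").chat(prompt)
ChatAuto("anthropic/claude-3-haiku-20240307").chat(prompt)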

# %% [markdown]
# Bonus: local models?
#
# If you have local models installed, try them out with Ollama. Note that you
# have to give a model name to list models, but the model name can be anything.

# %%
ChatAuto("ollama/any-model-name").list_models()
ChatAuto("ollama/gemma3:4b").chat(prompt)

# %% [markdown]
# Bonus: Rewrite your `ChatAuto()` calls to use the direct provider functions.

# %%
from chatlas import ChatAnthropic, ChatOpenAI

Chat____(____).chat(prompt)
Chat____(____).chat(prompt)
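
# %% [markdown]
# For instance, a sketch using the same models as above; the direct
# constructors take the bare model name, with no provider prefix:

# %%
ChatOpenAI(model="gpt-4.1-nano").chat(prompt)
ChatAnthropic(model="claude-3-haiku-20240307").chat(prompt)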

08-vision

library(ellmer)

recipe_images <- here::here("data/recipes/images")
img_pancakes <- file.path(recipe_images, "EasyBasicPancakes.jpg")
img_pad_thai <- file.path(recipe_images, "PadThai.jpg")

#' Ask OpenAI's `gpt-4.1-nano` to give a creative recipe title and description
#' for the pancakes image.
chat <- ____
chat$chat(
  "____",
  ____(img_pancakes)
)

#' In a new chat, ask it to write a recipe for the food it sees in the Pad Thai
#' image. (Don't tell it that it's Pad Thai!)
chat <- ____
chat$chat(
  "Write a recipe to make the food in this image.",
  ____(img_pad_thai)
)
# %%
import chatlas
import dotenv
from pyhere import here

dotenv.load_dotenv()

# %%
recipe_images = here("data/recipes/images/")
img_ziti = recipe_images / "ClassicBakedZiti.jpg"
img_mac_cheese = recipe_images / "CreamyCrockpotMacAndCheese.jpg"

# %% [markdown]
# Ask OpenAI's `gpt-4.1-nano` to give a creative recipe title and description
# for the ziti image.

# %%
chat = ____
chat.chat(
    "Give the food in this image a creative recipe title and description.",
    chatlas.____(img_ziti),
)

# %% [markdown]
# In a new chat, ask it to write a recipe for the food it sees in the Mac &
# Cheese image. (Don't tell it that it's Mac & Cheese!)

# %%
chat = ____
chat.chat(
    "Write a recipe to make the food in this image.", chatlas.____(img_mac_cheese)
)
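
# %% [markdown]
# A possible solution sketch, assuming `chatlas.content_image_file()` is the
# helper you want (it attaches a local image to the turn). The Mac & Cheese
# prompt follows the same pattern with `img_mac_cheese`:

# %%
chat = chatlas.ChatOpenAI(model="gpt-4.1-nano")
chat.chat(
    "Give the food in this image a creative recipe title and description.",
    chatlas.content_image_file(img_ziti),
)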

09-pdf

library(ellmer)

recipe_pdfs <- here::here("data/recipes/pdf")
pdf_waffles <- file.path(recipe_pdfs, "CinnamonPeachOatWaffles.pdf")

# Ask OpenAI's `gpt-4.1-nano` to turn this messy PDF print-out of a waffle
# recipe into a clean list of ingredients and steps to follow.
chat <- ____
chat$chat(
  "____",
  ____(pdf_waffles)
)
# %%
import chatlas
import dotenv
from pyhere import here

dotenv.load_dotenv()

# %%
recipe_pdfs = here("data/recipes/pdf/")
pdf_cheesesteak = recipe_pdfs / "PhillyCheesesteak.pdf"

# %% [markdown]
# Ask OpenAI's `gpt-4.1-nano` to turn this messy PDF print-out of a Philly
# Cheesesteak recipe into a clean list of ingredients and steps to follow.

# %%
chat = chatlas.____
chat.chat(
    "____",
    chatlas.____(pdf_cheesesteak),
)
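
# %% [markdown]
# A possible completion, assuming `chatlas.content_pdf_file()` is the helper
# for attaching a local PDF; the prompt wording here is just one option:

# %%
chat = chatlas.ChatOpenAI(model="gpt-4.1-nano")
chat.chat(
    "Turn this messy PDF print-out of a recipe into a clean list of "
    "ingredients and steps to follow.",
    chatlas.content_pdf_file(pdf_cheesesteak),
)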

10-structured-output

library(ellmer)

# Read in the recipes from text files
recipe_txt <- here::here("data/recipes/text")
txt_waffles <- recipe_txt |>
  file.path("CinnamonPeachOatWaffles.md") |>
  brio::read_file() # Like readLines() but all in one string

# Show the first 500 characters of the first recipe
txt_waffles |> substring(1, 500) |> cat()

#' Here's an example of the structured output we want to achieve for a single
#' recipe:
#'
#' {
#'   "title": "Spicy Mango Salsa Chicken",
#'   "description": "A flavorful and vibrant chicken dish...",
#'   "ingredients": [
#'     {
#'       "name": "Chicken Breast",
#'       "quantity": "4",
#'       "unit": "medium",
#'       "notes": "Boneless, skinless"
#'     },
#'     {
#'       "name": "Lime Juice",
#'       "quantity": "2",
#'       "unit": "tablespoons",
#'       "notes": "Fresh"
#'     }
#'   ],
#'   "instructions": [
#'     "Preheat grill to medium-high heat.",
#'     "In a bowl, combine ...",
#'     "Season chicken breasts with salt and pepper.",
#'     "Grill chicken breasts for 6-8 minutes per side, or until cooked through.",
#'     "Serve chicken topped with the spicy mango salsa."
#'   ]
#' }
#'
#' Hint: You can use `required = FALSE` in `type_*()` functions to indicate that
#' a field is optional.

type_recipe <- type_____(
  title = ____(),
  description = ____(),
  ingredients = ____(
    type_object(
      name = ____(),
      quantity = ____(),
      unit = ____(),
      notes = ____()
    )
  ),
  instructions = type_array(____())
)

chat <- chat("openai/gpt-4.1-nano")

chat$chat_structured(txt_waffles, type = type_recipe)
# %%
import chatlas
import dotenv

dotenv.load_dotenv()

# %%
from pyhere import here

recipe_txt = here("data/recipes/text/")
txt_cheesesteak = (recipe_txt / "PhillyCheesesteak.md").read_text()

# %%
print(txt_cheesesteak)

# %% [markdown]
# Here's an example of the structured output we want to achieve for a single
# recipe:
#
# ```json
# {
#   "title": "Spicy Mango Salsa Chicken",
#   "description": "A flavorful and vibrant chicken dish...",
#   "ingredients": [
#     {
#       "name": "Chicken Breast",
#       "quantity": "4",
#       "unit": "medium",
#       "notes": "Boneless, skinless"
#     },
#     {
#       "name": "Lime Juice",
#       "quantity": "2",
#       "unit": "tablespoons",
#       "notes": "Fresh"
#     }
#   ],
#   "instructions": [
#     "Preheat grill to medium-high heat.",
#     "In a bowl, combine ...",
#     "Season chicken breasts with salt and pepper.",
#     "Grill chicken breasts for 6-8 minutes per side, or until cooked through.",
#     "Serve chicken topped with the spicy mango salsa."
#   ]
# }
# ```
#
# Hint: you can use `Optional` from the `typing` module to indicate that a field
# is not always required.
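
# %% [markdown]
# For example, a minimal sketch of that pattern (the field names here are made
# up):

# %%
from typing import Optional

from pydantic import BaseModel


class Example(BaseModel):
    required_field: str
    optional_field: Optional[str] = None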

# %%
from typing import List, Optional

from pydantic import BaseModel, Field


class Ingredient(BaseModel):
    name: ____
    quantity: ____
    unit: Optional[____] = Field(None, description="____")
    notes: Optional[____] = ____


class Recipe(BaseModel):
    title: ____
    description: ____
    ingredients: List[_____]
    instructions: List[____] = Field(..., description="____")


# %%
chat = chatlas.ChatOpenAI(model="gpt-4.1-nano")
recipe = chat.chat_structured(txt_cheesesteak, data_model=Recipe)

# %% [markdown]
# `.chat_structured()` returns an instance of the provided Pydantic model, so
# you can access fields directly:

# %%
recipe.title

# %% [markdown]
# Or you can convert it to JSON with pydantic's built-in `.model_dump_json()`
# method:

# %%
print(recipe.model_dump_json(indent=2))

11-batch

library(ellmer)

# Read in the recipes from text files (this time all of the recipes)
recipe_files <- fs::dir_ls(here::here("data/recipes/text"))
recipes <- purrr::map(recipe_files, brio::read_file)

# Use the type_recipe we defined in `10_structured-output`. Optionally replace
# the `type_recipe` definition below with your own version if you want to.
type_recipe <- type_object(
  title = type_string(),
  description = type_string(),
  ingredients = type_array(
    type_object(
      name = type_string(),
      quantity = type_number(),
      unit = type_string(required = FALSE),
      notes = type_string(required = FALSE)
    )
  ),
  instructions = type_array(type_string())
)

# Parallel structured extraction (fast, may be pricey) -------------------------
# First, we'll extract structured data from all of the recipes at once, in
# parallel. This is fast for our 8 recipes, but each request pays standard
# per-token prices, which could add up for a larger dataset.
recipes_data <- ____(
  chat("openai/gpt-4.1-nano"),
  prompts = ____,
  type = ____
)

# Hey, it's a table of recipes!
recipes_tbl <- dplyr::as_tibble(recipes_data)
recipes_tbl

# Batch API (slower, but cheaper) ----------------------------------------------
# That was pretty easy! But what if we had 10,000 recipes to process? That would
# take a long time, and be pretty expensive. We can save money by using the
# Batch API, which allows us to send multiple requests in a single API call.
#
# With the Batch API, results are processed asynchronously: they usually arrive
# within a few minutes, and at most within 24 hours.
# Because batching lets providers schedule requests more efficiently, it also
# costs less per token than the standard API.

res <- ____(
  chat("anthropic/claude-3-haiku-20240307"),
  prompts = ____,
  type = ____,
  path = here::here("data/recipes/batch_results_r_claude.json")
)

# Save the results -------------------------------------------------------------
# Now, save the results to a JSON file in `data/recipes/recipes.json`. Once
# you've done that, you can open up `11_recipe-app.py` and run the app to see
# your new recipe collection!
jsonlite::write_json(
  res,
  here::here("data/recipes/recipes.json"),
  auto_unbox = TRUE,
  pretty = TRUE
)
# %%
import chatlas
import dotenv

dotenv.load_dotenv()

# %%
# Read in the recipes from the text files (this time all of the files)
from pyhere import here

recipe_files = list(here("data/recipes/text").glob("*"))
recipes = [f.read_text() for f in recipe_files]

# %% [markdown]
# We'll use the same Pydantic models we defined in `10_structured-output`.
# Optional: Replace the models in the next cell with your own from that
# exercise.

# %%
from typing import List, Optional

from pydantic import BaseModel, Field


class Ingredient(BaseModel):
    name: str = Field(..., description="Name of the ingredient")
    quantity: float | None = Field(default=1, description="Quantity as provided")
    unit: Optional[str] = Field(
        None,
        description="Unit of measure, if applicable",
    )
    notes: Optional[str] = Field(
        None,
        description="Additional notes or preparation details",
    )


class Recipe(BaseModel):
    title: str
    description: str
    image_url: Optional[str] = Field(..., description="URL of an image of the dish")
    ingredients: List[Ingredient]
    instructions: List[str] = Field(..., description="Step-by-step instructions")


# %% [markdown]
# First, we'll use a simple loop to process each recipe one at a time. This is
# straightforward for our 8 recipes, but would be slow (and expensive) for a
# larger dataset.
#
# In a future version of `chatlas`, you will be able to use
# `chatlas.parallel_chat_structured()` to do this truly in parallel!

# %%
from tqdm import tqdm


def extract_recipe(recipe_text: str) -> Recipe:
    chat = chatlas.ChatOpenAI(model="gpt-4.1-nano")
    return chat.chat_structured(recipe_text, data_model=Recipe)


recipes_data: List[Recipe] = []
for recipe in tqdm(recipes):
    recipes_data.append(extract_recipe(recipe))

# %%
[r.title for r in recipes_data]

# %%
# Can that be a polars DataFrame?
import polars as pl

recipes_df = pl.DataFrame([r.model_dump() for r in recipes_data], strict=False)
recipes_df

# %% [markdown]
# That was pretty easy! But what if we had 10,000 recipes to process? That would
# take a long time, and be pretty expensive. We can save money by using the
# Batch API, which allows us to send multiple requests in a single API call.
#
# With the Batch API, results are processed asynchronously: they usually arrive
# within a few minutes, and at most within 24 hours.
# Because batching lets providers schedule requests more efficiently, it also
# costs less per token than the standard API.

# %%
from chatlas import batch_chat_structured

chat = chatlas.ChatAnthropic(model="claude-3-haiku-20240307")
res = ____(
    chat=chat,
    prompts=____,
    data_model=____,
    path=here("data/recipes/batch_results_py_claude.json"),
)
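
# %% [markdown]
# One plausible completion, reusing the objects defined above (per the note
# above, the batch may take a few minutes to come back):

# %%
res = batch_chat_structured(
    chat=chat,
    prompts=recipes,
    data_model=Recipe,
    path=here("data/recipes/batch_results_py_claude.json"),
)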

# %% [markdown]
# Now, save the results to a JSON file in `data/recipes/recipes.json`. Once
# you've done that, you can open up `11_recipe-app.py` and run the app to see
# your new recipe collection!

# %%
import json

recipes_structured = [r.model_dump() for r in res]

with open(here("data/recipes/recipes.json"), "w") as f:
    json.dump(recipes_structured, f, indent=2)
