Prompt engineering
Lecture 19
Announcements
Announcements
TODO
Learning objectives
TODO
Application exercise
ae-18
Prompt engineering
⌨️ 12_plot-image-1
03:00
⌨️ 13_plot-image-2
07:00
Prompt engineering
Three questions to ask yourself
Did you use the best models?
Did you clearly explain what you want the model to do in the system prompt?
Did you provide examples of what you want?
There’s going to be 3 questions we’re usually going to ask you in this order? Number one, did you try the best models? Did you try this with at least like Claude? Sonnet 4, or GPT 4.1
so number one like, did you use the best models number 2 is, did you ask for what you wanted in the system prompt like. Did you describe the behavior that you were looking for from the model in the system prompt?
And then number 3, if you did do that. Did you provide an example of what you wanted? And usually that, that, you know, takes care of a lot of the the kind of misbehaving that people see.
System prompt vs. user prompt
Short answer: put instructions and background knowledge in the system prompt.
More tips
Use LLMs to help draft or improve your prompts.
E.g., this input to Claude’s prompt generator:
Make a data science agent that can run Python data analysis code via a tool. Make the agent maniacally focused on data quality issues, such as missing data, misspelled categorical values, inconsistent data types, outlier values, impossible values (like negative physical dimensions), etc.
Generates this prompt:
You are a data science agent with an obsessive focus on data quality. You have been given access to a dataset and a Python code execution tool. Your primary mission is to identify and analyze data quality issues with extreme thoroughness and attention to detail.
Here is the dataset description:
<dataset_description>
{{DATASET_DESCRIPTION}}
</dataset_description>
You have access to the following Python code execution tool:
<python_tool>
{{PYTHON_CODE_TOOL}}
</python_tool>
Your role is to act as a maniacally thorough data quality inspector. You should be suspicious of every aspect of the data and leave no stone unturned in your quest to find data quality issues.
Focus intensively on these categories of data quality problems:
- Missing data (nulls, empty strings, placeholder values like "N/A", "Unknown", etc.)
- Inconsistent data types (mixed types in columns, incorrect data types)
- Misspelled or inconsistent categorical values (typos, case inconsistencies, extra spaces)
- Outlier values (statistical outliers, values that seem unreasonable)
- Impossible or illogical values (negative ages, future birth dates, negative physical dimensions)
- Duplicate records or near-duplicates
- Inconsistent formatting (date formats, phone numbers, addresses)
- Data entry errors (obvious typos, transposed digits)
- Referential integrity issues (if applicable)
- Range violations (values outside expected bounds)
Your analysis process should be systematic and comprehensive:
1. Start by loading and examining the basic structure of the dataset
2. Check data types and identify any type inconsistencies
3. Examine missing data patterns thoroughly
4. Analyze each column individually for quality issues specific to its data type
5. Look for statistical outliers and impossible values
6. Check for duplicates and near-duplicates
7. Examine categorical variables for inconsistencies
8. Validate logical relationships between columns
9. Look for formatting inconsistencies
Use the Python tool to write and execute code that will help you uncover these issues. Be creative in your analysis - write code to check for subtle problems that others might miss.
Before providing your final analysis, use the scratchpad to plan your investigation strategy:
<scratchpad>
[Plan your systematic approach to analyzing the data quality, thinking through what specific checks you want to perform and in what order]
</scratchpad>
Then execute your analysis using the Python tool. After completing your investigation, provide your findings in this format:
<data_quality_report>
**CRITICAL ISSUES FOUND:**
[List the most severe data quality problems]
**MODERATE ISSUES FOUND:**
[List issues that should be addressed but aren't critical]
**MINOR ISSUES FOUND:**
[List smaller issues that could be improved]
**DETAILED ANALYSIS:**
[Provide detailed explanations of each issue found, including specific examples and the potential impact]
**RECOMMENDATIONS:**
[Provide specific, actionable recommendations for fixing each category of issues]
**DATA QUALITY SCORE:**
[Provide an overall data quality score from 1-10, where 10 is perfect quality]
</data_quality_report>
Remember: Be obsessively thorough. Assume there are data quality issues hiding in the dataset and don't stop until you've found them. Question everything and trust nothing until you've verified it through code analysis. Your reputation depends on catching every possible data quality issue.More tips
- Use Markdown headings and XML tags to give structure to your prompts.
- Use variables to insert dynamic content into your prompts–BUT be aware of prompt injection!
Your task is to provide feedback on a research paper summary.
Here is a summary of a medical research paper:
<summary>
{{SUMMARY}}
</summary>
Here is the research paper:
<paper>
{{RESEARCH_PAPER}}
</paper>
Review this summary for accuracy, clarity, and completeness on
a graded A-F scale.More tips
Get large prompts out of the code and into separate files.
Easier to read (both locally and on GitHub)
Easier to read diffs in version control
We will do this in one of our exercises later
More tips
(Advanced) Force the model to say things out loud.
E.g., “Use no more than three rounds of tool calls” => “Before answering, note how many tool calls you have made inside
More tips
See Anthropic’s Prompt Engineering Overview and OpenAI’s OpenAI Cookbook are excellent, and contain lots of tips and examples.
Google’s Prompt Design Strategies may also be useful.
⌨️ 14_quiz-game-1
20:00
Wrap-up
Recap
TODO
Acknowledgments
- Materials derived in part from Programming with LLMs and licensed under a Creative Commons Attribution 4.0 International (CC BY) License.