Test cases are predefined sets of inputs for prompt templates, with the option to add a target and tags to each case. You can use these test cases in the playground to test your prompts.

Each test case consists of three things:

  • Inputs: prompt variable values that are interpolated into the prompt template of your model config at generation time (i.e., they replace the {{ variables }} you define in the prompt template).
  • Target: the expected or intended output of the model.
  • Tags: any string metadata tags you want to attach to the test case.
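
To make the interpolation step concrete, here is a minimal sketch (in plain Python, with a hypothetical test-case structure) of how a test case's inputs replace the {{ variables }} in a prompt template:

```python
import re

# A hypothetical test case: inputs, an optional target, and optional tags.
test_case = {
    "inputs": {"question": "What is 2 + 2"},
    "target": "4",
    "tags": ["easy", "arithmetic"],
}

template = "Please solve the following math question: {{ question }}"

def interpolate(template: str, inputs: dict) -> str:
    # Replace each {{ variable }} with the matching input value.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: inputs[m.group(1)],
        template,
    )

prompt = interpolate(template, test_case["inputs"])
print(prompt)  # Please solve the following math question: What is 2 + 2
```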

Creating Datasets

From uploaded CSV

You can create a dataset by uploading a CSV file in the Datasets tab: click Upload file to create dataset, then, in the modal that appears, provide a name for your dataset and upload your file.

Upload a dataset

Each row from the CSV file represents a test case. The column names represent the prompt template input variable names.

For example, if the prompt template is:

prompt template
Please solve the following math question: {{ question }}

Then the CSV file would need a column named question.

Different delimiter types are supported, including comma, tab, pipe, and semicolon.
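
As a sketch of how delimiter detection can work, Python's standard csv.Sniffer can infer the delimiter from a sample before parsing (the pipe-delimited data below is hypothetical):

```python
import csv
import io

# A hypothetical CSV using a pipe delimiter; Sniffer detects it from a sample.
raw = "question|target\nWhat is 2 + 2|4\n"

dialect = csv.Sniffer().sniff(raw, delimiters=",\t|;")
rows = list(csv.DictReader(io.StringIO(raw), dialect=dialect))
print(rows[0])  # {'question': 'What is 2 + 2', 'target': '4'}
```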

The column names target and tags are reserved.

If a target column is present, it will be used as the gold standard answer for that row’s output. For example, in the CSV below, the target 4 is the expected answer to the question What is 2 + 2.

CSV
question, target
What is 2 + 2, 4

If a tags column is present, it will be used as metadata tags for a specific row. Tags should be comma-separated with no spaces.

For example, in the CSV below, the first row has been tagged as easy and arithmetic, and the second row as hard and calculus.

CSV
question, target, tags
What is 2 + 2, 4, easy,arithmetic
Evaluate ∫(1 / x^4 + 1)dx, x - 1/3(x^-3) + C, hard,calculus
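
Rows like the ones above can be turned into test cases by treating the reserved target and tags columns specially and mapping every other column to an input. A minimal sketch with Python's standard csv module (the tags field is quoted here so the comma inside it survives standard CSV parsing):

```python
import csv
import io

RESERVED = {"target", "tags"}

raw = (
    "question,target,tags\n"
    'What is 2 + 2,4,"easy,arithmetic"\n'
)

def row_to_case(row: dict) -> dict:
    # Reserved columns become the target and tags; everything else is an input.
    return {
        "inputs": {k: v for k, v in row.items() if k not in RESERVED},
        "target": row.get("target"),
        "tags": row["tags"].split(",") if row.get("tags") else [],
    }

cases = [row_to_case(r) for r in csv.DictReader(io.StringIO(raw))]
print(cases[0])
# {'inputs': {'question': 'What is 2 + 2'}, 'target': '4', 'tags': ['easy', 'arithmetic']}
```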

Tags are helpful for filtering test cases in the playground. For example, if you have imported 10 cases into the playground and run an evaluation on all of them, you can filter the cases by tag and see the average score for only the selected cases. This helps you understand whether your prompt performs well on specific kinds of test case.
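
The tag-filtered average can be sketched as follows; the cases list, its score field, and the average_score helper are all hypothetical, not part of the product:

```python
# Hypothetical scored test cases after an evaluation run.
cases = [
    {"tags": ["easy", "arithmetic"], "score": 1.0},
    {"tags": ["hard", "calculus"], "score": 0.0},
    {"tags": ["easy", "arithmetic"], "score": 0.5},
]

def average_score(cases, tag):
    # Mean score over only the cases carrying the given tag.
    selected = [c["score"] for c in cases if tag in c["tags"]]
    return sum(selected) / len(selected)

print(average_score(cases, "easy"))  # 0.75
```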

From the playground

In the playground, after you click Add test case, you can optionally select Upload new dataset to upload a CSV file.

From trace logs

See Observability - Datasets for more details.

Where can I use datasets?