Making sure that your LLM-powered application works well in expected (and unexpected) scenarios is important. For that, we have the concept of a test case collection: a dataset of test cases. A test case consists of a set of inputs, an optional target (gold standard) answer, and optional tags. The inputs are key-value pairs which represent the input to your prompt, chain or agent. The target is the expected answer given the inputs. The tags allow you to filter test cases at different places on the platform. A test case, except for its tags, is immutable after it has been created.
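The structure described above can be sketched as plain data. This is an illustrative sketch only; the field and input names (`inputs`, `target`, `tags`, `question`) are assumptions for the example, not the platform's actual schema:

```python
# A hypothetical test case: inputs are key-value pairs, target is the
# optional gold standard answer, and tags are optional filter labels.
test_case = {
    "inputs": {"question": "What is the capital of France?"},
    "target": "Paris",
    "tags": ["geography", "smoke-test"],
}

# A test case collection is simply a dataset of such test cases.
test_case_collection = [test_case]
```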

You can use test case collections in the Lab, the Prompt IDE or a benchmarking job to test your prompts, chains & agents.

Defining test case collections

Upload a CSV file

You can create a test case collection by uploading a CSV file in the test hub. Each row of the CSV file becomes an imported test case. If a target column is present, it will be used as the gold standard answer. Similarly, if a tags column is present, it will be used as the tags and is expected to be a comma-separated list (e.g. tag1, tag2). The remaining columns will be used to define the inputs of your test cases.
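A minimal sketch of generating such a CSV file with Python's standard library. The `question` column is an assumed input name for illustration; only the `target` and `tags` columns carry the special meaning described above:

```python
import csv

# Illustrative rows: "question" becomes an input key, "target" the gold
# standard answer, and "tags" a comma-separated list of filter labels.
rows = [
    {"question": "What is 2 + 2?", "target": "4", "tags": "math, smoke-test"},
    {"question": "What is the capital of France?", "target": "Paris", "tags": "geography"},
]

with open("test_cases.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "target", "tags"])
    writer.writeheader()
    writer.writerows(rows)
```

The resulting file can then be uploaded in the test hub as described above.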

From experiments & production traffic logs

As you experiment in the Lab or in the Prompt IDE, you can use the Add to test case collection button to add the current inputs to a test case collection. You can also define test cases from your production traffic logs.