You can use evaluation functions in the playground by clicking the Evaluation metrics button in a prompt session. Here, you will have the option to select an existing metric or create a new one.

Registering an auto-evaluation metric

Parea provides use-case-specific evaluation metrics that you can use out of the box. To get started, click Register new auto-eval metric. This lets you create a metric tailored to your specific inputs. Next, find the metric that fits your use case. Each metric has its required and optional variables.

Your prompt template must include a variable for each required input. For example, the LLM Grader metric expects your prompt to have a {{question}} variable. If your variable is named something else, you can select which variable to associate with the question field from the drop-down menu. Click Register once you are done, and that metric will be enabled.

Example - Auto-eval from New Prompt Session

It’s super easy to get started. Let’s create an auto-eval in the Playground. First, click Create New Session. You will see a RAG example prompt pre-populated. The prompt is:

prompt template
Use the following pieces of context from Nike's financial 10k filings
dataset to answer the question. Do not make up an answer if no context is
provided to help answer it.
Context:
---------
{{context}}
---------
Question: {{question}}
---------
Answer:

The inputs row shows that context has been pre-populated with a snippet from Nike’s 10k filings. Our question to ask the LLM is: Which operating segment contributed least to total Nike brand revenue in fiscal 2023?
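If you are curious how those placeholders get filled, conceptually it is just mustache-style substitution of the input values into the template. Here is a minimal sketch of that idea; the render_template helper below is illustrative, not Parea's implementation:

```python
import re

# Illustrative only: mimic how {{variable}} placeholders are replaced with input values.
def render_template(template: str, inputs: dict) -> str:
    return re.sub(r"\{\{(\w+)\}\}", lambda m: inputs[m.group(1)], template)

prompt = render_template(
    "Context:\n{{context}}\nQuestion: {{question}}\nAnswer:",
    {
        "context": "<snippet from Nike's 10k filings>",  # pre-populated in the inputs row
        "question": "Which operating segment contributed least to total Nike brand revenue in fiscal 2023?",
    },
)
print(prompt)
```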

Click Compare to see what the LLM’s response is.

Add an auto-eval metric

Now, let’s add an auto-eval metric. Click Evaluation metrics and Register new auto-eval metric. Select RAG as our use case, and let’s start with Context Relevance as our metric. Click Setup.

(Screenshot: Context Relevance metric setup)

You will notice that this metric requires a question and context input. Since our prompt template already has these inputs, we can click Register. This metric will now always be available for future sessions. To finish, click Set eval metric(s) to enable it in our current Playground session. The Compare button will now say Compare & evaluate; click it.

First, a new LLM result will be generated. The session will then automatically save your results, and the evaluation score will be computed. You will see your score at the top of both the Prompt section and the Inference section.

My score was 'Context Relevance-b6CK' score: 0.08. What was yours?
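Wondering what that number means? Context-relevance style metrics (in the spirit of RAGAS) typically ask an LLM to pick out only the context sentences needed to answer the question and then score the fraction of the context that was actually relevant. A rough sketch of the idea, not Parea's exact implementation (llm_pick_relevant_sentences is a hypothetical stand-in for an LLM call):

```python
def context_relevance(question: str, context: str, llm_pick_relevant_sentences) -> float:
    """Fraction of context sentences an LLM judges as needed to answer the question."""
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    relevant = llm_pick_relevant_sentences(question, sentences)  # subset selected by the LLM
    return len(relevant) / max(len(sentences), 1)
```

Under that reading, a low score simply means most of the pre-populated 10k snippet was not needed to answer this particular question.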

Auto-eval with target (“ground truth”)

What if we know what the correct answer should be? Let’s add a target that represents the correct answer.

In the Input section, click the blue button to Add inputs to test collection.

(Screenshot: Add inputs to test collection)

Next, enter the name Rag Example for our new collection. Where it says Define a target, paste:

target
Global Brand Divisions

Finally, click Create collection.

(Screenshot: Create collection)

Now, let’s register a new auto-eval metric. This time, select General as our use case and Answer Matches Target - LLM Judge as our metric. Once again, no changes are needed since our prompt template’s input variable names match the metric’s required inputs. Click Register, then Set eval metric(s).
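For intuition, an "answer matches target" LLM judge essentially asks a model whether the generated answer and the ground-truth target agree, then converts the verdict into a score. A simplified illustration (call_llm is a hypothetical stand-in, not Parea's metric code):

```python
def answer_matches_target(answer: str, target: str, call_llm) -> float:
    """Ask an LLM judge whether the answer conveys the same fact as the target; 1.0 = match."""
    verdict = call_llm(
        f"Target answer: {target}\n"
        f"Generated answer: {answer}\n"
        "Does the generated answer convey the same fact as the target? Reply yes or no."
    )
    return 1.0 if verdict.strip().lower().startswith("yes") else 0.0
```

With a sketch like this, our target being Global Brand Divisions means an answer naming any other segment would score 0.0.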

We now have two metrics attached to this session: Context Relevance and Answer Matches Target - LLM Judge.

Since we do not need to call the LLM provider again, instead of clicking Compare & evaluate, select the down chevron icon next to it and choose Evaluate. This will run the evaluation metrics on our existing LLM response.

Congrats, that’s it!

You should now see two scores. My scores are 'Context Relevance-b6CK' score: 0.08 / 'Answer Matches Target - LLM Judge-rFun' score: 0.00. Did your prompt also fail the Answer Matches Target eval?

CHALLENGE: Update your prompt to get it to pass. 🤓

Want to explore more? Try our RAG Tutorial.

Using a custom eval metric

All the evaluation metrics you’ve created in the Evaluations tab will be available in the Playground. If you don’t have any, click Create new custom metric to get started (Learn how to create a custom eval).
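To give a sense of what a custom metric looks like, it is essentially a Python function that scores an LLM call, typically returning a value between 0 and 1 (see the custom-eval guide for the exact signature Parea expects). A hypothetical example that rewards answers naming one of Nike's operating segments:

```python
# Hypothetical custom metric: the signature here is illustrative; follow the
# custom-eval guide for the exact arguments Parea passes to your function.
def names_an_operating_segment(output: str) -> float:
    segments = [
        "North America",
        "Europe, Middle East & Africa",
        "Greater China",
        "Asia Pacific & Latin America",
        "Global Brand Divisions",
    ]
    return 1.0 if any(seg.lower() in output.lower() for seg in segments) else 0.0
```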

You can select the metrics you want in the Evaluation metrics modal and then click Set eval metric(s).