Starting with a pre-built eval function

You can start with a pre-built evaluation function and customize it for your use case. From the accordion, you can select metrics from three use cases:

  • General
  • Factuality/Summaries
  • RAG (Retrieval-Augmented Generation)
1. Select a metric

Select the metric you want to get started with. A check mark will appear on the selected metric. Next, click Choose pre-built metric.

2. Customize the metric

First, provide a unique name for the evaluation metric.

All evaluation metrics are Python functions. The main function returning the evaluation score must be named eval_fun and have the signature def eval_fun(log: Log) -> float:

Evaluation functions can call external libraries, the Parea SDK, or any built-in Python library.
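For orientation, here is a minimal skeleton of such a function. The import path and the log.output field are assumptions based on the Parea SDK; check them against the code of the pre-built metric you selected.

from parea.schemas.log import Log

def eval_fun(log: Log) -> float:
    # log.inputs holds your prompt template variables; log.output holds the model's response.
    answer = log.output or ""

    # Placeholder scoring logic: 1.0 if the model produced any output, else 0.0.
    return 1.0 if answer.strip() else 0.0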

Simple Modification

In the simplest case, you may just want a pre-built evaluation metric to use the prompt template variable names you already use.

For example, most pre-built metrics contain a line like the following:

question_field = 'question' # <-- REPLACE "question" WITH YOUR INPUT VARIABLE NAME FOR THE QUESTION

If your prompt template was:

Using the provided data, answer the user's query: {{query}}

You would want to update the evaluation metric to question_field = "query".

Now, when the evaluation metric reads the data for your prompt via log.inputs['query'], it will access the value of your query variable.

(Learn more about the “Log” object)
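Put together, the simple modification amounts to something like the sketch below. The scoring logic is a stand-in; the renamed input field is the only point here.

from parea.schemas.log import Log

question_field = "query"  # renamed to match the {{query}} variable in the prompt template

def eval_fun(log: Log) -> float:
    question = log.inputs[question_field]  # reads the value of your query variable
    answer = log.output or ""

    # Stand-in check: does the answer share at least one word with the question?
    return 1.0 if set(question.lower().split()) & set(answer.lower().split()) else 0.0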

Advanced Modification

You may want to make a more substantial change to a metric. For example, the Context Ranking - Listwise RAG metric uses normalized discounted cumulative gain (NDCG) to measure ranking quality; if you would rather measure precision, you can modify the code accordingly, as sketched below.
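The sketch scores the fraction of retrieved contexts judged relevant (precision) rather than a position-discounted ranking score (NDCG). The input variable names and the keyword-overlap relevance check are illustrative assumptions, not the pre-built metric's actual code; adapt the real metric's code instead.

from parea.schemas.log import Log

question_field = "question"                               # <-- assumed input variable holding the question
context_fields = ["context_0", "context_1", "context_2"]  # <-- assumed input variables holding retrieved contexts

def is_relevant(question: str, context: str) -> bool:
    # Placeholder relevance judgment via keyword overlap; swap in an LLM call if you prefer.
    return len(set(question.lower().split()) & set(context.lower().split())) >= 2

def eval_fun(log: Log) -> float:
    question = log.inputs[question_field]
    contexts = [log.inputs[field] for field in context_fields]

    # Precision: fraction of retrieved contexts judged relevant,
    # instead of NDCG's position-discounted measure of ranking quality.
    relevant = [is_relevant(question, ctx) for ctx in contexts]
    return sum(relevant) / len(relevant) if relevant else 0.0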