Leverage user feedback to run A/B tests of prompts, models & other approaches
Original email generation code
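The original listing is not reproduced in this excerpt. As a stand-in, here is a minimal sketch of a pre-A/B-test email generator; the model name and prompt wording are illustrative assumptions:

```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def generate_email(user: str) -> str:
    # Plain generation call: no tracing and no experiment bookkeeping yet.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; use whichever model you actually deploy
        messages=[{"role": "user", "content": f"Generate an email for {user}."}],
    )
    return response.choices[0].message.content
```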
We use `wrap_openai_client` (Python) or `patchOpenai` (TypeScript) to automatically trace any LLM calls made by the OpenAI client, and the `trace` decorator to capture the inputs and outputs of the `generate_email` function.

To run the A/B test `long-vs-short-emails`, we will randomly choose to generate a long email (`variant_0`, the control group) or a short email (`variant_1`, the treatment group). Then, we will tag the trace with the A/B test name and the chosen variant via `trace_insert`. Finally, we will return the email, the trace ID, and the chosen variant. We need to return the latter two in order to associate any feedback with the corresponding variant; see the sketch below.
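Here is a minimal sketch of that flow in Python. It assumes the Parea SDK's `Parea`, `trace`, and `trace_insert` exports; the import path for `get_current_trace_id` and the exact metadata layout accepted by `trace_insert` may differ across SDK versions:

```python
import os
import random

from openai import OpenAI
from parea import Parea, trace, trace_insert
from parea.utils.trace_utils import get_current_trace_id  # assumed import path

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
p = Parea(api_key=os.environ["PAREA_API_KEY"])
p.wrap_openai_client(client)  # every OpenAI call is now traced automatically

AB_TEST_NAME = "long-vs-short-emails"


@trace  # captures the inputs and outputs of generate_email
def generate_email(user: str) -> tuple[str, str, str]:
    # Randomly assign control (variant_0, long) or treatment (variant_1, short).
    variant = random.choice(["variant_0", "variant_1"])
    length = "long" if variant == "variant_0" else "short"
    # Tag the trace with the A/B test name and chosen variant (field layout assumed).
    trace_insert({"metadata": {"ab_test_name": AB_TEST_NAME, "ab_test_variant": variant}})
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": f"Generate a {length} email for {user}."}],
    )
    # Returning trace_id and variant lets the caller attach later feedback
    # to exactly this trace.
    return response.choices[0].message.content, get_current_trace_id(), variant
```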
When a user gives feedback on a generated email, we use the `update_log` function of `parea_logger` to update the trace with the collected feedback as a score.
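A sketch of that feedback hook, assuming the SDK exposes `parea_logger` together with `UpdateLog` and `EvaluationResult` schemas; treat the import paths and the score shape as assumptions:

```python
from parea import parea_logger  # assumed import path
from parea.schemas import EvaluationResult, UpdateLog  # assumed schema names


def capture_feedback(feedback: float, trace_id: str) -> None:
    # Write the user's rating back onto the trace as a score in [0, 1].
    parea_logger.update_log(
        UpdateLog(
            trace_id=trace_id,
            field_name_to_value_map={
                "scores": [EvaluationResult(name="user_feedback", score=feedback)],
            },
        )
    )
```

Because the variant lives in the trace metadata and the feedback lives on the same trace as a score, every rating is attributable to `variant_0` or `variant_1`.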
Now we can compare the two variants by filtering our logs on the metadata field `ab_test_name` being `long-vs-short-emails`.
We can see that `variant_1` (short emails) performs a lot better than `variant_0` (long emails)! Check out the full code below to see why this variant performs better. Note: despite the clearly higher score, never forget to LOOK AT YOUR LOGS to understand what's happening!
Sometimes a user will edit the generated email into the version they actually want. We can capture that edited email as the ideal response by passing it as `target` in the `update_log` function:
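Extending the hypothetical `capture_feedback` sketch from above (same assumed imports and schema names):

```python
from parea import parea_logger  # assumed import path
from parea.schemas import EvaluationResult, UpdateLog  # assumed schema names


def capture_feedback(feedback: float, trace_id: str, edited_email: str | None = None) -> None:
    field_map = {"scores": [EvaluationResult(name="user_feedback", score=feedback)]}
    if edited_email:
        # Store the user's edited email as the ideal response for this trace.
        field_map["target"] = edited_email
    parea_logger.update_log(UpdateLog(trace_id=trace_id, field_name_to_value_map=field_map))
```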
The edited email will then appear in the trace's `target` field.
After reviewing it, you can add it to a dataset by clicking on the Add to dataset button or pressing Cmd + D.
Full Code
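The original full listing is not included in this excerpt. Below is a consolidated version of the sketches above, carrying the same assumptions about import paths, schema names, model choice, and prompt wording:

```python
import os
import random

from openai import OpenAI
from parea import Parea, parea_logger, trace, trace_insert  # import paths assumed
from parea.schemas import EvaluationResult, UpdateLog  # schema names assumed
from parea.utils.trace_utils import get_current_trace_id  # assumed import path

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
p = Parea(api_key=os.environ["PAREA_API_KEY"])
p.wrap_openai_client(client)  # auto-trace all OpenAI calls

AB_TEST_NAME = "long-vs-short-emails"


@trace  # capture the inputs and outputs of generate_email
def generate_email(user: str) -> tuple[str, str, str]:
    # Control group gets a long email, treatment group a short one.
    variant = random.choice(["variant_0", "variant_1"])
    length = "long" if variant == "variant_0" else "short"
    trace_insert({"metadata": {"ab_test_name": AB_TEST_NAME, "ab_test_variant": variant}})
    email = client.chat.completions.create(
        model="gpt-4o",  # illustrative
        messages=[{"role": "user", "content": f"Generate a {length} email for {user}."}],
    ).choices[0].message.content
    return email, get_current_trace_id(), variant


def capture_feedback(feedback: float, trace_id: str, edited_email: str | None = None) -> None:
    # Attach the user's rating (and optionally their edited email) to the trace.
    field_map = {"scores": [EvaluationResult(name="user_feedback", score=feedback)]}
    if edited_email:
        field_map["target"] = edited_email
    parea_logger.update_log(UpdateLog(trace_id=trace_id, field_name_to_value_map=field_map))


if __name__ == "__main__":
    email, trace_id, variant = generate_email("Jane")
    print(f"[{variant}] {email}")
    # Pretend the user liked the email and lightly edited it before sending:
    capture_feedback(1.0, trace_id, edited_email=email)
```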