Tutorial on improving a Langchain RAG application using Parea’s Evals, Tracing, and Playground.
We use the `all-MiniLM-L6-v2` sentence transformer from HuggingFace to embed text chunks into vectors, `gpt-3.5-turbo-16k` to generate answers to user queries, and `redis-stack-server` to store and search the embedded vectors. Then, install the dependencies with `poetry install`.
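Before running the app, the Redis server needs to be up. A minimal setup sketch (the Docker image and port are assumptions; adjust to however you run `redis-stack-server` locally):

```shell
# Start redis-stack-server via Docker (image tag and port mapping assumed).
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest

# Install the project's Python dependencies from pyproject.toml.
poetry install
```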
Our default question is: "Which operating segment contributed least to total Nike brand revenue in fiscal 2023?" The PDF document shows that the correct answer should be Global Brand Divisions, which contributed the least to total brand revenue, with $58M in FY2023.
To run our chain, we can use the CLI command to execute the chain with the default question above. The chain answers Converse, which is not correct. Notice that we also fail our matches target eval with a score of 0.0.
Looking at the retrieved context, Converse comes right after the subtotal TOTAL NIKE BRAND, followed by a trailing dollar sign ($), so the model likely latched onto the wrong row of the revenue table. To capture this failing example, click the Add to test collection button in the top right.
Later, we can use this test case to iterate on our prompt in the playground.
The Add to test collection modal is very flexible; it pulls in the inputs, output, and tags from our selected trace and allows us to edit the information as needed.
Find the RunnableParallel trace, then click Add to test collection. This trace is helpful because it has both our input question and the retrieved context. In the modal, rename input to question and add a new key/value pair for the context, using the original output value. Finally, set the target to the correct answer: Global Brand Divisions.
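After these edits, the saved test case has roughly this shape (a hypothetical sketch; the field names mirror the UI steps above, and the context value stands in for the real retrieved chunks):

```python
# Hypothetical representation of the test case after renaming `input` to
# `question`, adding a `context` pair, and setting the target answer.
test_case = {
    "inputs": {
        "question": "Which operating segment contributed least to total "
                    "Nike brand revenue in fiscal 2023?",
        "context": "<retrieved 10-K chunks from the RunnableParallel trace output>",
    },
    "target": "Global Brand Divisions",
}
```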
Click + to create a new test collection by providing a name and then submitting.
Next, we create a function eval. We'll only select the match target eval for demo purposes. Under the General Evaluation Metrics section, select Answer Matches Target - LLM Judge. No changes are needed because we named our input field question in the test collection setup, so we can click create metric and then proceed to the Playground.
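Parea's Answer Matches Target - LLM Judge grades answers with an LLM; for intuition only, here is a much cruder local stand-in (a case-insensitive substring match I wrote for illustration, not what Parea runs):

```python
def matches_target(output: str, target: str) -> float:
    # Crude approximation: score 1.0 if the target string appears in the
    # model output, case-insensitively. Parea's real metric is an LLM judge.
    return 1.0 if target.strip().lower() in output.lower() else 0.0

# The failing trace: the chain answered "Converse" instead of the target.
matches_target("Converse", "Global Brand Divisions")  # → 0.0
```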
Click Open in Lab on the ChatOpenAI trace, which includes the LLM messages.
Use mustache syntax (`{{}}`) for the template variables question and context, and select the `gpt-3.5-turbo-16k` model. Click Add test case and import our created test case. Then open Evaluation metrics and select the new eval we created. After clicking Compare, we will see the same response as in our IDE.
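The `{{variable}}` placeholders are plain mustache-style substitution. A minimal illustration of how such a template gets filled (my own helper for demonstration, not Parea's implementation):

```python
import re

def render(template: str, variables: dict) -> str:
    # Replace each {{name}} placeholder with its value from `variables`.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(variables[m.group(1)]), template)

prompt = render(
    "Use the context to answer.\nContext: {{context}}\nQuestion: {{question}}",
    {"context": "<retrieved chunks>", "question": "Which segment was smallest?"},
)
```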
To improve the answer, we can try a classic Chain of Thought prompt: Think step by step. We can add this as our initial user message. Re-running the comparison, the matches target eval now scores 1.0.
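Concretely, the updated message list looks something like this (the roles and the wording of the second message are assumptions based on the steps above, not the tutorial's exact prompt):

```python
# Chain-of-thought nudge prepended as the first user message; the second
# message is the original templated prompt (wording assumed for illustration).
messages = [
    {"role": "user", "content": "Think step by step."},
    {"role": "user", "content": "Context: {{context}}\n\nQuestion: {{question}}"},
]
```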
🎉Congratulations, it works!🎉 Now, we can copy this prompt back into our application and continue building.