How many samples are necessary to achieve good performance with DSPy?
DSPy lets us compose LLM pipelines from declarative modules (akin to PyTorch's `nn.Module`). We will use ChromaDB as our vector store and `gpt-3.5-turbo` as our LLM of choice for this tutorial.
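A minimal setup sketch, assuming DSPy's classic OpenAI client (newer releases expose `dspy.LM("openai/gpt-3.5-turbo")` instead):

```python
import dspy

# Configure gpt-3.5-turbo as the default LM for all DSPy modules.
turbo = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=250)
dspy.settings.configure(lm=turbo)
```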
Next, we wrap each data sample in a `dspy.Example` object and mark the `question` field as the input field. Then, we can split the data into a training and test set.
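A sketch of that conversion, assuming the raw samples live in a hypothetical list of `(question, answer)` pairs called `qa_pairs`:

```python
# Wrap each sample and declare `question` as the (only) input field;
# all remaining fields are treated as labels.
examples = [
    dspy.Example(question=q, answer=a).with_inputs("question")
    for q, a in qa_pairs
]
trainset, testset = examples[:20], examples[20:]  # split sizes are illustrative
```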
First, we define a signature which takes in a `context` and a `question`, and outputs an `answer`. The signature provides:

- a minimal description of the sub-task the LM is supposed to solve,
- a description of the input fields, and
- a description of the output fields.
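Such a signature could look as follows (this mirrors the `GenerateAnswer` signature from DSPy's intro material; the field descriptions are illustrative):

```python
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
```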
Then, we define our RAG pipeline as a custom module by subclassing `dspy.Module` and overriding the `forward` method. Here, we use ChromaDB to retrieve the top-k passages relevant to the question and then use Chain-of-Thought to generate the final answer.
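A sketch of that module, assuming a ChromaDB collection named `"contexts"` persisted under `./chroma` (both names are placeholders):

```python
from dspy.retrieve.chromadb_rm import ChromadbRM

# Register ChromaDB as the default retrieval model.
retriever = ChromadbRM("contexts", "./chroma", k=3)
dspy.settings.configure(lm=turbo, rm=retriever)

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        # Fetch the top-k passages, then answer with chain-of-thought.
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
```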
To trace the intermediate steps of our pipeline with Parea, we call the `trace_dspy` method. Since we are working in a notebook, we also need to allow nested event loops via the `nest_asyncio` module.
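Concretely (assuming a `PAREA_API_KEY` environment variable is set):

```python
import os

import nest_asyncio
from parea import Parea

nest_asyncio.apply()  # required inside notebooks

p = Parea(api_key=os.getenv("PAREA_API_KEY"))
p.trace_dspy()  # instruments DSPy so every module call is traced
```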
We will use two evaluation metrics:

- `dspy.evaluate.answer_exact_match`: checks if the predicted answer is an exact match with the target answer.
- `gold_passages_retrieved`: checks if the retrieved context matches the golden context.

To execute the experiment with Parea, we need to convert our `dspy.Example`s into a list of dictionaries and also attach the evaluation metrics to the module. We can do the former via `convert_dspy_examples_to_parea_dicts` and the latter via `attach_evals_to_module`.
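Putting that together, a sketch of running the experiment (the experiment name is a placeholder, and `gold_passages_retrieved` is shown here as a simplified, hypothetical implementation):

```python
from parea.utils.trace_integrations.dspy import (
    attach_evals_to_module,
    convert_dspy_examples_to_parea_dicts,
)

def gold_passages_retrieved(example, pred, trace=None):
    # Hypothetical check: every gold title appears in the retrieved context.
    return all(title in " ".join(pred.context) for title in example.gold_titles)

p.experiment(
    "RAG",  # experiment name: a placeholder
    data=convert_dspy_examples_to_parea_dicts(testset),
    func=attach_evals_to_module(
        RAG(), [dspy.evaluate.answer_exact_match, gold_passages_retrieved]
    ),
).run()
```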
Comparing the accuracy of the retrieval step (`gold_passages_retrieved`) with the overall accuracy of our RAG pipeline (`answer_exact_match`), we can see that the retrieval step is the bottleneck (e.g., both metrics agree in 90% of cases).
To improve retrieval, we extend the pipeline to multiple hops and introduce a new `Signature`: given some context and a question, generate a new query to find more relevant information.
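Following DSPy's multi-hop examples, that signature could look like this:

```python
class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()
```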
Next, we write the module's `forward` pass: it loops `self.max_hops` times to fetch diverse contexts. In each iteration:

- it generates a new search query via Chain-of-Thought (`self.generate_query[hop]`), and
- retrieves the top-k passages for that query, adding unseen passages to the context.

Finally, `self.generate_answer` generates an answer via CoT. Note that we need to initialize `ChromadbRM` outside of the module declaration to ensure that the module is pickleable, which is a requirement to optimize it later on (see the full sketch below).
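A sketch of the whole module (modeled on DSPy's simplified Baleen example; the class name and hop counts are assumptions):

```python
from dspy.retrieve.chromadb_rm import ChromadbRM

# Initialized outside the module so the module itself stays pickleable.
retriever = ChromadbRM("contexts", "./chroma", k=3)

class MultiHopRAG(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()
        self.generate_query = [
            dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)
        ]
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops
        self.passages_per_hop = passages_per_hop

    def forward(self, question):
        context = []
        for hop in range(self.max_hops):
            # 1) Generate a new search query from the context gathered so far.
            query = self.generate_query[hop](context=context, question=question).query
            # 2) Retrieve top-k passages; keep only unseen ones for diversity.
            for passage in retriever(query, k=self.passages_per_hop):
                if passage.long_text not in context:
                    context.append(passage.long_text)
        # Answer from the accumulated multi-hop context via CoT.
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
```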
Finally, we optimize the pipeline with DSPy's `BootstrapFewShot` optimizer, which uses bootstrapped few-shot examples to boost the performance of the prompts.
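Roughly, the compilation step looks like this (the choice of `answer_exact_match` as the bootstrapping metric is an assumption):

```python
from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)
compiled_rag = teleprompter.compile(MultiHopRAG(), trainset=trainset)
```

To evaluate the pipeline we will apply the following logic: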