DSPy
Instrumenting your DSPy application with Parea AI
DSPy is a framework for automatically prompting and fine-tuning language models. It provides:
- Composable and declarative APIs that allow developers to describe the architecture of their LLM application in the form of a "module" (inspired by PyTorch's nn.Module); a minimal module is sketched below.
- Optimizers, formerly known as "teleprompters", that optimize a user-defined module for a particular task. The optimization can involve selecting few-shot examples, generating prompts, or fine-tuning language models.
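For illustration, a minimal DSPy module might look like the following sketch (the SimpleQA class and its "question -> answer" signature are hypothetical placeholders, not part of Parea):

import dspy

class SimpleQA(dspy.Module):
    # a minimal module: chain-of-thought question answering
    def __init__(self):
        super().__init__()
        # declarative sub-module mapping a question to an answer
        self.generate_answer = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate_answer(question=question)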
Instrumenting DSPy Modules
To observe your DSPy application, you can use trace_dspy to trace the execution of your DSPy modules.
import os
from parea import Parea

p = Parea(api_key=os.getenv("PAREA_API_KEY"))
p.trace_dspy()
This will create nested traces of your DSPy module executions in Parea.
Suppress Logging During Optimization / Compilation
Note that optimization/compilation with DSPy can create a lot of logs which aren't necessarily actionable. You can suppress these logs by using the TurnOffPareaLogging context manager. After optimization, you will likely want to assess the performance of your module as outlined below.
from parea.helpers import TurnOffPareaLogging

teleprompter = ...
with TurnOffPareaLogging():  # turn off logging during optimization
    compiled_model = teleprompter.compile(...)
Limitations
Threading & Multi-processing
The DSPy integration automatically creates nested traces by relying on Python's contextvars. That means that if you use threading or multi-processing in your DSPy application, the traces of the DSPy modules executed in those threads or processes become orphaned from the main trace. There is an existing issue in Python's standard library and a great explanation in the FastAPI repo that discusses this limitation. To avoid this, you need to manually copy over the context to the new thread/process via contextvars.copy_context(). See the example below:
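A minimal sketch for the threading case (my_dspy_module and the question string are placeholders): the context is copied in the main thread and the module is run inside that copy in the worker thread.

import contextvars
from concurrent.futures import ThreadPoolExecutor

my_dspy_module = ...  # your DSPy module instance (placeholder)

# copy the current (main-thread) context so the nested trace stays attached
ctx = contextvars.copy_context()

with ThreadPoolExecutor() as executor:
    # run the module inside the copied context in the worker thread
    future = executor.submit(ctx.run, my_dspy_module, "What is the capital of France?")
    prediction = future.result()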
Experiment/Optimization Tracking: Evaluate DSPy Modules on a Dataset
You can evaluate & track the performance of your DSPy modules by running experiments.
To evaluate DSPy modules, you need to attach the evaluation metrics to the module (attach_evals_to_module) and convert the DSPy examples to dictionaries (convert_dspy_examples_to_parea_dicts).
from parea.utils.trace_integrations.dspy import attach_evals_to_module, convert_dspy_examples_to_parea_dicts
my_dspy_module_instance = ...
eval_metrics = [ ... ]
dspy_test_set = ...
p.experiment(
    "experiment_name",  # name of the experiment
    convert_dspy_examples_to_parea_dicts(dspy_test_set),  # dataset of the experiment
    attach_evals_to_module(my_dspy_module_instance, eval_metrics),  # function which should be evaluated
).run()
Online Evaluation: Evaluate DSPy Modules during Inference
If you have evaluation functions which don't require reference/target answers, you can evaluate your DSPy modules by attaching those evals to the module via attach_evals_to_module. This will automatically apply your list of evals to the module whenever you call it. See the example below for more details.
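A minimal sketch, assuming a reference-free eval function that receives a Parea log object (with an output field) and returns a score, and a placeholder DSPy module instance:

from parea.utils.trace_integrations.dspy import attach_evals_to_module

def answer_is_non_empty(log) -> float:
    # reference-free eval: score 1.0 if the module produced any output
    return float(bool(log.output))

my_dspy_module_instance = ...  # your DSPy module (placeholder)

# every call to the returned module is now traced and scored by the attached evals
module_with_evals = attach_evals_to_module(my_dspy_module_instance, [answer_is_non_empty])
prediction = module_with_evals("What is the capital of France?")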
Next Steps
Check out our tutorial on how to improve a DSPy RAG application with Parea AI here, or read more about how experiments work to assess the variance of LLMs using multiple trials, make experiments reproducible, and more.