DSPy is a framework for automatically prompting and fine-tuning language models. It provides:
- Composable and declarative APIs that allow developers to describe the architecture of their LLM application in the form of a “module” (inspired by PyTorch’s nn.Module),
- Optimizers (formerly known as “teleprompters”) that optimize a user-defined module for a particular task. The optimization can involve selecting few-shot examples, generating prompts, or fine-tuning language models (see the sketch below).
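For illustration, here is a minimal sketch of compiling a module with DSPy’s built-in BootstrapFewShot optimizer. The QA module, exact_match metric, and trainset are hypothetical placeholders, and the sketch assumes an LM has already been configured via dspy.configure:

```python
import dspy
from dspy.teleprompt import BootstrapFewShot


# a minimal module: predict an answer from a question
class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict("question -> answer")

    def forward(self, question):
        return self.predict(question=question)


# hypothetical metric: exact-match check between reference and predicted answer
def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()


# hypothetical training set of dspy.Example objects
trainset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
]

# assumes an LM has already been configured via dspy.configure(lm=...)
teleprompter = BootstrapFewShot(metric=exact_match)
compiled_model = teleprompter.compile(QA(), trainset=trainset)
```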
Suppress Logging During Optimization / Compilation
Note that optimization/compilation with DSPy can create a lot of logs which aren’t necessarily actionable.
You can suppress these logs by using the TurnOffPareaLogging context manager.
After optimization, you will likely want to assess the performance of your module as outlined below.
```python
from parea.helpers import TurnOffPareaLogging

teleprompter = ...

with TurnOffPareaLogging():  # turn off logging during optimization
    compiled_model = teleprompter.compile(...)
```
The DSPy integration automatically creates nested traces by relying on Python’s contextvars.
That means that if you are using threading or multiprocessing in your DSPy application, the traces of the DSPy modules executed in those threads or processes get orphaned from the main trace.
This limitation is tracked in an existing issue in Python’s standard library, and there is a great explanation of it in the FastAPI repo.
To avoid this, you need to manually copy over the context to the new thread/process via contextvars.copy_context().
See the example below:
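This is a minimal sketch; it assumes generate_answer is a traced DSPy module like the one built in the online-evaluation example further down. The key part is running the thread’s target via the copied context:

```python
import contextvars
import threading


def run_module(question):
    # call into your DSPy module here (generate_answer is the traced module
    # from the online-evaluation example below)
    return generate_answer(question=question)


# copy the current context so the nested DSPy traces stay attached to the main trace
ctx = contextvars.copy_context()
thread = threading.Thread(target=ctx.run, args=(run_module, "What is the color of the sky?"))
thread.start()
thread.join()
```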
Experiment/Optimization Tracking: Evaluate DSPy Modules on a Dataset
You can evaluate & track the performance of your DSPy modules by running experiments.
To evaluate DSPy modules, you need to attach the evaluation metrics to the module (attach_evals_to_module) and convert the DSPy examples to dictionaries (convert_dspy_examples_to_parea_dicts).
```python
from parea.utils.trace_integrations.dspy import attach_evals_to_module, convert_dspy_examples_to_parea_dicts

my_dspy_module_instance = ...
eval_metrics = [...]
dspy_test_set = ...

p.experiment(
    "experiment_name",  # name of the experiment
    convert_dspy_examples_to_parea_dicts(dspy_test_set),  # dataset of the experiment
    attach_evals_to_module(my_dspy_module_instance, eval_metrics),  # function which should be evaluated
).run()
```
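For context, the placeholders above could look as follows: the test set is a list of dspy.Example objects with labeled answers, and each eval metric follows DSPy’s (example, pred, trace) signature. The metric and examples below are hypothetical:

```python
import dspy


# hypothetical reference-based metric following DSPy's (example, pred, trace) signature
def answer_exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()


eval_metrics = [answer_exact_match]

# hypothetical test set of dspy.Example objects with labeled answers
dspy_test_set = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?", answer="William Shakespeare").with_inputs("question"),
]
```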
Online Evaluation: Evaluate DSPy Modules during Inference
If you have evaluation functions which don’t require reference/target answers, you can evaluate your DSPy modules by attaching those evals to the module via attach_evals_to_module.
This will automatically apply your list of evals to the module whenever you call it. See the example below for more details.
```python
import os

import dspy
from dotenv import load_dotenv

from parea import Parea
from parea.utils.trace_integrations.dspy import attach_evals_to_module

load_dotenv()

# instrument DSPy calls with Parea
p = Parea(api_key=os.getenv("PAREA_API_KEY"))
p.trace_dspy()

# configure DSPy to use GPT-3.5-turbo
gpt3_turbo = dspy.OpenAI(model="gpt-3.5-turbo-1106", max_tokens=300)
dspy.configure(lm=gpt3_turbo)


# Define a simple signature for basic question answering
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


# a simple dspy.Module that generates answers to questions using Chain-of-Thought
class AnswerModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        prediction = self.generate_answer(question=question)
        return dspy.Prediction(answer=prediction.answer)


# an eval function that counts the number of words in the answer
def num_words(example, pred, trace=None):
    return len(pred.answer.split())


# attach the eval function to the module
generate_answer = attach_evals_to_module(AnswerModule(), [num_words])

pred = generate_answer(question="What is the color of the sky?")
print(f'answer: {pred.answer}')
```