Practices to improve LLM apps component-wise
Pseudocode of the sample app
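The original pseudocode is not reproduced here; as a rough sketch, the sample app can be thought of as a small RAG pipeline with query expansion, retrieval, and answer generation. All function names below are illustrative.

```python
# Hypothetical outline of the sample app: a small RAG pipeline with
# query expansion, retrieval, and answer generation. Names are
# illustrative, not taken from the original post.

def expand_query(question: str) -> str:
    """LLM call that rewrites the user question into a retrieval-friendly query."""
    ...

def retrieve_context(expanded_query: str) -> list[str]:
    """Look up the most relevant chunks for the expanded query."""
    ...

def generate_answer(question: str, context: list[str]) -> str:
    """LLM call that answers the question using the retrieved context."""
    ...

def rag_app(question: str) -> str:
    expanded = expand_query(question)
    context = retrieve_context(expanded)
    return generate_answer(question, context)
```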
We will use Parea's `trace` decorator for instrumentation and evaluation of any step. This decorator logs the inputs, output, latency, etc. of a step, creates traces (hierarchical logs), executes any specified evaluation functions to score the output, and saves their scores.
To report on the quality of the app, we will run experiments. Experiments measure the performance of our app on a dataset and let us identify regressions across runs.
Below you can see how to use Parea to instrument & evaluate every component.
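A minimal sketch of that instrumentation with the Parea SDK is shown below. It reuses the hypothetical functions from the pseudocode above, and the evaluation function is a placeholder (a concrete version is sketched further down); consult the SDK documentation for exact signatures.

```python
from parea import Parea, trace
from parea.schemas import Log

p = Parea(api_key="YOUR_PAREA_API_KEY")  # placeholder key

def answer_matches_target(log: Log) -> float:
    """Placeholder evaluation function; a concrete version is sketched below."""
    ...

@trace  # logs inputs, output, and latency of the query-expansion step
def expand_query(question: str) -> str:
    ...

@trace  # logs the retrieval step
def retrieve_context(expanded_query: str) -> list[str]:
    ...

@trace(eval_funcs=[answer_matches_target])  # scores the generated answer
def generate_answer(question: str, context: list[str]) -> str:
    ...

@trace  # parent trace; the decorated calls above appear as nested child steps
def rag_app(question: str) -> str:
    expanded = expand_query(question)
    context = retrieve_context(expanded)
    return generate_answer(question, context)
```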
Sample visualization of logged trace
Generate synthetic data using instructor
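The original snippet is not reproduced here; as a sketch, generating synthetic examples with instructor could look like the following. The model name and the schema fields are illustrative, chosen to match the target fields described below.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class SyntheticExample(BaseModel):
    question: str
    expanded_query: str
    context: str
    answer: str

# instructor wraps the OpenAI client so responses are parsed into the Pydantic model
client = instructor.from_openai(OpenAI())

def generate_example(topic: str) -> SyntheticExample:
    return client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        response_model=SyntheticExample,
        messages=[
            {
                "role": "user",
                "content": f"Generate a question about {topic}, a retrieval-friendly "
                           f"expanded query, a short supporting context, and the answer.",
            }
        ],
    )
```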
An evaluation function receives a `Log` object and returns a score. We will use the `Log` object to access the output of that step and the target from our dataset. The target is a stringified dictionary containing the correctly expanded query, context, and answer.
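A sketch of such an evaluation function is shown below; it assumes `log.output` holds the generated answer and `log.target` the stringified dictionary described above. The dictionary keys and the scoring heuristic are illustrative.

```python
import json

from parea.schemas import Log

def answer_matches_target(log: Log) -> float:
    """Score the answer-generation step against the dataset target."""
    # Keys are illustrative; use whatever keys your dataset's target dictionary contains.
    target = json.loads(log.target)
    expected = target["answer"].strip().lower()
    produced = (log.output or "").strip().lower()
    # Simple heuristic: exact match scores 1, containment scores 0.5, otherwise 0.
    if produced == expected:
        return 1.0
    if expected in produced:
        return 0.5
    return 0.0
```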
Sample cache implementation
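That implementation is not reproduced here; one simple version is an in-memory cache keyed by a hash of the call's arguments, wrapped around the LLM call. All names below are illustrative.

```python
import hashlib
import json
from functools import wraps

_cache: dict[str, str] = {}

def cached_llm_call(fn):
    """Cache LLM responses keyed by a hash of the call's arguments."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            json.dumps({"args": args, "kwargs": kwargs}, sort_keys=True, default=str).encode()
        ).hexdigest()
        if key not in _cache:
            _cache[key] = fn(*args, **kwargs)
        return _cache[key]
    return wrapper

@cached_llm_call
def call_llm(prompt: str, model: str = "gpt-4o-mini") -> str:
    ...  # the actual LLM call goes here
```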
Using the `trace` decorator, you can create nested traces of steps and apply functions to score their outputs.
After instrumenting your application, you can track its quality and identify regressions across runs using experiments.
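As a rough sketch, running such an experiment with the Parea SDK might look like the following, assuming `rag_app` is the traced entry point from the earlier sketch and each dataset row provides the function's inputs plus a target; check the SDK docs for the exact experiment API.

```python
from parea import Parea

p = Parea(api_key="YOUR_PAREA_API_KEY")  # placeholder key

# Each row supplies the traced function's keyword arguments plus a target
# consumed by the evaluation functions; the row structure here is illustrative.
data = [
    {
        "question": "What is query expansion?",
        "target": '{"expanded_query": "...", "context": "...", "answer": "..."}',
    },
]

p.experiment(
    name="rag-app-baseline",  # illustrative experiment name
    data=data,
    func=rag_app,             # the traced entry point sketched earlier
).run()
```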
Finally, Parea can act as a cache for your LLM calls via its LLM gateway.