Idea: Identify High Entropy Outputs
The idea is to automatically flag any logs whose responses have high entropy. To do that, you can simply rerun your LLM app on that particular sample and measure the difference between the two outputs. The larger their distance, the more likely it is an input for which your LLM app is unreliable. The easiest way to build intuition for this is to think about prompt engineering: how often have you rerun a prompt and noticed that it only works in half of the cases? When that happens, you are dealing with high entropy (uncertain) responses, as the prompt isn't reliable for that input. In the sections below we will discuss which distance metrics are useful for measuring the difference between two responses.
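Concretely, the core loop is "run twice, diff the outputs, flag if the diff is large." Here is a minimal sketch of that idea, assuming a hypothetical `call_llm_app` entry point and a simple edit-based distance from the standard library; the 0.5 threshold is illustrative, not from the original:

```python
import difflib

def output_distance(a: str, b: str) -> float:
    # 0.0 = identical responses, 1.0 = no overlap at all.
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

# call_llm_app is a stand-in for your own LLM app's entry point.
sample = "Summarize this support ticket: ..."
first = call_llm_app(sample)   # the originally logged response
second = call_llm_app(sample)  # a fresh rerun on the same input

if output_distance(first, second) > 0.5:  # illustrative threshold; tune per app
    print("High entropy response -- flag this log for review")
```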
Implementation Using Parea
To see how easily this workflow can be implemented with Parea, let's look at the following example. We have some function `llm_app` which we want to test for high entropy responses.
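The original does not show the body of `llm_app`, so for concreteness it might look like the following sketch; the OpenAI client and model choice are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def llm_app(query: str) -> str:
    # A deliberately minimal app: one chat completion per query.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content
```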
In our evaluation function `is_unreliable_input`, we rerun `llm_app` on the same input and compare the new output with the originally logged one.
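A sketch of what that eval function could look like, again using an edit-based distance as the comparison metric; the `Log` import path and the assumption that `log.inputs` holds the traced function's keyword arguments may differ across Parea SDK versions:

```python
import difflib

from parea.schemas.log import Log  # import path may vary by SDK version

def is_unreliable_input(log: Log) -> float:
    # Rerun the app on the same input; log.inputs holds the traced
    # function's keyword arguments, log.output the original response.
    new_output = llm_app(**log.inputs)
    # Edit-based distance: 0 = identical outputs, 1 = completely different.
    # A score near 1 means the outputs diverge, i.e. an unreliable input.
    return 1.0 - difflib.SequenceMatcher(None, log.output, new_output).ratio()
```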
The benefit of wrapping `llm_app` with the `trace` decorator is that our eval is executed in the background (so no additional latency) and that we can choose to apply it to only a fraction of requests (here 10%).
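Putting it together, the wiring could look like the sketch below, reusing `is_unreliable_input` from above. The `eval_funcs` and `apply_eval_frac` parameter names reflect my reading of Parea's `trace` decorator; check the SDK docs for the exact names in your version (the 10% fraction comes from the original text):

```python
import os

from openai import OpenAI
from parea import Parea, trace

client = OpenAI()
# Initialize Parea so traced calls are logged (assumes PAREA_API_KEY is set).
p = Parea(api_key=os.getenv("PAREA_API_KEY"))

@trace(eval_funcs=[is_unreliable_input], apply_eval_frac=0.1)
def llm_app(query: str) -> str:
    # Same body as before; the decorator now runs is_unreliable_input
    # in the background on roughly 10% of calls.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content
```

One design note: since the eval itself calls `llm_app`, in practice you may want it to invoke an undecorated variant of the function so that reruns are not themselves traced and evaluated.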