SGLang from LMSYS is “a Structured Generation Language designed for LLMs”. Its main benefit is that it lets you structure complex LLM programs with multiple chained generation calls, control flow, multiple modalities, parallelism, and external interaction using plain Python. Additionally, you can improve the performance of local LLMs with its RadixAttention mechanism, which automatically reuses the KV cache across multiple calls.

Quickstart

shell
pip install parea-ai "sglang[openai]"

First, create a Parea API key as shown here. Second, call integrate_with_sglang() on the Parea client to automatically instrument any OpenAI calls made through SGLang. Finally, define a function like run_and_trace to automatically log the outputs of the SGLang program to Parea and create a trace that associates all LLM calls with one another. The following code snippet demonstrates a simple multi-turn question-answering program whose LLM calls are logged to Parea.

import time

from parea.schemas import UpdateTraceScenario
from parea.utils.trace_utils import fill_trace_data, get_current_trace_id

from sglang import function, system, user, assistant, gen, set_default_backend, OpenAI, SglFunction
from parea import Parea, trace

from sglang.lang.interpreter import ProgramState


p = Parea(api_key="PAREA_API_KEY")  # Replace with your Parea API key
p.integrate_with_sglang()


@function
def multi_turn_question(s, question_1, question_2):
    s += system("You are a helpful assistant.")
    s += user(question_1)
    s += assistant(gen("answer_1", max_tokens=256))
    s += user(question_2)
    s += assistant(gen("answer_2", max_tokens=256))


@trace(log_omit_outputs=True)
def run_and_trace(func: SglFunction, *args, **kwargs) -> ProgramState:
    state: ProgramState = func.run(*args, **kwargs)
    while not state.stream_executor.is_finished:
        time.sleep(1)
    # the returned state doesn't expose the outputs directly,
    # but the executor's variables contain them, so we log those
    fill_trace_data(get_current_trace_id(), {'result': state.stream_executor.variables}, UpdateTraceScenario.RESULT)
    return state

set_default_backend(OpenAI("gpt-3.5-turbo"))

run_and_trace(multi_turn_question, question_1="What is the capital of Sweden?", question_2="List two local attractions.")
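The waiting logic in run_and_trace is a plain polling loop: block until the executor flags completion, then read its variables. The same pattern can be sketched without any SGLang dependency; FakeExecutor below is a hypothetical stand-in for the stream executor, not part of SGLang's API.

```python
import threading
import time


class FakeExecutor:
    """Hypothetical stand-in for SGLang's stream executor: runs work
    in the background and exposes is_finished plus a variables dict."""

    def __init__(self):
        self.is_finished = False
        self.variables = {}
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        time.sleep(0.2)  # simulate streaming generation
        self.variables["answer_1"] = "Stockholm"
        self.is_finished = True


def wait_for(executor, poll_interval=0.05):
    # Same polling pattern as run_and_trace: sleep until the
    # executor reports completion, then return its variables.
    while not executor.is_finished:
        time.sleep(poll_interval)
    return executor.variables


print(wait_for(FakeExecutor()))  # → {'answer_1': 'Stockholm'}
```

In the real integration, these variables are what fill_trace_data attaches to the current trace as the result.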

Visualization of the Trace

This will produce the following trace: