Integrate AI Observability with OpenTelemetry
A runnable OpenTelemetry integration example for promptfoo that demonstrates built-in tracing and observability for prompt evaluation workflows.
Why it matters
Integrate AI observability into your CI/CD pipeline using OpenTelemetry. This asset helps you monitor and debug AI model performance within your existing DevOps workflows.
Outcomes
What it gets done
Set up OpenTelemetry for AI model tracing.
Monitor AI performance metrics in real time.
Debug AI-related issues within the CI/CD process.
Analyze AI model behavior and identify bottlenecks.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/pfoo-built-in | bash
Capabilities
What this chain does
Runs build pipelines, tests, and deploys to environments.
Traces errors to their root cause and suggests fixes.
Analyzes code for bugs, style issues, and improvements.
Overview
Built In
What it does
This is a runnable example that demonstrates promptfoo's built-in OpenTelemetry integration. It shows how to enable tracing and observability for prompt evaluation workflows using the native OpenTelemetry support included in promptfoo, allowing you to collect telemetry data from your prompt chains and evaluation runs.
How it connects
Use this example when you want to add observability to your promptfoo evaluations or need a reference implementation for integrating OpenTelemetry tracing into your prompt testing workflows. It's ideal for teams setting up monitoring and performance analysis for their LLM evaluation pipelines.
Source README
integration-opentelemetry/built-in (OpenTelemetry Built-in Tracing)
You can run this example with:
npx promptfoo@latest init --example integration-opentelemetry/built-in
cd integration-opentelemetry/built-in
This example demonstrates promptfoo's built-in OpenTelemetry tracing for LLM provider calls.
Quick Start
- Set up environment variables:
# Required for the providers you want to test
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# Add other provider keys as needed
- Run the evaluation:
npx promptfoo eval -c promptfooconfig.yaml
- View traces in the UI:
npx promptfoo view
Navigate to the Traces tab to see detailed span information.
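Before running the evaluation in step 2, it can help to confirm that the provider keys from step 1 are actually set. The following is a minimal sketch; `require_env` is a hypothetical helper written for this example and is not part of promptfoo.

```shell
# Hypothetical helper: report any named environment variable that is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name (POSIX sh).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the keys from step 1 before starting an eval.
if require_env OPENAI_API_KEY ANTHROPIC_API_KEY; then
  echo "all provider keys set"
else
  echo "set the missing keys, then run: npx promptfoo eval -c promptfooconfig.yaml"
fi
```

Pass only the variables for the providers you actually test; the two names above are the ones shown in step 1.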
Configuration
Tracing is enabled by default. Configure via environment variables:
| Variable | Default | Description |
|---|---|---|
| PROMPTFOO_DISABLE_TRACING | false | Set to true to disable tracing |
| OTEL_EXPORTER_OTLP_ENDPOINT | - | Export traces to external OTLP backend |
| OTEL_SERVICE_NAME | promptfoo | Service name in traces |
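As a rough illustration, the variables above could be set for a single shell session like this. The service name `promptfoo-ci` is a made-up value for the sketch, and the endpoint assumes a local OTLP collector listening on port 4318.

```shell
# Keep tracing on (the default) and label the traces for this session.
export PROMPTFOO_DISABLE_TRACING=false
export OTEL_SERVICE_NAME="promptfoo-ci"   # hypothetical name; the default is "promptfoo"

# Optional: also export traces to an OTLP backend (assumed local collector).
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"

# Print the effective settings, one NAME=value pair per line.
printf '%s=%s\n' \
  PROMPTFOO_DISABLE_TRACING "$PROMPTFOO_DISABLE_TRACING" \
  OTEL_SERVICE_NAME "$OTEL_SERVICE_NAME" \
  OTEL_EXPORTER_OTLP_ENDPOINT "$OTEL_EXPORTER_OTLP_ENDPOINT"
```

Exported variables apply to every promptfoo command run later in the same shell; prefix a single command with the assignment instead if the setting should apply only once.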
Viewing Traces Externally
With Jaeger
- Start Jaeger:
docker run -d --name jaeger \
-e COLLECTOR_OTLP_ENABLED=true \
-p 16686:16686 \
-p 4318:4318 \
jaegertracing/all-in-one:latest
- Run eval with OTLP export:
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 npx promptfoo eval
- View at http://localhost:16686
With Honeycomb
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io \
OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY" \
npx promptfoo eval
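To keep the API key out of the command line itself, the headers value can be built from an environment variable. This is only a sketch: `HONEYCOMB_API_KEY` is a variable name chosen for this example, not one that promptfoo or Honeycomb is known to read.

```shell
# HONEYCOMB_API_KEY is a hypothetical variable name used only in this sketch.
# Fall back to the placeholder from the command above if it is not set.
: "${HONEYCOMB_API_KEY:=YOUR_API_KEY}"

export OTEL_EXPORTER_OTLP_ENDPOINT="https://api.honeycomb.io"
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=${HONEYCOMB_API_KEY}"

echo "exporting traces to $OTEL_EXPORTER_OTLP_ENDPOINT"
# Then run the evaluation as usual:
#   npx promptfoo eval
```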
Trace Attributes
Each LLM call span includes:
GenAI Semantic Conventions
- gen_ai.system - Provider system (openai, anthropic, etc.)
- gen_ai.operation.name - Operation type (chat, completion, embedding)
- gen_ai.request.model - Requested model name
- gen_ai.request.max_tokens - Max tokens setting
- gen_ai.request.temperature - Temperature setting
- gen_ai.usage.input_tokens - Prompt tokens used
- gen_ai.usage.output_tokens - Completion tokens used
- gen_ai.usage.total_tokens - Total tokens
- gen_ai.usage.cached_tokens - Cached tokens (Anthropic)
- gen_ai.usage.reasoning_tokens - Reasoning tokens (o1 models)
- gen_ai.response.model - Actual model used
- gen_ai.response.id - Provider response ID
- gen_ai.response.finish_reasons - Finish reasons
Promptfoo Attributes
- promptfoo.provider.id - Provider identifier
- promptfoo.eval.id - Evaluation run ID
- promptfoo.test.index - Test case index
- promptfoo.prompt.label - Prompt label
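To make the attribute names more concrete, the sketch below works on a made-up attribute dump for one span. The `key=value` layout and every value in it are assumptions for illustration; real spans are viewed in the Traces tab or an OTLP backend, not in this format.

```shell
# Hypothetical attribute dump for a single LLM call span (all values are made up).
span_attrs='gen_ai.system=openai
gen_ai.operation.name=chat
gen_ai.usage.input_tokens=120
gen_ai.usage.output_tokens=45
promptfoo.test.index=0'

# Add up the input and output token counters for this span.
total=$(printf '%s\n' "$span_attrs" | awk -F= '
  $1 == "gen_ai.usage.input_tokens" || $1 == "gen_ai.usage.output_tokens" { sum += $2 }
  END { print sum }')

echo "input + output tokens: $total"
```

With the sample values above, the sum is 165. Whether a provider's reported gen_ai.usage.total_tokens always equals this sum is not stated here, so treat the comparison as a sanity check rather than a rule.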
Supported Providers
All major providers are instrumented:
| Provider | Tracing Support |
|---|---|
| OpenAI | ✓ |
| Anthropic | ✓ |
| Azure OpenAI | ✓ |
| AWS Bedrock | ✓ |
| Google Vertex AI | ✓ |
| Ollama | ✓ |
| Mistral | ✓ |
| Cohere | ✓ |
| Hugging Face | ✓ |
| IBM Watsonx | ✓ |
| HTTP | ✓ |
| OpenRouter | ✓ |
| Replicate | ✓ |
| OpenAI-compatible | ✓ (inherited) |
| Cloudflare AI | ✓ (inherited) |