Integrate AI Observability with OpenTelemetry

A runnable OpenTelemetry integration example for promptfoo that demonstrates built-in tracing and observability for prompt evaluation workflows.

Works with OpenTelemetry

Updated yesterday
Version code-scan-action-0.1

Why it matters

Integrate AI observability into your CI/CD pipeline using OpenTelemetry. This asset helps you monitor and debug AI model performance within your existing DevOps workflows.

Outcomes

What it gets done

  1. Set up OpenTelemetry for AI model tracing.
  2. Monitor AI performance metrics in real time.
  3. Debug AI-related issues within the CI/CD process.
  4. Analyze AI model behavior and identify bottlenecks.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-built-in | bash

Capabilities

What this chain does

Deploy / CI

Runs build pipelines, tests, and deploys to environments.

Debug

Traces errors to their root cause and suggests fixes.

Review code

Analyzes code for bugs, style issues, and improvements.

Overview

Built In

What it does

This is a runnable example that demonstrates promptfoo's built-in OpenTelemetry integration. It shows how to enable tracing and observability for prompt evaluation workflows using the native OpenTelemetry support included in promptfoo, allowing you to collect telemetry data from your prompt chains and evaluation runs.

How it connects

Use this example when you want to add observability to your promptfoo evaluations or need a reference implementation for integrating OpenTelemetry tracing into your prompt testing workflows. It's ideal for teams setting up monitoring and performance analysis for their LLM evaluation pipelines.

Source README

integration-opentelemetry/built-in (OpenTelemetry Built-in Tracing)

You can run this example with:

npx promptfoo@latest init --example integration-opentelemetry/built-in
cd integration-opentelemetry/built-in

This example demonstrates promptfoo's built-in OpenTelemetry tracing for LLM provider calls.

Quick Start

  1. Set up environment variables:
     # Required for the providers you want to test
     export OPENAI_API_KEY=sk-...
     export ANTHROPIC_API_KEY=sk-ant-...
     # Add other provider keys as needed
  2. Run the evaluation:
     npx promptfoo eval -c promptfooconfig.yaml
  3. View traces in the UI:
     npx promptfoo view

Navigate to the Traces tab to see detailed span information.

Configuration

Tracing is enabled by default. Configure via environment variables:

Variable                     Default    Description
PROMPTFOO_DISABLE_TRACING    false      Set to true to disable tracing
OTEL_EXPORTER_OTLP_ENDPOINT  -          Export traces to external OTLP backend
OTEL_SERVICE_NAME            promptfoo  Service name in traces
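
As a minimal sketch of how these variables might be set from a script (for example in a CI job), the Python helper below builds an environment for a promptfoo run. The helper name `tracing_env` and the service name `promptfoo-ci` are hypothetical; only the three variable names and the default service name come from the table above.

```python
import os


def tracing_env(endpoint=None, service_name="promptfoo", disable=False, base=None):
    """Return a copy of an environment with the tracing variables set.

    tracing_env is a hypothetical helper, not part of promptfoo.
    """
    env = dict(os.environ if base is None else base)
    # Tracing is on by default; the string "true" switches it off.
    env["PROMPTFOO_DISABLE_TRACING"] = "true" if disable else "false"
    env["OTEL_SERVICE_NAME"] = service_name
    # Only export to an external OTLP backend when an endpoint is given.
    if endpoint is not None:
        env["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint
    return env


# Example values: a local collector and a made-up service name.
env = tracing_env(endpoint="http://localhost:4318", service_name="promptfoo-ci", base={})
for name in sorted(env):
    print(f"{name}={env[name]}")
```

The resulting mapping could be passed as `env=` to `subprocess.run(["npx", "promptfoo", "eval", "-c", "promptfooconfig.yaml"], env=env)`; in that case leave out `base={}` so that `PATH` and the provider API keys are inherited from the current environment.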

Viewing Traces Externally

With Jaeger

  1. Start Jaeger:
     docker run -d --name jaeger \
       -e COLLECTOR_OTLP_ENABLED=true \
       -p 16686:16686 \
       -p 4318:4318 \
       jaegertracing/all-in-one:latest
  2. Run eval with OTLP export:
     OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 npx promptfoo eval
  3. View at http://localhost:16686

With Honeycomb

OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io \
OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY" \
npx promptfoo eval
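
To avoid pasting the API key into the command line, the same settings could be assembled in a script. This is a minimal sketch, assuming the key is stored in a hypothetical `HONEYCOMB_API_KEY` environment variable; `otlp_headers` and `honeycomb_env` are illustrative names, and the header string uses the comma-separated `key=value` form shown in the command above.

```python
import os


def otlp_headers(headers):
    """Join a dict into the comma-separated key=value form used by OTEL_EXPORTER_OTLP_HEADERS."""
    return ",".join(f"{key}={value}" for key, value in headers.items())


def honeycomb_env(api_key):
    """Build the two variables from the Honeycomb command (hypothetical helper)."""
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": "https://api.honeycomb.io",
        "OTEL_EXPORTER_OTLP_HEADERS": otlp_headers({"x-honeycomb-team": api_key}),
    }


# HONEYCOMB_API_KEY is an assumed variable name; the fallback is a placeholder.
api_key = os.environ.get("HONEYCOMB_API_KEY", "YOUR_API_KEY")
settings = honeycomb_env(api_key)
# Print only the variable names so the key itself never reaches the logs.
print(sorted(settings))
```

Merging `settings` into a copy of `os.environ` and passing it to the process that runs `npx promptfoo eval` would have the same effect as the inline assignments.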

Trace Attributes

Each LLM call span includes:

GenAI Semantic Conventions

  • gen_ai.system - Provider system (openai, anthropic, etc.)
  • gen_ai.operation.name - Operation type (chat, completion, embedding)
  • gen_ai.request.model - Requested model name
  • gen_ai.request.max_tokens - Max tokens setting
  • gen_ai.request.temperature - Temperature setting
  • gen_ai.usage.input_tokens - Prompt tokens used
  • gen_ai.usage.output_tokens - Completion tokens used
  • gen_ai.usage.total_tokens - Total tokens
  • gen_ai.usage.cached_tokens - Cached tokens (Anthropic)
  • gen_ai.usage.reasoning_tokens - Reasoning tokens (o1 models)
  • gen_ai.response.model - Actual model used
  • gen_ai.response.id - Provider response ID
  • gen_ai.response.finish_reasons - Finish reasons

Promptfoo Attributes

  • promptfoo.provider.id - Provider identifier
  • promptfoo.eval.id - Evaluation run ID
  • promptfoo.test.index - Test case index
  • promptfoo.prompt.label - Prompt label
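
The attributes listed above make it straightforward to aggregate usage across an evaluation run. The sketch below is illustrative only: it assumes the span attributes have already been exported and loaded as plain dictionaries, and the model names and token counts in `sample_spans` are made up. `tokens_by_model` is a hypothetical helper, not part of promptfoo.

```python
from collections import defaultdict


def tokens_by_model(spans):
    """Sum input and output tokens per requested model across span attribute dicts."""
    totals = defaultdict(lambda: {"input": 0, "output": 0, "calls": 0})
    for attrs in spans:
        model = attrs.get("gen_ai.request.model", "unknown")
        entry = totals[model]
        # Missing usage attributes are treated as zero.
        entry["input"] += attrs.get("gen_ai.usage.input_tokens", 0)
        entry["output"] += attrs.get("gen_ai.usage.output_tokens", 0)
        entry["calls"] += 1
    return dict(totals)


# Made-up attribute sets for three LLM call spans.
sample_spans = [
    {"gen_ai.system": "openai", "gen_ai.request.model": "model-a",
     "gen_ai.usage.input_tokens": 120, "gen_ai.usage.output_tokens": 30,
     "promptfoo.test.index": 0},
    {"gen_ai.system": "openai", "gen_ai.request.model": "model-a",
     "gen_ai.usage.input_tokens": 80, "gen_ai.usage.output_tokens": 20,
     "promptfoo.test.index": 1},
    {"gen_ai.system": "anthropic", "gen_ai.request.model": "model-b",
     "gen_ai.usage.input_tokens": 50, "gen_ai.usage.output_tokens": 10,
     "promptfoo.test.index": 0},
]

summary = tokens_by_model(sample_spans)
for model, entry in sorted(summary.items()):
    print(model, entry)
```

Grouping by `promptfoo.test.index` or `promptfoo.provider.id` instead of the model name would follow the same pattern.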

Supported Providers

All major providers are instrumented:

Provider           Tracing Support
OpenAI             ✓
Anthropic          ✓
Azure OpenAI       ✓
AWS Bedrock        ✓
Google Vertex AI   ✓
Ollama             ✓
Mistral            ✓
Cohere             ✓
Hugging Face       ✓
IBM Watsonx        ✓
HTTP               ✓
OpenRouter         ✓
Replicate          ✓
OpenAI-compatible  ✓ (inherited)
Cloudflare AI      ✓ (inherited)
