Prompt Chain

Integrate AI Observability with OpenTelemetry

Availability: OnlineOnly
Author: Promptfoo

A runnable OpenTelemetry integration example for promptfoo that demonstrates built-in tracing and observability for prompt evaluation workflows.

Works with OpenTelemetry

Updated yesterday

Version code-scan-action-0.1

Models

GPT-4o

Why it matters

Integrate AI observability into your CI/CD pipeline using OpenTelemetry. This asset helps you monitor and debug AI model performance within your existing DevOps workflows.

Outcomes

What it gets done

Set up OpenTelemetry for AI model tracing.

Monitor AI performance metrics in real time.

Debug AI-related issues within the CI/CD process.

Analyze AI model behavior and identify bottlenecks.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-built-in | bash

Capabilities

What this chain does

Deploy / CI

Runs build pipelines, tests, and deploys to environments.

Debug

Traces errors to their root cause and suggests fixes.

Review code

Analyzes code for bugs, style issues, and improvements.

Overview

Built In

What it does

This is a runnable example that demonstrates promptfoo's built-in OpenTelemetry integration. It shows how to enable tracing and observability for prompt evaluation workflows using the native OpenTelemetry support included in promptfoo, allowing you to collect telemetry data from your prompt chains and evaluation runs.

How it connects

Use this example when you want to add observability to your promptfoo evaluations or need a reference implementation for integrating OpenTelemetry tracing into your prompt testing workflows. It's ideal for teams setting up monitoring and performance analysis for their LLM evaluation pipelines.

Source README

integration-opentelemetry/built-in (OpenTelemetry Built-in Tracing)

You can run this example with:

npx promptfoo@latest init --example integration-opentelemetry/built-in
cd integration-opentelemetry/built-in

This example demonstrates promptfoo's built-in OpenTelemetry tracing for LLM provider calls.

Quick Start

Set up environment variables:

# Required for the providers you want to test
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# Add other provider keys as needed

Run the evaluation:

npx promptfoo eval -c promptfooconfig.yaml

View traces in the UI:

npx promptfoo view

Navigate to the Traces tab to see detailed span information.
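The Quick Start steps assume the provider keys are already exported. A small pre-flight check can catch a missing key before any LLM calls are made. The sketch below is one way to do that in POSIX shell; `missing_keys` is a hypothetical helper written for this example and is not part of promptfoo.

```shell
# Pre-flight check: report which of the named environment variables are unset or empty.
# Returns 0 when all are present, 1 when at least one is missing.
missing_keys() {
  missing=0
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not run here):
#   missing_keys OPENAI_API_KEY ANTHROPIC_API_KEY && \
#     npx promptfoo eval -c promptfooconfig.yaml
```

Pass only the key names for the providers listed in your own promptfooconfig.yaml.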

Configuration

Tracing is enabled by default. Configure it with these environment variables:

Variable	Default	Description
`PROMPTFOO_DISABLE_TRACING`	`false`	Set to `true` to disable tracing
`OTEL_EXPORTER_OTLP_ENDPOINT`	-	OTLP endpoint of an external backend to export traces to
`OTEL_SERVICE_NAME`	`promptfoo`	Service name in traces
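As a rough illustration of the table above, the variables can be exported once at the top of a CI script. The service name `promptfoo-ci` and the local endpoint are assumptions made for this sketch, and `tracing_summary` is a hypothetical helper that only echoes the settings so they show up in build logs.

```shell
# Tracing settings for one eval run, using the variables from the table.
export OTEL_SERVICE_NAME="promptfoo-ci"                     # hypothetical name; the default is "promptfoo"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"  # assumed local OTLP/HTTP collector

# To turn tracing off for a run, set the disable flag instead:
# export PROMPTFOO_DISABLE_TRACING=true

# Print the effective settings, falling back to the documented defaults
tracing_summary() {
  echo "service=${OTEL_SERVICE_NAME:-promptfoo} endpoint=${OTEL_EXPORTER_OTLP_ENDPOINT:-none} disabled=${PROMPTFOO_DISABLE_TRACING:-false}"
}
```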

Viewing Traces Externally

With Jaeger

Start Jaeger:

docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

Run eval with OTLP export:

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 npx promptfoo eval

View at http://localhost:16686
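When the Jaeger container and the eval are started from the same script, the eval can begin before Jaeger is listening. One possible guard is to poll the UI port first. The sketch below assumes `curl` is installed and treats a response on port 16686 as a sign the all-in-one container is up; `wait_for_url` is a hypothetical helper, not a promptfoo or Jaeger command.

```shell
# Poll a URL until it answers or the attempts run out.
# Args: url, max attempts (default 30), delay in seconds between attempts (default 1).
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s silent, -f treat HTTP errors as failure, -o discard the body
    if curl -sf -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (not run here):
#   wait_for_url http://localhost:16686/ && \
#     OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 npx promptfoo eval
```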

With Honeycomb

OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io \
OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_API_KEY" \
npx promptfoo eval
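To keep the API key out of shell history and CI logs, the header value can be built from an environment variable rather than typed inline. This is a sketch under that assumption: `HONEYCOMB_API_KEY` is a variable name chosen for this example and `honeycomb_headers` is a hypothetical helper.

```shell
# Build the OTLP header value from a key held in the environment.
# Fails with a message on stderr when the key is not set.
honeycomb_headers() {
  if [ -z "${HONEYCOMB_API_KEY:-}" ]; then
    echo "HONEYCOMB_API_KEY is not set" >&2
    return 1
  fi
  echo "x-honeycomb-team=${HONEYCOMB_API_KEY}"
}

# Usage (not run here):
#   OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io \
#   OTEL_EXPORTER_OTLP_HEADERS="$(honeycomb_headers)" \
#   npx promptfoo eval
```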

Trace Attributes

Each LLM call span includes:

GenAI Semantic Conventions

gen_ai.system - Provider system (openai, anthropic, etc.)
gen_ai.operation.name - Operation type (chat, completion, embedding)
gen_ai.request.model - Requested model name
gen_ai.request.max_tokens - Max tokens setting
gen_ai.request.temperature - Temperature setting
gen_ai.usage.input_tokens - Prompt tokens used
gen_ai.usage.output_tokens - Completion tokens used
gen_ai.usage.total_tokens - Total tokens
gen_ai.usage.cached_tokens - Cached tokens (Anthropic)
gen_ai.usage.reasoning_tokens - Reasoning tokens (o1 models)
gen_ai.response.model - Actual model used
gen_ai.response.id - Provider response ID
gen_ai.response.finish_reasons - Finish reasons

Promptfoo Attributes

promptfoo.provider.id - Provider identifier
promptfoo.eval.id - Evaluation run ID
promptfoo.test.index - Test case index
promptfoo.prompt.label - Prompt label
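These attributes can serve as search tags in a trace backend. As an illustration only, the sketch below builds a query URL for the local Jaeger instance from the earlier section. It assumes the HTTP JSON endpoint that the Jaeger UI itself calls (`/api/traces` with a URL-encoded JSON `tags` parameter), which Jaeger treats as internal and may change; `jaeger_tag_query` is a hypothetical helper, and it does not escape special characters in the key or value.

```shell
# Build a Jaeger search URL that filters traces by one span attribute.
# Args: service name, attribute key, attribute value.
jaeger_tag_query() {
  service=$1
  key=$2
  value=$3
  # tags carries a URL-encoded JSON object of the form {"key":"value"}
  echo "http://localhost:16686/api/traces?service=${service}&tags=%7B%22${key}%22%3A%22${value}%22%7D"
}

# Usage (not run here):
#   curl -s "$(jaeger_tag_query promptfoo gen_ai.system openai)"
```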

Supported Providers

The following providers are instrumented:

Provider	Tracing Support
OpenAI	✓
Anthropic	✓
Azure OpenAI	✓
AWS Bedrock	✓
Google Vertex AI	✓
Ollama	✓
Mistral	✓
Cohere	✓
Hugging Face	✓
IBM Watsonx	✓
HTTP	✓
OpenRouter	✓
Replicate	✓
OpenAI-compatible	✓ (inherited)
Cloudflare AI	✓ (inherited)
