Distributed Tracing with Jaeger for ChatGPT Apps

Building ChatGPT apps with MCP servers, widgets, and external APIs creates complex distributed systems where a single user interaction can trigger dozens of service calls. When latency issues or errors occur, traditional logging falls short—you need distributed tracing to understand the full request flow across all components.

Jaeger is an open-source, end-to-end distributed tracing system originally developed by Uber. It helps you monitor and troubleshoot transactions in complex microservices environments by tracking requests as they flow through your ChatGPT app infrastructure. With Jaeger, you can visualize the entire request lifecycle—from when ChatGPT invokes your MCP server tool, through database queries, external API calls, and widget rendering—all in a single timeline view.

This comprehensive guide demonstrates how to implement production-ready Jaeger tracing for ChatGPT applications. You'll learn how to instrument MCP servers with OpenTelemetry, trace tool invocations and widget renders, configure sampling strategies for high-traffic applications, and deploy Jaeger at scale using Kubernetes and Elasticsearch. By the end, you'll have complete visibility into your ChatGPT app's performance characteristics and the ability to diagnose issues that span multiple services.

Whether you're debugging slow tool invocations, tracking down intermittent errors, or optimizing your MCP server's performance profile, Jaeger distributed tracing provides the observability foundation you need. For a complete overview of ChatGPT app architecture, see our Complete Guide to Building ChatGPT Applications.

Understanding Jaeger Architecture for ChatGPT Apps

Jaeger's architecture consists of several key components working together to collect, store, and visualize traces:

OpenTelemetry SDKs instrument your MCP server code; Jaeger's original client libraries are deprecated, and OpenTelemetry is now the recommended way to generate traces. The SDK creates spans (individual operations) and traces (collections of spans representing a complete request flow). When ChatGPT invokes a tool, the instrumentation automatically creates a root span and propagates trace context to downstream services.

Jaeger Agent is a network daemon that listens for spans sent over UDP, batches them, and forwards them to the collector. Running agents close to your workloads in Kubernetes (typically as a DaemonSet, or as per-pod sidecars) provides efficient local trace collection without adding network latency to your MCP server. The agent is optional if your services export OTLP directly to the collector.

Jaeger Collector receives traces from agents, validates them, runs processing pipelines (sampling, enrichment), and writes them to storage. Collectors can scale horizontally to handle high ingestion rates from production ChatGPT apps serving millions of requests.

Storage Backend persists trace data for querying. Development environments typically use in-memory storage, while production deployments use Elasticsearch, Cassandra, or other scalable databases to store billions of spans with configurable retention policies.

Jaeger Query Service provides APIs and a React-based UI for searching, filtering, and visualizing traces. You can search by trace ID, service name, operation name, tags, or time range to find specific requests through your ChatGPT app.

For ChatGPT applications, the typical trace flow looks like this:

  1. ChatGPT sends request → Your MCP server receives it (root span created)
  2. MCP server calls tool handler → Database query executed (child span)
  3. Tool handler calls external API → HTTP request tracked (child span)
  4. Widget template rendered → Template processing measured (child span)
  5. Response returned to ChatGPT → Trace completed and sent to Jaeger

This end-to-end visibility is critical for understanding where time is spent in complex ChatGPT app workflows. Learn more about performance optimization in our MCP Server Performance Optimization guide.
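
Because each hop in this flow runs in a separate process, the trace context (trace ID and parent span ID) has to travel with the request. OpenTelemetry handles this with propagators, typically via W3C Trace Context headers. The sketch below illustrates the idea; the helper names are illustrative, and it assumes the default W3C propagator is registered (the Node SDK setup later in this guide does that automatically).

// propagation-sketch.ts - Illustrative trace context propagation helpers
import { context, propagation } from '@opentelemetry/api';

// Extract the parent trace context from incoming request headers (traceparent,
// tracestate) so spans created inside `fn` join the caller's trace
export function withIncomingTraceContext<T>(
  headers: Record<string, string>,
  fn: () => T
): T {
  const parentContext = propagation.extract(context.active(), headers);
  return context.with(parentContext, fn);
}

// Inject the current trace context into a headers object for outgoing calls,
// so downstream services appear as children of the active span
export function outgoingTraceHeaders(): Record<string, string> {
  const carrier: Record<string, string> = {};
  propagation.inject(context.active(), carrier);
  return carrier; // e.g. { traceparent: '00-<trace-id>-<span-id>-01' }
}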

Setting Up Jaeger Infrastructure

The fastest way to get started with Jaeger is using the all-in-one Docker image, which bundles all components into a single container. This is perfect for development and testing:

# docker-compose.yml - Jaeger All-in-One Development Setup
version: '3.8'

services:
  jaeger:
    image: jaegertracing/all-in-one:1.52
    container_name: jaeger-aio
    restart: unless-stopped

    environment:
      # Collector configuration
      COLLECTOR_ZIPKIN_HOST_PORT: ':9411'
      COLLECTOR_OTLP_ENABLED: 'true'

      # Storage configuration (in-memory)
      SPAN_STORAGE_TYPE: 'memory'

      # Sampling configuration
      SAMPLING_STRATEGIES_FILE: '/etc/jaeger/sampling.json'

      # Query configuration
      QUERY_BASE_PATH: '/jaeger'

      # Metrics configuration
      METRICS_BACKEND: 'prometheus'
      METRICS_HTTP_ROUTE: '/metrics'

    ports:
      # Jaeger UI
      - '16686:16686'

      # Collector endpoints
      - '14268:14268'     # HTTP collector
      - '14250:14250'     # gRPC collector
      - '4317:4317'       # OTLP gRPC
      - '4318:4318'       # OTLP HTTP

      # Agent endpoints
      - '6831:6831/udp'   # Thrift compact
      - '6832:6832/udp'   # Thrift binary
      - '5778:5778'       # Serve configs

      # Zipkin compatibility
      - '9411:9411'       # Zipkin HTTP

      # Health check
      - '14269:14269'     # Admin port

    volumes:
      - ./jaeger-config:/etc/jaeger:ro
      - jaeger-data:/tmp

    healthcheck:
      test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:14269/']
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

    networks:
      - chatgpt-app-network

  # Your MCP server
  mcp-server:
    build: ./mcp-server
    container_name: chatgpt-mcp-server
    restart: unless-stopped

    environment:
      # Jaeger configuration
      JAEGER_AGENT_HOST: 'jaeger'
      JAEGER_AGENT_PORT: '6831'
      JAEGER_SAMPLER_TYPE: 'probabilistic'
      JAEGER_SAMPLER_PARAM: '0.1'
      JAEGER_SERVICE_NAME: 'chatgpt-mcp-server'

      # OTLP configuration (alternative to Jaeger agent)
      OTEL_EXPORTER_OTLP_ENDPOINT: 'http://jaeger:4318'
      OTEL_EXPORTER_OTLP_PROTOCOL: 'http/protobuf'
      OTEL_SERVICE_NAME: 'chatgpt-mcp-server'

    depends_on:
      jaeger:
        condition: service_healthy

    networks:
      - chatgpt-app-network

    ports:
      - '3000:3000'

volumes:
  jaeger-data:
    driver: local

networks:
  chatgpt-app-network:
    driver: bridge

Create a sampling configuration file to control trace collection rates. Save it as ./jaeger-config/sampling.json so the volume mount above exposes it at /etc/jaeger/sampling.json:

{
  "service_strategies": [
    {
      "service": "chatgpt-mcp-server",
      "type": "probabilistic",
      "param": 0.1,
      "operation_strategies": [
        {
          "operation": "tool:search_knowledge_base",
          "type": "probabilistic",
          "param": 1.0
        },
        {
          "operation": "tool:generate_report",
          "type": "probabilistic",
          "param": 0.5
        },
        {
          "operation": "widget:render_chart",
          "type": "probabilistic",
          "param": 0.2
        }
      ]
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.001
  }
}

Sampling strategies explained:

  • Probabilistic: Sample a percentage of traces (0.1 = 10%, 1.0 = 100%)
  • Rate Limiting: Sample up to N traces per second
  • Remote: Fetch sampling decisions from Jaeger backend dynamically
  • Const: Always sample (1) or never sample (0)

For ChatGPT apps, use operation-level sampling to trace critical tools at 100% while sampling less important operations at lower rates. This balances observability with storage costs.
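
You can also make (or mirror) these decisions in the MCP server itself with a head-based sampler in the OpenTelemetry SDK. A minimal sketch, assuming the Node SDK configuration shown in the next section:

// sampling-sketch.ts - SDK-side head-based sampling (alternative or complement
// to the Jaeger sampling.json above)
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

// Sample 10% of new traces at the root, but always honor the parent's decision
// when a trace context already exists, so a sampled request stays sampled
// across every child span and downstream service
export const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),
});

// Then pass it to the SDK, e.g.:
// const sdk = new NodeSDK({ sampler, /* resource, spanProcessor, ... */ });

Remote sampling, where the SDK periodically pulls strategies from the Jaeger backend, keeps the decision server-side; ratio-based head sampling is the simplest starting point.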

Start Jaeger and verify it's running:

# Start all services
docker-compose up -d

# Check Jaeger health
curl http://localhost:14269/

# Open Jaeger UI
open http://localhost:16686

The Jaeger UI provides search, trace visualization, dependency graphs, and service performance analytics. For production deployments, see the Production Deployment section below.

Instrumenting Your MCP Server with OpenTelemetry

Jaeger uses OpenTelemetry as its instrumentation standard. OpenTelemetry provides vendor-neutral APIs and SDKs for generating, collecting, and exporting telemetry data. Here's how to instrument a TypeScript MCP server:

// src/instrumentation.ts - OpenTelemetry Tracer Configuration
import { NodeSDK } from '@opentelemetry/sdk-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';
import { PgInstrumentation } from '@opentelemetry/instrumentation-pg';
import { RedisInstrumentation } from '@opentelemetry/instrumentation-redis-4';
import { trace, context, SpanStatusCode, SpanKind } from '@opentelemetry/api';

/**
 * Initialize OpenTelemetry instrumentation for Jaeger tracing
 *
 * This configures auto-instrumentation for HTTP, Express, PostgreSQL, and Redis
 * and sets up trace export to Jaeger via OTLP or Jaeger agent.
 */
export function initializeTracing() {
  const serviceName = process.env.OTEL_SERVICE_NAME || 'chatgpt-mcp-server';
  const serviceVersion = process.env.SERVICE_VERSION || '1.0.0';
  const deploymentEnvironment = process.env.DEPLOYMENT_ENV || 'development';

  // Create resource with service metadata
  const resource = new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: serviceName,
    [SemanticResourceAttributes.SERVICE_VERSION]: serviceVersion,
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: deploymentEnvironment,
    [SemanticResourceAttributes.SERVICE_NAMESPACE]: 'chatgpt-apps',
    [SemanticResourceAttributes.SERVICE_INSTANCE_ID]: process.env.HOSTNAME || 'local',
  });

  // Configure trace exporter (OTLP or Jaeger)
  let traceExporter;

  if (process.env.OTEL_EXPORTER_OTLP_ENDPOINT) {
    // Use OTLP exporter (recommended for Jaeger 1.35+)
    traceExporter = new OTLPTraceExporter({
      // OTEL_EXPORTER_OTLP_ENDPOINT is the base URL; when passing `url`
      // explicitly, include the traces signal path
      url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces`,
      // Auth and other headers are picked up automatically from the
      // OTEL_EXPORTER_OTLP_HEADERS environment variable
    });
  } else {
    // Use legacy Jaeger exporter (UDP to the Jaeger agent)
    traceExporter = new JaegerExporter({
      host: process.env.JAEGER_AGENT_HOST || 'localhost',
      port: parseInt(process.env.JAEGER_AGENT_PORT || '6831', 10),
    });
  }

  // Initialize SDK with auto-instrumentation
  const sdk = new NodeSDK({
    resource,
    spanProcessor: new BatchSpanProcessor(traceExporter, {
      maxQueueSize: 2048,
      maxExportBatchSize: 512,
      scheduledDelayMillis: 5000,
      exportTimeoutMillis: 30000,
    }),
    instrumentations: [
      // HTTP/HTTPS instrumentation
      new HttpInstrumentation({
        ignoreIncomingRequestHook: (req) => {
          // Don't trace health checks
          return req.url === '/health' || req.url === '/metrics';
        },
        requestHook: (span, request) => {
          span.setAttribute('http.user_agent', request.headers['user-agent'] || 'unknown');
        },
      }),

      // Express instrumentation
      new ExpressInstrumentation({
        requestHook: (span, info) => {
          span.setAttribute('express.type', info.layerType);
        },
      }),

      // PostgreSQL instrumentation
      new PgInstrumentation({
        enhancedDatabaseReporting: true,
      }),

      // Redis instrumentation
      new RedisInstrumentation(),
    ],
  });

  // Start SDK
  sdk.start();

  // Graceful shutdown
  process.on('SIGTERM', () => {
    sdk.shutdown()
      .then(() => console.log('Tracing terminated'))
      .catch((error) => console.error('Error shutting down tracing', error))
      .finally(() => process.exit(0));
  });

  return sdk;
}

// Initialize at application startup
initializeTracing();

// Export tracer for manual instrumentation
export const tracer = trace.getTracer(
  process.env.OTEL_SERVICE_NAME || 'chatgpt-mcp-server',
  process.env.SERVICE_VERSION || '1.0.0'
);

Install the required dependencies:

npm install --save \
  @opentelemetry/sdk-node \
  @opentelemetry/api \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions \
  @opentelemetry/exporter-jaeger \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/sdk-trace-base \
  @opentelemetry/instrumentation-http \
  @opentelemetry/instrumentation-express \
  @opentelemetry/instrumentation-pg \
  @opentelemetry/instrumentation-redis-4

Import the instrumentation module before any other imports in your application entry point:

// src/index.ts
import './instrumentation'; // MUST be first import
import express from 'express';
import { MCPServer } from './mcp-server';

const app = express();
const mcpServer = new MCPServer();

// Your application code...

The auto-instrumentation libraries will automatically create spans for HTTP requests, database queries, and cache operations. For custom operations like MCP tool invocations, you'll need manual instrumentation (see next section).

For more details on OpenTelemetry integration patterns, see our OpenTelemetry Integration guide.

Tracing MCP Server Tool Invocations

Auto-instrumentation covers infrastructure-level operations, but you need manual spans to trace MCP-specific operations like tool invocations and widget rendering. Here's a production-ready middleware for tracing MCP tools:

// src/middleware/tracing.ts - MCP Tool Tracing Middleware
import { tracer } from '../instrumentation';
import { context, propagation, SpanStatusCode, SpanKind } from '@opentelemetry/api';
import type { MCPRequest, MCPResponse, MCPTool } from '../types/mcp';

/**
 * Create a traced wrapper around an MCP tool handler
 *
 * This middleware automatically creates spans for tool invocations,
 * captures input/output metadata, and propagates trace context.
 */
export function traceMCPTool<TInput, TOutput>(
  tool: MCPTool<TInput, TOutput>
) {
  return async (request: MCPRequest<TInput>): Promise<MCPResponse<TOutput>> => {
    // Extract trace context from ChatGPT request headers
    const activeContext = context.active();

    return tracer.startActiveSpan(
      `tool:${tool.name}`,
      {
        kind: SpanKind.SERVER,
        attributes: {
          // MCP-specific attributes
          'mcp.tool.name': tool.name,
          'mcp.tool.description': tool.description,
          'mcp.tool.version': tool.version || '1.0.0',

          // Request metadata
          'mcp.request.id': request.id,
          'mcp.request.timestamp': new Date().toISOString(),

          // Input metadata (sanitized - no PII/secrets)
          'mcp.tool.input.keys': Object.keys(request.params).join(','),
          'mcp.tool.input.size': JSON.stringify(request.params).length,

          // User context (if available)
          'user.id': request.user?.id || 'anonymous',
          'user.locale': request.user?.locale || 'unknown',
        },
      },
      activeContext,
      async (span) => {
        try {
          // Add custom span events
          span.addEvent('tool:invocation:started', {
            'tool.name': tool.name,
            'input.validation': 'pending',
          });

          // Execute tool handler
          const startTime = Date.now();
          const result = await tool.handler(request.params, request.user);
          const duration = Date.now() - startTime;

          // Record successful execution
          span.addEvent('tool:invocation:completed', {
            'execution.duration_ms': duration,
            'output.size': JSON.stringify(result).length,
          });

          // Add output metadata
          span.setAttributes({
            'mcp.tool.execution.duration_ms': duration,
            'mcp.tool.execution.status': 'success',
            'mcp.tool.output.type': typeof result,
            'mcp.tool.output.size': JSON.stringify(result).length,
          });

          // Mark span as successful
          span.setStatus({ code: SpanStatusCode.OK });

          // Return response with trace context
          return {
            id: request.id,
            result,
            metadata: {
              trace_id: span.spanContext().traceId,
              span_id: span.spanContext().spanId,
              execution_time_ms: duration,
            },
          };
        } catch (error) {
          // Record error in span
          span.recordException(error as Error);
          span.setStatus({
            code: SpanStatusCode.ERROR,
            message: (error as Error).message,
          });

          span.addEvent('tool:invocation:failed', {
            'error.type': (error as Error).name,
            'error.message': (error as Error).message,
            'error.stack': (error as Error).stack || '',
          });

          // Re-throw error after recording
          throw error;
        } finally {
          // End span (automatically recorded to Jaeger)
          span.end();
        }
      }
    );
  };
}

/**
 * Trace widget rendering operations
 */
export function traceWidgetRender(
  widgetName: string,
  renderFn: () => Promise<string>
) {
  return tracer.startActiveSpan(
    `widget:render:${widgetName}`,
    {
      kind: SpanKind.INTERNAL,
      attributes: {
        'widget.name': widgetName,
        'widget.type': 'html+skybridge',
      },
    },
    async (span) => {
      try {
        const startTime = Date.now();
        const html = await renderFn();
        const duration = Date.now() - startTime;

        span.setAttributes({
          'widget.render.duration_ms': duration,
          'widget.output.size': html.length,
          'widget.output.lines': html.split('\n').length,
        });

        span.setStatus({ code: SpanStatusCode.OK });
        return html;
      } catch (error) {
        span.recordException(error as Error);
        span.setStatus({ code: SpanStatusCode.ERROR });
        throw error;
      } finally {
        span.end();
      }
    }
  );
}

/**
 * Trace external API calls with propagation
 */
export async function traceExternalAPI<T>(
  serviceName: string,
  operation: string,
  apiFn: (traceHeaders?: Record<string, string>) => Promise<T>
): Promise<T> {
  return tracer.startActiveSpan(
    `external:${serviceName}:${operation}`,
    {
      kind: SpanKind.CLIENT,
      attributes: {
        'peer.service': serviceName,
        'external.operation': operation,
      },
    },
    async (span) => {
      try {
        // Propagate trace context downstream via W3C trace context headers;
        // the caller can merge these into its outgoing request so the remote
        // service joins the same trace
        const traceHeaders: Record<string, string> = {};
        propagation.inject(context.active(), traceHeaders);

        const result = await apiFn(traceHeaders);
        span.setStatus({ code: SpanStatusCode.OK });
        return result;
      } catch (error) {
        span.recordException(error as Error);
        span.setStatus({ code: SpanStatusCode.ERROR });
        throw error;
      } finally {
        span.end();
      }
    }
  );
}

Use the tracing middleware in your MCP tools:

// src/tools/search.ts - Example Traced MCP Tool
import { traceMCPTool, traceExternalAPI, traceWidgetRender } from '../middleware/tracing';
import type { MCPTool } from '../types/mcp';

interface SearchInput {
  query: string;
  filters?: Record<string, any>;
  limit?: number;
}

interface SearchOutput {
  results: Array<{
    id: string;
    title: string;
    score: number;
  }>;
  total: number;
}

const searchTool: MCPTool<SearchInput, SearchOutput> = {
  name: 'search_knowledge_base',
  description: 'Search knowledge base with semantic search',
  version: '1.0.0',

  handler: async (input, user) => {
    // External API call is automatically traced; the propagated trace context
    // headers are forwarded with the outgoing request
    const results = await traceExternalAPI(
      'elasticsearch',
      'semantic_search',
      async (traceHeaders) => {
        const response = await fetch('http://elasticsearch:9200/_search', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json', ...traceHeaders },
          body: JSON.stringify({
            query: { match: { content: input.query } },
            size: input.limit || 10,
          }),
        });
        return response.json();
      }
    );

    // Widget rendering is traced
    const widgetHTML = await traceWidgetRender(
      'search_results',
      async () => {
        return `<div class="search-results">...</div>`;
      }
    );

    return {
      results: results.hits.hits.map(hit => ({
        id: hit._id,
        title: hit._source.title,
        score: hit._score,
      })),
      total: results.hits.total.value,
    };
  },
};

// Wrap with tracing middleware
export const tracedSearchTool = traceMCPTool(searchTool);

This creates a hierarchical trace:

tool:search_knowledge_base (150ms)
  ├─ external:elasticsearch:semantic_search (120ms)
  │   └─ http:POST (115ms)
  └─ widget:render:search_results (25ms)

Each span captures timing, attributes, and errors for complete request observability.

Analyzing Traces in the Jaeger UI

Once you have traces flowing into Jaeger, the UI provides powerful analysis capabilities:

Finding Traces

Search for traces using multiple criteria:

# Search by service
Service: chatgpt-mcp-server
Operation: tool:search_knowledge_base
Lookback: Last 1 hour

# Search by tags
Tags: error=true
Tags: user.id=user_12345

# Search by duration (use the Min/Max Duration fields rather than tag comparisons)
Min Duration: 1s

# Search by trace ID
Trace ID: 3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f

Trace Timeline Visualization

Each trace shows a waterfall timeline of all spans:

tool:search_knowledge_base                    [=================] 150ms
  external:elasticsearch:semantic_search      [============    ] 120ms
    http:POST /elasticsearch:9200/_search     [===========     ] 115ms
      pg:query SELECT * FROM cache            [==             ] 30ms
  widget:render:search_results                [====           ] 25ms

This immediately reveals:

  • Bottlenecks: The Elasticsearch query takes 80% of total time
  • Parallelization opportunities: Widget rendering could happen concurrently (see the sketch after this list)
  • Unexpected delays: 30ms cache query within HTTP request
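
To act on that parallelization opportunity, the two independent child operations can be started together. A hypothetical refactor of the search handler shown earlier, assuming the widget markup does not depend on the search results (if it does, the calls must stay sequential):

// parallel-handler-sketch.ts - Overlap independent child spans with Promise.all
import { traceExternalAPI, traceWidgetRender } from '../middleware/tracing';

export async function searchAndRenderInParallel(query: string, limit = 10) {
  // Both child spans start immediately and overlap in the Jaeger timeline
  // instead of stacking sequentially, cutting total handler latency
  const [results, widgetHTML] = await Promise.all([
    traceExternalAPI('elasticsearch', 'semantic_search', async (traceHeaders) => {
      const response = await fetch('http://elasticsearch:9200/_search', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', ...traceHeaders },
        body: JSON.stringify({ query: { match: { content: query } }, size: limit }),
      });
      return response.json();
    }),
    traceWidgetRender('search_results', async () => `<div class="search-results">...</div>`),
  ]);

  return { results, widgetHTML };
}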

Dependency Graph

The System Architecture view shows service dependencies:

ChatGPT
  ↓
chatgpt-mcp-server (150ms avg)
  ↓
elasticsearch (120ms avg)
  ↓
postgresql (30ms avg)

This helps you understand:

  • Service call patterns
  • Latency contribution by service
  • Critical path dependencies

Error Analysis

Filter traces with errors and examine failure patterns:

// Traces with errors show red highlights
tool:search_knowledge_base [ERROR]
  external:elasticsearch:semantic_search [ERROR]
    http:POST [500 Internal Server Error]
      Error: Connection timeout after 30s

Jaeger captures the full error context including stack traces and request parameters.

Performance Metrics

The Compare view shows latency distributions:

P50: 120ms
P95: 350ms
P99: 890ms
Error rate: 0.3%

Use this to set SLOs and track performance regressions. For example, if P95 latency for tool:search_knowledge_base increases from 350ms to 600ms after a deployment, you can compare traces before/after to identify the cause.

Query traces programmatically via the Jaeger API:

#!/bin/bash
# query-traces.sh - Query Jaeger API for Trace Analysis

JAEGER_URL="http://localhost:16686"
SERVICE="chatgpt-mcp-server"
OPERATION="tool:search_knowledge_base"
START_TIME=$(date -u -d '1 hour ago' +%s%6N)
END_TIME=$(date -u +%s%6N)

# Find traces by service and operation
TRACES=$(curl -s "${JAEGER_URL}/api/traces?service=${SERVICE}&operation=${OPERATION}&start=${START_TIME}&end=${END_TIME}&limit=100")

# Extract trace IDs
TRACE_IDS=$(echo "$TRACES" | jq -r '.data[].traceID')

# Get full trace details
for TRACE_ID in $TRACE_IDS; do
  echo "Analyzing trace: $TRACE_ID"
  curl -s "${JAEGER_URL}/api/traces/${TRACE_ID}" | jq '.data[0].spans[] | {
    operationName: .operationName,
    duration: .duration,
    tags: .tags
  }'
done

# Calculate average duration
echo "$TRACES" | jq '[.data[].spans[] | select(.operationName == "tool:search_knowledge_base") | .duration] | add / length'

For more advanced analysis, integrate Jaeger with Prometheus and Grafana to create custom dashboards. See our Kubernetes Deployment guide for production monitoring setups.

Production Deployment with Kubernetes and Elasticsearch

The all-in-one Jaeger image is great for development, but production deployments require distributed architecture with dedicated collector, query, and storage components:

# jaeger-production.yaml - Production Jaeger on Kubernetes
apiVersion: v1
kind: Namespace
metadata:
  name: observability

---
# Elasticsearch for trace storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: observability
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
        env:
        - name: cluster.name
          value: jaeger-cluster
        - name: discovery.seed_hosts
          value: elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch
        - name: cluster.initial_master_nodes
          value: elasticsearch-0,elasticsearch-1,elasticsearch-2
        # Elasticsearch 8.x enables security (TLS + auth) by default; disable it
        # here so Jaeger can connect over plain HTTP, or configure ES_USERNAME,
        # ES_PASSWORD, and TLS on the Jaeger components instead
        - name: xpack.security.enabled
          value: "false"
        - name: ES_JAVA_OPTS
          value: "-Xms2g -Xmx2g"
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        resources:
          requests:
            memory: 4Gi
            cpu: 1000m
          limits:
            memory: 4Gi
            cpu: 2000m
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

---
# Jaeger Collector (ingestion)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-collector
  namespace: observability
spec:
  replicas: 3
  selector:
    matchLabels:
      app: jaeger-collector
  template:
    metadata:
      labels:
        app: jaeger-collector
    spec:
      containers:
      - name: jaeger-collector
        image: jaegertracing/jaeger-collector:1.52
        env:
        - name: COLLECTOR_OTLP_ENABLED
          value: "true"
        - name: SPAN_STORAGE_TYPE
          value: elasticsearch
        - name: ES_SERVER_URLS
          value: http://elasticsearch:9200
        - name: ES_NUM_SHARDS
          value: "3"
        - name: ES_NUM_REPLICAS
          value: "1"
        - name: COLLECTOR_QUEUE_SIZE
          value: "10000"
        - name: COLLECTOR_NUM_WORKERS
          value: "50"
        ports:
        - containerPort: 14250
          name: grpc
        - containerPort: 14268
          name: http
        - containerPort: 4317
          name: otlp-grpc
        - containerPort: 4318
          name: otlp-http
        resources:
          requests:
            memory: 2Gi
            cpu: 1000m
          limits:
            memory: 4Gi
            cpu: 2000m
        livenessProbe:
          httpGet:
            path: /
            port: 14269
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 14269
          initialDelaySeconds: 10
          periodSeconds: 5

---
# Jaeger Query (UI and API)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-query
  namespace: observability
spec:
  replicas: 2
  selector:
    matchLabels:
      app: jaeger-query
  template:
    metadata:
      labels:
        app: jaeger-query
    spec:
      containers:
      - name: jaeger-query
        image: jaegertracing/jaeger-query:1.52
        env:
        - name: SPAN_STORAGE_TYPE
          value: elasticsearch
        - name: ES_SERVER_URLS
          value: http://elasticsearch:9200
        - name: QUERY_BASE_PATH
          value: /jaeger
        ports:
        - containerPort: 16686
          name: ui
        - containerPort: 16687
          name: admin
        resources:
          requests:
            memory: 512Mi
            cpu: 500m
          limits:
            memory: 1Gi
            cpu: 1000m

---
# Jaeger Agent (node-level daemon, deployed via DaemonSet)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: jaeger-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: jaeger-agent
  template:
    metadata:
      labels:
        app: jaeger-agent
    spec:
      hostNetwork: true
      containers:
      - name: jaeger-agent
        image: jaegertracing/jaeger-agent:1.52
        env:
        - name: REPORTER_GRPC_HOST_PORT
          value: jaeger-collector:14250
        ports:
        - containerPort: 6831
          protocol: UDP
          name: thrift-compact
        - containerPort: 6832
          protocol: UDP
          name: thrift-binary
        - containerPort: 5778
          name: config
        resources:
          requests:
            memory: 128Mi
            cpu: 100m
          limits:
            memory: 256Mi
            cpu: 200m

---
# Services
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: observability
spec:
  clusterIP: None
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    name: http
  - port: 9300
    name: transport

---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-collector
  namespace: observability
spec:
  selector:
    app: jaeger-collector
  ports:
  - port: 14250
    name: grpc
  - port: 14268
    name: http
  - port: 4317
    name: otlp-grpc
  - port: 4318
    name: otlp-http

---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-query
  namespace: observability
spec:
  type: LoadBalancer
  selector:
    app: jaeger-query
  ports:
  - port: 80
    targetPort: 16686
    name: ui
  - port: 16687
    name: admin

Deploy to Kubernetes:

# Apply Jaeger production deployment
kubectl apply -f jaeger-production.yaml

# Wait for Elasticsearch to be ready
kubectl wait --for=condition=ready pod -l app=elasticsearch -n observability --timeout=300s

# Check Jaeger collector status
kubectl get pods -n observability -l app=jaeger-collector

# Port-forward Jaeger UI (or use LoadBalancer IP)
kubectl port-forward -n observability svc/jaeger-query 16686:80

# Open UI
open http://localhost:16686

Production configuration best practices:

  1. Elasticsearch tuning: Use 3+ node cluster with replication for high availability
  2. Collector scaling: Autoscale based on ingestion rate (CPU/memory metrics)
  3. Agents: Deploy as a DaemonSet (or per-pod sidecars) for low-latency local collection
  4. Retention policies: Configure index lifecycle management to delete old traces (7-30 days)
  5. Sampling strategies: Use adaptive sampling to balance cost and coverage
  6. Security: Enable authentication, TLS, and network policies

For complete Kubernetes deployment guides including monitoring and alerting, see our Kubernetes Deployment for ChatGPT Apps article.

Conclusion: Complete Observability for ChatGPT Applications

Distributed tracing with Jaeger transforms how you understand and optimize ChatGPT applications. By instrumenting your MCP servers with OpenTelemetry, you gain end-to-end visibility into every tool invocation, widget render, database query, and external API call. This observability foundation enables you to:

  • Debug faster: Find the exact service and operation causing errors or latency
  • Optimize smarter: Identify bottlenecks and parallelization opportunities with data
  • Scale confidently: Understand dependency chains and failure modes before production issues arise
  • Meet SLOs: Track P95/P99 latency and error rates across all ChatGPT app operations

The production-ready examples in this guide give you everything you need to implement Jaeger tracing: Docker Compose for development, TypeScript instrumentation for MCP servers, middleware for tool tracing, and Kubernetes deployment for production scale. Start with the all-in-one setup, instrument your critical tools, and analyze traces in the Jaeger UI to build high-performance ChatGPT applications your users will love.

For more advanced observability patterns, explore our related guides on OpenTelemetry Integration and MCP Server Performance Optimization.

Ready to Build High-Performance ChatGPT Apps?

Implementing distributed tracing is just one piece of building production-ready ChatGPT applications. MakeAIHQ provides a complete no-code platform for creating, deploying, and monitoring ChatGPT apps—with built-in observability, performance optimization, and one-click deployment to the ChatGPT App Store.

No infrastructure setup. No complex instrumentation. Just describe your app and deploy in 48 hours.

Start Building Your ChatGPT App →


Related Guides: