MCP Server Monitoring: Prometheus, Grafana & Distributed Tracing

Production ChatGPT apps built with MCP servers require enterprise-grade observability to stay reliable at scale. When your app is available to ChatGPT's 800 million users, basic logging isn't enough: you need real-time metrics, visual dashboards, and distributed tracing to understand system behavior, diagnose performance bottlenecks, and meet SLA commitments.

This guide walks you through implementing production observability using Prometheus (metrics collection), Grafana (visualization), and OpenTelemetry (distributed tracing). You'll learn how to instrument MCP servers for production monitoring, design effective dashboards, implement intelligent alerting, and trace requests across distributed systems. These are the same kinds of patterns companies like Uber, Netflix, and Airbnb rely on to maintain 99.99% uptime.

Why Observability Matters for MCP Servers

Traditional monitoring focuses on infrastructure metrics (CPU, memory, disk), but MCP servers demand application-level observability. Your monitoring stack must answer critical questions:

  • Performance: What's the P95 latency for each tool? Which operations are slowing down user conversations?
  • Reliability: What's the error rate for authentication? Are users experiencing failed tool invocations?
  • Scale: Can your server handle 1,000 concurrent users? What happens at 10,000?
  • User Experience: Which tools are users invoking most? Where are conversations dropping off?

Without observability, production incidents become chaotic firefighting exercises. With proper monitoring, you detect issues before users notice, identify root causes in minutes instead of hours, and build confidence in your system's behavior.

Learn foundational MCP architecture in our complete MCP server development guide.

Prometheus Metrics: Deep Instrumentation

Prometheus is the industry standard for metrics collection in cloud-native applications. Its pull-based architecture, powerful query language (PromQL), and native Kubernetes integration make it ideal for MCP server monitoring. Here's a production-grade Prometheus implementation with advanced metric types:

// prometheus-exporter.ts - Advanced MCP Metrics Collection
import express from 'express';
import client from 'prom-client';

// Create dedicated metric registry
const register = new client.Registry();

// Collect default Node.js metrics (event loop lag, heap size, GC stats)
client.collectDefaultMetrics({
  register,
  prefix: 'mcp_', // default metrics become mcp_process_*, mcp_nodejs_*
  gcDurationBuckets: [0.001, 0.01, 0.1, 1, 2, 5]
});

// Tool invocation counter (tracks usage patterns)
const toolInvocations = new client.Counter({
  name: 'mcp_tool_invocations_total',
  help: 'Total number of MCP tool invocations',
  labelNames: ['tool_name', 'status', 'user_tier'],
  registers: [register]
});

// Tool execution latency histogram (P50, P95, P99 percentiles)
const toolLatency = new client.Histogram({
  name: 'mcp_tool_execution_seconds',
  help: 'MCP tool execution time in seconds',
  labelNames: ['tool_name', 'cache_hit'],
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10, 30], // Response time buckets
  registers: [register]
});

// Active connections gauge (real-time connection health)
const activeConnections = new client.Gauge({
  name: 'mcp_active_connections',
  help: 'Number of active MCP connections',
  labelNames: ['transport_type'],
  registers: [register]
});

// Widget render time histogram (UI performance tracking)
const widgetRenderTime = new client.Histogram({
  name: 'mcp_widget_render_seconds',
  help: 'Widget rendering latency in seconds',
  labelNames: ['widget_type', 'complexity'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2],
  registers: [register]
});

// Token usage summary (track prompt/completion tokens for cost optimization)
const tokenUsage = new client.Summary({
  name: 'mcp_token_usage',
  help: 'Token usage per tool invocation',
  labelNames: ['tool_name', 'token_type'],
  percentiles: [0.5, 0.9, 0.95, 0.99],
  registers: [register]
});

// Authentication failures counter (security monitoring)
const authFailures = new client.Counter({
  name: 'mcp_auth_failures_total',
  help: 'Total authentication failures',
  labelNames: ['failure_reason', 'source_ip'],
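  // Caution: source_ip is effectively unbounded; bucket or drop it at scale to keep label cardinality under control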
  registers: [register]
});

// Database query duration (external dependency monitoring)
const dbQueryDuration = new client.Histogram({
  name: 'mcp_db_query_seconds',
  help: 'Database query execution time',
  labelNames: ['operation', 'table'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5],
  registers: [register]
});

// Cache hit rate (performance optimization tracking)
const cacheOperations = new client.Counter({
  name: 'mcp_cache_operations_total',
  help: 'Total cache operations',
  labelNames: ['operation', 'result'], // operation: get/set/delete, result: hit/miss
  registers: [register]
});

// Export metrics endpoint for Prometheus scraping
const app = express();

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Health check endpoint (Prometheus scrape health)
app.get('/health', (req, res) => {
  res.json({ status: 'healthy', timestamp: Date.now() });
});

const PORT = process.env.METRICS_PORT || 9090;
app.listen(PORT, () => {
  console.log(`✅ Prometheus metrics available at http://localhost:${PORT}/metrics`);
});

// Instrumentation helpers
export const metrics = {
  recordToolInvocation: (toolName: string, status: string, userTier: string) => {
    toolInvocations.inc({ tool_name: toolName, status, user_tier: userTier });
  },

  recordToolLatency: (toolName: string, durationSeconds: number, cacheHit: boolean) => {
    toolLatency.observe({ tool_name: toolName, cache_hit: String(cacheHit) }, durationSeconds);
  },

  setActiveConnections: (count: number, transport: string) => {
    activeConnections.set({ transport_type: transport }, count);
  },

  recordWidgetRender: (widgetType: string, durationSeconds: number, complexity: string) => {
    widgetRenderTime.observe({ widget_type: widgetType, complexity }, durationSeconds);
  },

  recordTokenUsage: (toolName: string, tokenType: string, count: number) => {
    tokenUsage.observe({ tool_name: toolName, token_type: tokenType }, count);
  },

  recordAuthFailure: (reason: string, sourceIP: string) => {
    authFailures.inc({ failure_reason: reason, source_ip: sourceIP });
  },

  recordDBQuery: (operation: string, table: string, durationSeconds: number) => {
    dbQueryDuration.observe({ operation, table }, durationSeconds);
  },

  recordCacheOperation: (operation: string, result: string) => {
    cacheOperations.inc({ operation, result });
  }
};
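
To wire these helpers into your server, wrap each tool invocation with timing and status recording. A minimal sketch, assuming your handler exposes a run callback and that the user tier is resolved elsewhere:

// tool-instrumentation.ts - wiring the exported metrics helpers into a tool handler (sketch)
import { metrics } from './prometheus-exporter';

export async function instrumentedToolCall<T>(
  toolName: string,
  userTier: string,
  run: () => Promise<T>
): Promise<T> {
  const start = process.hrtime.bigint();
  try {
    const result = await run();
    metrics.recordToolInvocation(toolName, 'success', userTier);
    return result;
  } catch (error) {
    metrics.recordToolInvocation(toolName, 'error', userTier);
    throw error;
  } finally {
    const seconds = Number(process.hrtime.bigint() - start) / 1e9;
    metrics.recordToolLatency(toolName, seconds, false); // cache_hit tracking omitted in this sketch
  }
}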

Configure Prometheus to scrape this endpoint by adding to prometheus.yml:

# prometheus.yml - Scrape Configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    environment: 'prod'

scrape_configs:
  # MCP server metrics
  - job_name: 'mcp-server'
    scrape_interval: 10s
    scrape_timeout: 5s
    static_configs:
      - targets: ['mcp-server-1:9090', 'mcp-server-2:9090', 'mcp-server-3:9090']
        labels:
          instance: 'mcp-production'

    # Relabeling for dynamic service discovery
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [__address__]
        regex: '([^:]+):(.*)'
        target_label: pod
        replacement: '${1}'

Key Metric Patterns:

  • Counters: Monotonically increasing values (tool invocations, auth failures)
  • Gauges: Point-in-time measurements (active connections, memory usage)
  • Histograms: Distribution of observations (latency percentiles, request sizes); see the timer sketch just below
  • Summaries: Client-side percentile calculations (token usage, processing time)
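
For histograms, prom-client can also handle the timing itself: startTimer() captures the start time and, when the returned function is called, observes the elapsed seconds. A small sketch, assuming the toolLatency histogram defined above is exported from the metrics module:

// Histogram timing via prom-client's startTimer() helper (sketch)
async function timedToolCall<T>(toolName: string, run: () => Promise<T>): Promise<T> {
  // Labels known up front go to startTimer; the rest can be passed when stopping
  const endTimer = toolLatency.startTimer({ tool_name: toolName });
  try {
    return await run();
  } finally {
    endTimer({ cache_hit: 'false' }); // observes elapsed seconds into the histogram
  }
}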

For performance optimization strategies, see our ChatGPT app performance guide.

Custom Metric Collectors for MCP-Specific Insights

Beyond basic infrastructure metrics, MCP servers benefit from domain-specific collectors that track business logic performance:

// custom-collectors.ts - MCP-Specific Metric Collectors
import client from 'prom-client';

export class MCPMetricsCollector {
  private toolCallGraph: Map<string, Map<string, number>>;
  private metrics: any; // metric handles assigned in initializeCustomMetrics()

  constructor(private register: client.Registry) {
    this.toolCallGraph = new Map();
    this.initializeCustomMetrics();
  }

  private initializeCustomMetrics() {
    // Tool composition patterns (which tools are called together)
    const toolComposition = new client.Counter({
      name: 'mcp_tool_composition_total',
      help: 'Tool invocation patterns (tool A → tool B)',
      labelNames: ['source_tool', 'target_tool', 'sequence_position'],
      registers: [this.register]
    });

    // Widget interaction metrics
    const widgetInteractions = new client.Counter({
      name: 'mcp_widget_interactions_total',
      help: 'User interactions with rendered widgets',
      labelNames: ['widget_type', 'interaction_type', 'outcome'],
      registers: [this.register]
    });

    // Session duration histogram
    const sessionDuration = new client.Histogram({
      name: 'mcp_session_duration_seconds',
      help: 'User session duration from first to last tool call',
      labelNames: ['user_tier', 'tools_used'],
      buckets: [60, 300, 600, 1800, 3600, 7200], // 1min to 2hr
      registers: [this.register]
    });

    // Error recovery success rate
    const errorRecovery = new client.Counter({
      name: 'mcp_error_recovery_total',
      help: 'Tool error recovery attempts',
      labelNames: ['error_type', 'recovery_strategy', 'success'],
      registers: [this.register]
    });

    // Payload size distribution (detect bloated responses)
    const payloadSize = new client.Histogram({
      name: 'mcp_payload_size_bytes',
      help: 'MCP response payload size in bytes',
      labelNames: ['tool_name', 'includes_widget'],
      buckets: [1024, 10240, 51200, 102400, 512000, 1048576], // 1KB to 1MB
      registers: [this.register]
    });

    // OAuth token validation metrics
    const tokenValidation = new client.Histogram({
      name: 'mcp_oauth_validation_seconds',
      help: 'OAuth token validation latency',
      labelNames: ['provider', 'cache_status'],
      buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1],
      registers: [this.register]
    });

    // Store for programmatic access
    this.metrics = {
      toolComposition,
      widgetInteractions,
      sessionDuration,
      errorRecovery,
      payloadSize,
      tokenValidation
    };
  }

  // Track tool call sequences (detect patterns like "search → filter → book")
  recordToolSequence(userId: string, toolName: string) {
    if (!this.toolCallGraph.has(userId)) {
      this.toolCallGraph.set(userId, new Map());
    }

    const userTools = this.toolCallGraph.get(userId)!;
    const lastTool = Array.from(userTools.keys()).pop();

    if (lastTool) {
      const position = userTools.size;
      this.metrics.toolComposition.inc({
        source_tool: lastTool,
        target_tool: toolName,
        sequence_position: String(position)
      });
    }

    userTools.set(toolName, Date.now());
  }

  // Track session metrics (duration, tool diversity)
  recordSessionEnd(userId: string, userTier: string) {
    const session = this.toolCallGraph.get(userId);
    if (!session) return;

    const toolNames = Array.from(session.keys());
    const startTime = Math.min(...Array.from(session.values()));
    const endTime = Math.max(...Array.from(session.values()));
    const durationSeconds = (endTime - startTime) / 1000;

    this.metrics.sessionDuration.observe({
      user_tier: userTier,
      tools_used: String(toolNames.length)
    }, durationSeconds);

    // Cleanup
    this.toolCallGraph.delete(userId);
  }

  // Track payload efficiency (detect over-fetching)
  recordPayloadSize(toolName: string, payloadBytes: number, hasWidget: boolean) {
    this.metrics.payloadSize.observe({
      tool_name: toolName,
      includes_widget: String(hasWidget)
    }, payloadBytes);
  }

  // Track error recovery effectiveness
  recordErrorRecovery(errorType: string, strategy: string, success: boolean) {
    this.metrics.errorRecovery.inc({
      error_type: errorType,
      recovery_strategy: strategy,
      success: String(success)
    });
  }

}

// Usage example
const collector = new MCPMetricsCollector(register); // register: the shared Registry from prometheus-exporter.ts

// In tool handler
async function handleToolCall(userId: string, toolName: string, params: any) {
  collector.recordToolSequence(userId, toolName);

  const result = await executeTool(toolName, params);
  const payloadSize = Buffer.byteLength(JSON.stringify(result)); // size in bytes, not characters

  collector.recordPayloadSize(toolName, payloadSize, result.widget !== undefined);

  return result;
}

These custom collectors reveal insights invisible to standard infrastructure monitoring: tool usage patterns, session behavior, error recovery effectiveness, and payload optimization opportunities.
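
Note that recordSessionEnd has to be triggered explicitly, since MCP has no built-in session-close event. One approach is to record last activity per user and periodically sweep idle sessions; a sketch assuming a 30-minute idle timeout and the collector instance from the usage example above:

// session-sweeper.ts - close idle sessions so duration metrics get recorded (sketch)
const lastActivity = new Map<string, { tier: string; lastSeen: number }>();

export function touchSession(userId: string, tier: string) {
  lastActivity.set(userId, { tier, lastSeen: Date.now() });
}

// Every minute, end sessions that have been idle for more than 30 minutes
setInterval(() => {
  const cutoff = Date.now() - 30 * 60 * 1000;
  for (const [userId, { tier, lastSeen }] of lastActivity) {
    if (lastSeen < cutoff) {
      collector.recordSessionEnd(userId, tier); // collector from the usage example above
      lastActivity.delete(userId);
    }
  }
}, 60_000);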

For error handling patterns, see our MCP server error recovery guide.

Grafana Dashboards: Visual Excellence

Raw metrics are valuable, but visualization transforms data into actionable insights. Grafana provides the industry-leading dashboarding platform for Prometheus data. Here's a production-ready dashboard configuration:

{
  "dashboard": {
    "title": "MCP Server Production Dashboard",
    "tags": ["mcp", "production", "monitoring"],
    "timezone": "browser",
    "refresh": "30s",
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "panels": [
      {
        "id": 1,
        "title": "Tool Invocation Rate (Requests/Second)",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(mcp_tool_invocations_total[5m])) by (tool_name)",
            "legendFormat": "{{ tool_name }}"
          }
        ],
        "yaxes": [
          { "format": "reqps", "label": "Requests/sec" }
        ],
        "alert": {
          "conditions": [
            {
              "evaluator": { "type": "gt", "params": [1000] },
              "operator": { "type": "and" },
              "query": { "params": ["A", "5m", "now"] },
              "reducer": { "type": "avg" }
            }
          ],
          "executionErrorState": "alerting",
          "frequency": "60s",
          "handler": 1,
          "name": "High Tool Invocation Rate",
          "noDataState": "no_data",
          "notifications": [{ "uid": "slack-alerts" }]
        }
      },
      {
        "id": 2,
        "title": "P95 Tool Latency (Response Time)",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(mcp_tool_execution_seconds_bucket[5m])) by (tool_name, le))",
            "legendFormat": "{{ tool_name }} P95"
          },
          {
            "expr": "histogram_quantile(0.99, sum(rate(mcp_tool_execution_seconds_bucket[5m])) by (tool_name, le))",
            "legendFormat": "{{ tool_name }} P99"
          }
        ],
        "yaxes": [
          { "format": "s", "label": "Latency" }
        ],
        "thresholds": [
          { "value": 2, "colorMode": "critical", "op": "gt", "fill": true, "line": true }
        ]
      },
      {
        "id": 3,
        "title": "Error Rate (%)",
        "type": "stat",
        "targets": [
          {
            "expr": "(sum(rate(mcp_tool_invocations_total{status='error'}[5m])) / sum(rate(mcp_tool_invocations_total[5m]))) * 100",
            "legendFormat": "Error Rate"
          }
        ],
        "options": {
          "graphMode": "area",
          "colorMode": "background",
          "thresholds": [
            { "value": 0, "color": "green" },
            { "value": 1, "color": "yellow" },
            { "value": 5, "color": "red" }
          ]
        }
      },
      {
        "id": 4,
        "title": "Active Connections",
        "type": "gauge",
        "targets": [
          {
            "expr": "sum(mcp_active_connections) by (transport_type)"
          }
        ],
        "options": {
          "showThresholdLabels": false,
          "showThresholdMarkers": true,
          "thresholds": [
            { "value": 0, "color": "red" },
            { "value": 50, "color": "yellow" },
            { "value": 100, "color": "green" }
          ]
        }
      },
      {
        "id": 5,
        "title": "Cache Hit Rate (%)",
        "type": "timeseries",
        "targets": [
          {
            "expr": "(sum(rate(mcp_cache_operations_total{result='hit'}[5m])) / sum(rate(mcp_cache_operations_total[5m]))) * 100",
            "legendFormat": "Cache Hit Rate"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "min": 0,
            "max": 100,
            "thresholds": {
              "steps": [
                { "value": 0, "color": "red" },
                { "value": 50, "color": "yellow" },
                { "value": 80, "color": "green" }
              ]
            }
          }
        }
      },
      {
        "id": 6,
        "title": "Token Usage Distribution",
        "type": "heatmap",
        "targets": [
          {
            "expr": "sum(rate(mcp_token_usage[5m])) by (tool_name, token_type)"
          }
        ],
        "options": {
          "calculate": true,
          "cellGap": 2,
          "yAxis": { "unit": "short" }
        }
      }
    ],
    "annotations": {
      "list": [
        {
          "name": "Deployments",
          "datasource": "Prometheus",
          "expr": "changes(mcp_nodejs_version_info[5m]) > 0",
          "iconColor": "blue",
          "enable": true
        }
      ]
    }
  }
}
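
Rather than importing this JSON by hand, you can push it through Grafana's HTTP API (POST /api/dashboards/db). A sketch, assuming the JSON above is saved as mcp-dashboard.json and a service-account token is available in GRAFANA_TOKEN:

// import-dashboard.ts - push the dashboard JSON to Grafana via its HTTP API (sketch)
import { readFile } from 'node:fs/promises';

async function importDashboard() {
  // The file already has the { "dashboard": { ... } } wrapper Grafana expects
  const payload = JSON.parse(await readFile('mcp-dashboard.json', 'utf8'));

  const response = await fetch(`${process.env.GRAFANA_URL}/api/dashboards/db`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.GRAFANA_TOKEN}`
    },
    body: JSON.stringify({ ...payload, overwrite: true }) // overwrite any existing dashboard with the same uid
  });

  if (!response.ok) {
    throw new Error(`Dashboard import failed: ${response.status} ${await response.text()}`);
  }
}

importDashboard().catch(console.error);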

Dashboard Design Best Practices:

  1. Golden Signals First: Display latency, traffic, errors, and saturation prominently
  2. Use Color Wisely: Green (healthy), yellow (warning), red (critical)—avoid unnecessary colors
  3. Percentiles Over Averages: P95/P99 latency reveals user experience better than mean
  4. Annotate Deployments: Correlate performance changes with code deployments
  5. Mobile-Friendly: Design dashboards readable on phone screens for on-call engineers

For deployment strategies, see our MCP server deployment guide.

OpenTelemetry Distributed Tracing

Distributed tracing answers the critical question: "Where is time being spent in this request?" OpenTelemetry provides vendor-neutral instrumentation for tracing requests across services:

// opentelemetry-tracer.ts - Distributed Tracing for MCP Servers
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';
import { trace, context, SpanStatusCode } from '@opentelemetry/api';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';
import { registerInstrumentations } from '@opentelemetry/instrumentation';

// Configure tracer provider
const provider = new NodeTracerProvider({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'mcp-server',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.APP_VERSION || '1.0.0',
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'production'
  })
});

// Export traces to Jaeger
const jaegerExporter = new JaegerExporter({
  endpoint: process.env.JAEGER_ENDPOINT || 'http://localhost:14268/api/traces',
  tags: [
    { key: 'cluster', value: 'production' }
  ]
});

provider.addSpanProcessor(new BatchSpanProcessor(jaegerExporter));

// Register the provider and auto-instrument HTTP and Express
provider.register();
registerInstrumentations({
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()]
});

const tracer = trace.getTracer('mcp-server', '1.0.0');

// Trace MCP tool invocations
export async function traceToolCall<T>(
  toolName: string,
  params: any,
  handler: () => Promise<T>
): Promise<T> {
  return tracer.startActiveSpan(`tool.${toolName}`, async (span) => {
    // Add span attributes for filtering
    span.setAttribute('tool.name', toolName);
    span.setAttribute('tool.params', JSON.stringify(params));
    // Add further attributes (e.g. user tier) from your auth layer; note that context.getValue() requires a key created via createContextKey(), not a plain string

    try {
      const startTime = Date.now();
      const result = await handler();
      const duration = Date.now() - startTime;

      // Record success
      span.setAttribute('tool.duration_ms', duration);
      span.setAttribute('tool.status', 'success');
      span.setStatus({ code: SpanStatusCode.OK });

      return result;
    } catch (error: any) {
      // Record failure
      span.setAttribute('tool.status', 'error');
      span.setAttribute('error.type', error.constructor.name);
      span.setAttribute('error.message', error.message);
      span.recordException(error);
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: error.message
      });

      throw error;
    } finally {
      span.end();
    }
  });
}

// Trace external API calls
export async function traceAPICall<T>(
  serviceName: string,
  operation: string,
  handler: () => Promise<T>
): Promise<T> {
  return tracer.startActiveSpan(`api.${serviceName}.${operation}`, async (span) => {
    span.setAttribute('service.name', serviceName);
    span.setAttribute('api.operation', operation);

    try {
      const result = await handler();
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error: any) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}

// Trace database queries
export async function traceDBQuery<T>(
  table: string,
  operation: string,
  handler: () => Promise<T>
): Promise<T> {
  return tracer.startActiveSpan(`db.${table}.${operation}`, async (span) => {
    span.setAttribute('db.table', table);
    span.setAttribute('db.operation', operation);

    try {
      const result = await handler();
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error: any) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}

// Usage example
async function handleSearchClasses(params: any) {
  return traceToolCall('searchClasses', params, async () => {
    // Nested span for database query
    const classes = await traceDBQuery('classes', 'SELECT', async () => {
      return db.query('SELECT * FROM classes WHERE date = ?', [params.date]);
    });

    // Nested span for external API call
    const availability = await traceAPICall('scheduling-service', 'checkAvailability', async () => {
      return fetch('https://api.scheduling.com/availability', {
        method: 'POST',
        body: JSON.stringify({ classIds: classes.map(c => c.id) })
      }).then(r => r.json());
    });

    return { classes, availability };
  });
}

Distributed tracing reveals the complete request lifecycle: tool invocation → database query → external API call → widget rendering. This visibility is essential for diagnosing performance bottlenecks in production systems.
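
At high traffic you rarely need to keep every successful trace. Head-based sampling can be configured on the tracer provider itself; a sketch assuming a 10% sample rate (error-biased sampling requires tail sampling in an OpenTelemetry Collector, which is outside this setup):

// sampling.ts - keep roughly 10% of traces, honoring the parent span's decision (sketch)
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';

const sampledProvider = new NodeTracerProvider({
  sampler: new ParentBasedSampler({
    // Sample 10% of new root traces; child spans follow their parent's decision
    root: new TraceIdRatioBasedSampler(0.1)
  })
});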

For database integration patterns, see our MCP server database guide.

Structured Logging with Context Propagation

Metrics show what is happening, traces show where time is spent, and logs provide why something occurred. Structured logging with context propagation ties everything together:

// structured-logger.ts - Production Logging with OpenTelemetry Context
import winston from 'winston';
import { trace, context } from '@opentelemetry/api';

// Custom log format with trace context
const logFormat = winston.format.combine(
  winston.format.timestamp({ format: 'YYYY-MM-DD HH:mm:ss.SSS' }),
  winston.format.errors({ stack: true }),
  winston.format.printf((info) => {
    // Extract OpenTelemetry trace context
    const span = trace.getSpan(context.active());
    const traceId = span?.spanContext().traceId || 'no-trace';
    const spanId = span?.spanContext().spanId || 'no-span';

    return JSON.stringify({
      timestamp: info.timestamp,
      level: info.level,
      message: info.message,
      traceId,
      spanId,
      service: 'mcp-server',
      ...info.metadata
    });
  })
);

// Create logger instance
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: logFormat,
  transports: [
    new winston.transports.Console(),
    new winston.transports.File({
      filename: 'logs/app.log',
      maxsize: 10485760, // 10MB
      maxFiles: 5
    })
  ]
});

// Contextual logging helpers
export const log = {
  info: (message: string, metadata = {}) => {
    logger.info(message, { metadata });
  },

  warn: (message: string, metadata = {}) => {
    logger.warn(message, { metadata });
  },

  error: (message: string, error: Error, metadata = {}) => {
    logger.error(message, {
      metadata: {
        ...metadata,
        error: {
          name: error.name,
          message: error.message,
          stack: error.stack
        }
      }
    });
  },

  // Log tool invocation with full context
  toolInvocation: (toolName: string, userId: string, params: any, result: any, durationMs: number) => {
    logger.info('MCP tool invoked', {
      metadata: {
        event: 'tool_invocation',
        tool_name: toolName,
        user_id: userId,
        params: JSON.stringify(params),
        result_status: result.success ? 'success' : 'error',
        duration_ms: durationMs,
        payload_size: JSON.stringify(result).length
      }
    });
  }
};
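
Because the formatter reads the active span, any log line emitted inside a traced handler automatically carries the same traceId as its span, so you can pivot from a Jaeger trace to its logs and back. A minimal sketch combining the traceToolCall and log helpers above (executeBooking is a placeholder):

// Logs emitted inside a traced handler inherit the active trace context
async function bookClass(userId: string, params: any) {
  return traceToolCall('bookClass', params, async () => {
    const started = Date.now();
    log.info('Booking class', { user_id: userId, class_id: params.classId });

    const result = await executeBooking(params); // placeholder for your booking logic

    log.toolInvocation('bookClass', userId, params, result, Date.now() - started);
    return result;
  });
}
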

Structured logs enable powerful queries: "Show me all failed tool invocations for user X in the last hour" or "Find all requests where P95 latency exceeded 2 seconds."

For security monitoring, see our ChatGPT app security guide.

Alerting Strategies: Intelligent Notifications

Production monitoring without alerting creates false confidence. Define the alerts themselves as Prometheus alerting rules (loaded via rule_files in prometheus.yml), then let Alertmanager route them so the right people are notified at the right time:

# alertmanager-config.yml - Production Alert Configuration
global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

route:
  group_by: ['alertname', 'severity', 'component']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default'
  routes:
    # Critical alerts to PagerDuty + Slack
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
      continue: true
    - match:
        severity: critical
      receiver: 'slack-critical'

    # Warning alerts to Slack only
    - match:
        severity: warning
      receiver: 'slack-warnings'

    # Performance degradation to dev team
    - match:
        component: performance
      receiver: 'slack-performance-team'

receivers:
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: '<your-pagerduty-integration-key>'
        description: '{{ .CommonAnnotations.summary }}'
        severity: '{{ .CommonLabels.severity }}'

  - name: 'slack-critical'
    slack_configs:
      - channel: '#alerts-critical'
        title: '🚨 CRITICAL ALERT'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
        color: 'danger'

  - name: 'slack-warnings'
    slack_configs:
      - channel: '#alerts-warnings'
        title: '⚠️ Warning'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
        color: 'warning'

  - name: 'slack-performance-team'
    slack_configs:
      - channel: '#performance-team'
        title: '📊 Performance Alert'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

inhibit_rules:
  # Inhibit warning if critical alert is firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'component']

Alert Design Principles:

  1. Actionable: Every alert must have a clear remediation path
  2. Contextual: Include traceId, affected users, impacted tools in alert payload
  3. Severity-Appropriate: Critical = revenue impact, Warning = degraded experience
  4. Avoid Fatigue: Use inhibition rules to suppress redundant alerts
  5. Escalation Paths: Route critical alerts to on-call engineers, warnings to Slack

For rate limiting strategies to prevent alert storms, see our MCP API rate limiting guide.

Production Observability Checklist

Before deploying your MCP server to production, ensure you have:

Metrics:

  • ✅ Prometheus metrics exporter running on separate port (9090)
  • ✅ Custom metrics for all critical tool operations
  • ✅ Histogram buckets tuned to your expected latency distribution
  • ✅ Label cardinality under control (avoid unbounded labels like user IDs)

Dashboards:

  • ✅ Grafana dashboard with Golden Signals (latency, traffic, errors, saturation)
  • ✅ Real-time refresh (30s or less for production dashboards)
  • ✅ Alerts configured with appropriate thresholds
  • ✅ Annotations for deployments and incidents

Tracing:

  • ✅ OpenTelemetry instrumentation for all tool handlers
  • ✅ Trace context propagation across service boundaries
  • ✅ Sampling strategy (100% for errors, 1-10% for success in high traffic)
  • ✅ Trace storage retention policy (7-30 days recommended)

Logging:

  • ✅ Structured JSON logs with trace context
  • ✅ Log aggregation (ELK, Datadog, or CloudWatch)
  • ✅ Log rotation to prevent disk exhaustion
  • ✅ Sensitive data sanitization (no passwords, tokens, PII in logs)

Alerting:

  • ✅ Critical alerts routed to on-call engineers
  • ✅ Runbooks linked from alert descriptions
  • ✅ Alert fatigue prevention (inhibition rules, sensible thresholds)
  • ✅ Escalation paths for unacknowledged alerts

Conclusion: Build Confidence Through Observability

Production ChatGPT apps demand world-class observability. Prometheus provides real-time metrics, Grafana transforms data into actionable insights, OpenTelemetry traces requests across distributed systems, and structured logging captures the "why" behind every event.

Implement these patterns before your app reaches scale. The difference between a failed launch and sustainable growth often comes down to how quickly you detect and resolve production issues. With proper observability, you catch errors before users notice, diagnose root causes in minutes instead of hours, and build confidence in your system's behavior.

Ready to deploy your MCP server with production-grade monitoring? MakeAIHQ provides enterprise observability out-of-the-box: pre-configured Prometheus metrics, Grafana dashboards, OpenTelemetry tracing, and intelligent alerting. Build ChatGPT apps that scale to millions of users without the ops overhead.

Start monitoring your MCP server today—because production isn't the place to discover you're flying blind.

