MCP Server Error Handling Patterns for ChatGPT Apps

Building reliable ChatGPT apps requires robust error handling in your Model Context Protocol (MCP) servers. Unlike traditional APIs where users tolerate occasional failures, conversational AI applications demand resilience—users expect seamless interactions even when underlying services experience issues.

Error handling in MCP servers presents unique challenges. The model may retry failed tool calls, users continue conversations after errors, and long-running sessions accumulate failure scenarios. Without proper error handling patterns, your ChatGPT app will frustrate users with cryptic failures, timeout errors, and inconsistent behavior.

This guide covers production-ready error handling patterns for MCP servers: error classification, retry strategies, circuit breakers, graceful degradation, and structured error reporting. Each pattern includes complete TypeScript implementations tested in production ChatGPT apps. By implementing these patterns, you'll build resilient MCP servers that handle failures gracefully while maintaining excellent user experience.

Whether you're experiencing intermittent API failures, cascading errors across services, or poor error messages in ChatGPT responses, these patterns will transform your MCP server from fragile to fault-tolerant.

Error Classification

The foundation of effective error handling is distinguishing between error types. MCP servers encounter two fundamental error categories: transient errors (temporary issues that resolve on retry) and permanent errors (failures requiring immediate user intervention).

Transient errors include network timeouts, rate limits, temporary service unavailability, and database connection failures. These errors typically succeed when retried with appropriate backoff. Your error classifier should identify these failures and trigger retry logic automatically.

Permanent errors include authentication failures, invalid input data, resource not found errors, and quota exceeded errors. Retrying these errors wastes resources and delays user feedback. Your MCP server should detect permanent errors immediately and return helpful error messages.

Beyond binary classification, implement error severity levels: critical errors (service completely unavailable), warning errors (degraded performance possible), and info errors (non-blocking issues logged for monitoring). This granularity enables sophisticated error handling strategies.

Error metadata enriches error context for debugging and user communication. Capture error codes, original error messages, timestamp, affected resources, retry attempts, and request context. This metadata proves invaluable when diagnosing production issues or explaining failures to users.

Here's a production-ready error classifier for MCP servers:

// error-classifier.ts
import { ErrorCategory, ErrorSeverity, ClassifiedError } from './types';

interface ErrorContext {
  timestamp: Date;
  requestId: string;
  toolName: string;
  attemptNumber: number;
  originalError: Error;
}

interface ErrorClassification {
  category: ErrorCategory;
  severity: ErrorSeverity;
  isRetryable: boolean;
  retryAfterMs?: number;
  userMessage: string;
  internalMessage: string;
}

export class ErrorClassifier {
  private static readonly TRANSIENT_ERROR_CODES = new Set([
    'ETIMEDOUT',
    'ECONNRESET',
    'ECONNREFUSED',
    'ENOTFOUND',
    'ENETUNREACH',
    'EAI_AGAIN',
  ]);

  private static readonly RATE_LIMIT_CODES = new Set([
    'RATE_LIMIT_EXCEEDED',
    'TOO_MANY_REQUESTS',
    'QUOTA_EXCEEDED',
  ]);

  private static readonly AUTH_ERROR_CODES = new Set([
    'UNAUTHORIZED',
    'FORBIDDEN',
    'INVALID_TOKEN',
    'TOKEN_EXPIRED',
  ]);

  private static readonly NOT_FOUND_CODES = new Set([
    'NOT_FOUND',
    'RESOURCE_NOT_FOUND',
    'ENOENT',
  ]);

  public static classify(
    error: Error,
    context: ErrorContext
  ): ErrorClassification {
    const errorCode = this.extractErrorCode(error);
    const statusCode = this.extractStatusCode(error);

    // Check for transient network errors
    if (this.TRANSIENT_ERROR_CODES.has(errorCode)) {
      return {
        category: 'TRANSIENT',
        severity: 'WARNING',
        isRetryable: true,
        userMessage: 'Temporary network issue. Retrying...',
        internalMessage: `Network error: ${errorCode} on ${context.toolName}`,
      };
    }

    // Check for rate limit errors
    if (this.RATE_LIMIT_CODES.has(errorCode) || statusCode === 429) {
      const retryAfter = this.extractRetryAfter(error);
      return {
        category: 'RATE_LIMIT',
        severity: 'WARNING',
        isRetryable: true,
        retryAfterMs: retryAfter,
        userMessage: 'Service is busy. Retrying shortly...',
        internalMessage: `Rate limit exceeded on ${context.toolName}, retry after ${retryAfter}ms`,
      };
    }

    // Check for authentication errors
    if (this.AUTH_ERROR_CODES.has(errorCode) || statusCode === 401 || statusCode === 403) {
      return {
        category: 'AUTHENTICATION',
        severity: 'CRITICAL',
        isRetryable: false,
        userMessage: 'Authentication failed. Please reconnect your account.',
        internalMessage: `Auth failure: ${errorCode} on ${context.toolName}`,
      };
    }

    // Check for not found errors
    if (this.NOT_FOUND_CODES.has(errorCode) || statusCode === 404) {
      return {
        category: 'NOT_FOUND',
        severity: 'INFO',
        isRetryable: false,
        userMessage: 'The requested resource was not found.',
        internalMessage: `Resource not found: ${context.toolName} with ${errorCode}`,
      };
    }

    // Check for server errors (5xx)
    if (statusCode && statusCode >= 500 && statusCode < 600) {
      return {
        category: 'TRANSIENT',
        severity: 'CRITICAL',
        isRetryable: true,
        userMessage: 'Service temporarily unavailable. Retrying...',
        internalMessage: `Server error ${statusCode} on ${context.toolName}`,
      };
    }

    // Check for client errors (4xx)
    if (statusCode && statusCode >= 400 && statusCode < 500) {
      return {
        category: 'PERMANENT',
        severity: 'WARNING',
        isRetryable: false,
        userMessage: 'Invalid request. Please check your input.',
        internalMessage: `Client error ${statusCode}: ${error.message}`,
      };
    }

    // Default to permanent error for unknown failures
    return {
      category: 'PERMANENT',
      severity: 'CRITICAL',
      isRetryable: false,
      userMessage: 'An unexpected error occurred.',
      internalMessage: `Unclassified error: ${error.message}`,
    };
  }

  private static extractErrorCode(error: any): string {
    return error.code || error.errorCode || error.type || 'UNKNOWN';
  }

  private static extractStatusCode(error: any): number | undefined {
    return error.statusCode || error.status || error.response?.status;
  }

  private static extractRetryAfter(error: any): number {
    const retryAfter = error.retryAfter || error.response?.headers?.['retry-after'];
    if (typeof retryAfter === 'number') return retryAfter * 1000;
    if (typeof retryAfter === 'string') {
      // Retry-After may be an HTTP date rather than seconds; guard against NaN
      const seconds = parseInt(retryAfter, 10);
      if (!Number.isNaN(seconds)) return seconds * 1000;
    }
    return 5000; // Default 5 seconds
  }
}
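
The classifier imports ErrorCategory and ErrorSeverity from a shared ./types module that isn't shown above. Here's a minimal sketch of that module, assuming both are string unions matching the literals the classifier returns; ClassifiedError is included only because the import references it, and its shape is assumed:

// types.ts
// Assumed shape, inferred from the classifier's imports and return values.
export type ErrorCategory =
  | 'TRANSIENT'
  | 'RATE_LIMIT'
  | 'AUTHENTICATION'
  | 'NOT_FOUND'
  | 'PERMANENT';

export type ErrorSeverity = 'INFO' | 'WARNING' | 'CRITICAL';

// Referenced by the classifier's import but not used in the excerpt above; shape assumed.
export interface ClassifiedError {
  category: ErrorCategory;
  severity: ErrorSeverity;
  isRetryable: boolean;
  originalError: Error;
}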

Retry Strategies

Once you've classified an error as transient, implement intelligent retry logic. Naive retry approaches (immediate retries without backoff) overload failing services and waste retry budgets. Production MCP servers require sophisticated retry strategies.

Exponential backoff prevents retry storms by increasing wait time between attempts. Start with a short delay (100ms), then double the wait time after each failure: 100ms, 200ms, 400ms, 800ms. This gives failing services time to recover while minimizing user wait time.

Jitter adds randomness to retry delays, preventing synchronized retry storms across multiple clients. Full jitter randomizes the entire backoff window; decorrelated jitter combines the previous delay with a random component for a better spread.

Retry budgets limit total retry attempts to prevent infinite retry loops. Set a maximum retry count (typically 3-5 attempts) and a maximum total retry duration (10-30 seconds). Once the budget is exhausted, fail gracefully with a helpful error message.

Idempotency ensures retries don't create duplicate actions. Assign unique idempotency keys to requests, store operation results keyed by idempotency tokens, and return cached results for duplicate requests. This makes retries safe for non-idempotent operations like payments or resource creation.

Here's a production retry handler with exponential backoff and jitter:

// retry-handler.ts
import { ErrorClassifier } from './error-classifier';

interface RetryConfig {
  maxAttempts: number;
  initialDelayMs: number;
  maxDelayMs: number;
  maxTotalTimeoutMs: number;
  jitterFactor: number; // 0.0 to 1.0
  retryableCategories: Set<string>;
}

interface RetryContext {
  requestId: string;
  toolName: string;
  startTime: Date;
  attemptNumber: number;
  totalDelayMs: number;
}

export class RetryHandler {
  private static readonly DEFAULT_CONFIG: RetryConfig = {
    maxAttempts: 5,
    initialDelayMs: 100,
    maxDelayMs: 10000,
    maxTotalTimeoutMs: 30000,
    jitterFactor: 0.5,
    retryableCategories: new Set(['TRANSIENT', 'RATE_LIMIT']),
  };

  constructor(private config: RetryConfig = RetryHandler.DEFAULT_CONFIG) {}

  public async executeWithRetry<T>(
    operation: () => Promise<T>,
    context: Omit<RetryContext, 'attemptNumber' | 'totalDelayMs'>
  ): Promise<T> {
    const fullContext: RetryContext = {
      ...context,
      attemptNumber: 1,
      totalDelayMs: 0,
    };

    return this.attemptOperation(operation, fullContext);
  }

  private async attemptOperation<T>(
    operation: () => Promise<T>,
    context: RetryContext
  ): Promise<T> {
    try {
      return await operation();
    } catch (error) {
      return this.handleError(error as Error, operation, context);
    }
  }

  private async handleError<T>(
    error: Error,
    operation: () => Promise<T>,
    context: RetryContext
  ): Promise<T> {
    const classification = ErrorClassifier.classify(error, {
      timestamp: new Date(),
      requestId: context.requestId,
      toolName: context.toolName,
      attemptNumber: context.attemptNumber,
      originalError: error,
    });

    // Check if error is retryable
    if (!this.shouldRetry(classification, context)) {
      throw this.createFinalError(error, classification, context);
    }

    // Calculate delay with exponential backoff and jitter
    const delayMs = this.calculateDelay(
      context.attemptNumber,
      classification.retryAfterMs
    );

    // Check if the next delay would exceed the total retry timeout.
    // elapsedMs already includes time spent in earlier backoff waits.
    const elapsedMs = Date.now() - context.startTime.getTime();
    if (elapsedMs + delayMs > this.config.maxTotalTimeoutMs) {
      throw this.createTimeoutError(context);
    }

    // Wait before retry
    await this.sleep(delayMs);

    // Retry with incremented attempt number
    return this.attemptOperation(operation, {
      ...context,
      attemptNumber: context.attemptNumber + 1,
      totalDelayMs: context.totalDelayMs + delayMs,
    });
  }

  private shouldRetry(
    classification: any,
    context: RetryContext
  ): boolean {
    // Check retry budget
    if (context.attemptNumber >= this.config.maxAttempts) {
      return false;
    }

    // Check if error category is retryable
    if (!this.config.retryableCategories.has(classification.category)) {
      return false;
    }

    // Check explicit retry flag
    return classification.isRetryable;
  }

  private calculateDelay(attemptNumber: number, retryAfterMs?: number): number {
    // Use server-specified retry-after if provided
    if (retryAfterMs !== undefined) {
      return Math.min(retryAfterMs, this.config.maxDelayMs);
    }

    // Calculate exponential backoff: initialDelay * 2^(attempt - 1)
    const exponentialDelay = this.config.initialDelayMs * Math.pow(2, attemptNumber - 1);
    const cappedDelay = Math.min(exponentialDelay, this.config.maxDelayMs);

    // Apply jitter centered on the capped delay to break up synchronized retries
    const jitterRange = cappedDelay * this.config.jitterFactor;
    const jitter = Math.random() * jitterRange;

    return Math.floor(cappedDelay - jitterRange / 2 + jitter);
  }

  private createFinalError(
    originalError: Error,
    classification: any,
    context: RetryContext
  ): Error {
    const error = new Error(classification.userMessage);
    (error as any).code = 'RETRY_EXHAUSTED';
    (error as any).details = {
      originalError: classification.internalMessage,
      attempts: context.attemptNumber,
      totalDelayMs: context.totalDelayMs,
      requestId: context.requestId,
    };
    return error;
  }

  private createTimeoutError(context: RetryContext): Error {
    const error = new Error('Operation timed out after maximum retry duration');
    (error as any).code = 'RETRY_TIMEOUT';
    (error as any).details = {
      attempts: context.attemptNumber,
      totalDelayMs: context.totalDelayMs,
      maxTimeoutMs: this.config.maxTotalTimeoutMs,
      requestId: context.requestId,
    };
    return error;
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
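
A minimal usage sketch follows, wrapping a hypothetical tool call; fetchWeather, its endpoint, and the request-ID format are illustrative, not part of the handler above:

// retry-usage.ts — illustrative only; fetchWeather and its URL are hypothetical
import { RetryHandler } from './retry-handler';

const retryHandler = new RetryHandler();

async function fetchWeather(city: string): Promise<unknown> {
  const response = await fetch(`https://api.example.com/weather?city=${encodeURIComponent(city)}`);
  if (!response.ok) {
    const error = new Error('Weather API request failed');
    // Attach the status code so ErrorClassifier can route 429/5xx to retries
    (error as any).statusCode = response.status;
    throw error;
  }
  return response.json();
}

export async function getWeatherTool(city: string) {
  return retryHandler.executeWithRetry(() => fetchWeather(city), {
    requestId: `req_${Date.now()}`,
    toolName: 'get_weather',
    startTime: new Date(),
  });
}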

Circuit Breaker Pattern

Circuit breakers prevent cascading failures by stopping requests to failing services. When a service experiences repeated failures, the circuit breaker "opens" and rejects requests immediately instead of attempting doomed operations. This protects both the failing service (giving it time to recover) and your MCP server (avoiding wasted retries and timeouts).

Circuit states define breaker behavior: CLOSED (normal operation, requests pass through), OPEN (service failing, requests rejected immediately), and HALF_OPEN (testing if service recovered, limited requests allowed). State transitions occur based on failure thresholds and timeout durations.

Failure thresholds determine when circuits open. Track failure rate (percentage of failed requests), consecutive failures (number of sequential errors), and error volume (minimum requests before opening). Typical production values: open after 50% failure rate across 10+ requests or 5 consecutive failures.

Recovery strategies let services come back gracefully. After the circuit opens, wait a timeout period (10-30 seconds), then enter the HALF_OPEN state and allow a limited number of test requests. If they succeed, close the circuit; if they fail, reopen it immediately (many implementations also extend the timeout on repeated failures, giving exponential backoff for recovery).

Here's a production circuit breaker implementation:

// circuit-breaker.ts
interface CircuitBreakerConfig {
  failureThreshold: number; // Percentage (0-100)
  consecutiveFailureThreshold: number;
  minimumRequestVolume: number;
  resetTimeoutMs: number;
  halfOpenMaxAttempts: number;
  slidingWindowMs: number;
}

enum CircuitState {
  CLOSED = 'CLOSED',
  OPEN = 'OPEN',
  HALF_OPEN = 'HALF_OPEN',
}

interface RequestRecord {
  timestamp: number;
  success: boolean;
}

export class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private requestHistory: RequestRecord[] = [];
  private consecutiveFailures: number = 0;
  private halfOpenAttempts: number = 0;
  private stateChangedAt: number = Date.now();
  private resetTimeoutHandle?: NodeJS.Timeout;

  private static readonly DEFAULT_CONFIG: CircuitBreakerConfig = {
    failureThreshold: 50,
    consecutiveFailureThreshold: 5,
    minimumRequestVolume: 10,
    resetTimeoutMs: 30000,
    halfOpenMaxAttempts: 3,
    slidingWindowMs: 60000,
  };

  constructor(
    private serviceName: string,
    private config: CircuitBreakerConfig = CircuitBreaker.DEFAULT_CONFIG
  ) {}

  public async execute<T>(operation: () => Promise<T>): Promise<T> {
    // Check circuit state before execution
    if (this.state === CircuitState.OPEN) {
      throw this.createCircuitOpenError();
    }

    // Limit concurrent requests in HALF_OPEN state
    if (this.state === CircuitState.HALF_OPEN) {
      if (this.halfOpenAttempts >= this.config.halfOpenMaxAttempts) {
        throw this.createCircuitOpenError();
      }
      this.halfOpenAttempts++;
    }

    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  private recordSuccess(): void {
    this.addRequestRecord(true);
    this.consecutiveFailures = 0;

    // Transition from HALF_OPEN to CLOSED on success
    if (this.state === CircuitState.HALF_OPEN) {
      this.transitionToClosed();
    }
  }

  private recordFailure(): void {
    this.addRequestRecord(false);
    this.consecutiveFailures++;

    // A failed probe while HALF_OPEN reopens the circuit immediately
    if (this.state === CircuitState.HALF_OPEN) {
      this.transitionToOpen();
      return;
    }

    // Check if circuit should open
    if (this.shouldOpenCircuit()) {
      this.transitionToOpen();
    }
  }

  private addRequestRecord(success: boolean): void {
    const now = Date.now();
    this.requestHistory.push({ timestamp: now, success });

    // Clean old records outside sliding window
    const windowStart = now - this.config.slidingWindowMs;
    this.requestHistory = this.requestHistory.filter(
      record => record.timestamp >= windowStart
    );
  }

  private shouldOpenCircuit(): boolean {
    // Check consecutive failure threshold
    if (this.consecutiveFailures >= this.config.consecutiveFailureThreshold) {
      return true;
    }

    // Check failure rate threshold (requires minimum request volume)
    if (this.requestHistory.length < this.config.minimumRequestVolume) {
      return false;
    }

    const failureCount = this.requestHistory.filter(r => !r.success).length;
    const failureRate = (failureCount / this.requestHistory.length) * 100;

    return failureRate >= this.config.failureThreshold;
  }

  private transitionToOpen(): void {
    if (this.state === CircuitState.OPEN) return;

    console.warn(`Circuit breaker opening for service: ${this.serviceName}`);
    this.state = CircuitState.OPEN;
    this.stateChangedAt = Date.now();
    this.halfOpenAttempts = 0;

    // Schedule transition to HALF_OPEN
    this.resetTimeoutHandle = setTimeout(() => {
      this.transitionToHalfOpen();
    }, this.config.resetTimeoutMs);
  }

  private transitionToHalfOpen(): void {
    console.info(`Circuit breaker entering HALF_OPEN for service: ${this.serviceName}`);
    this.state = CircuitState.HALF_OPEN;
    this.stateChangedAt = Date.now();
    this.halfOpenAttempts = 0;
  }

  private transitionToClosed(): void {
    console.info(`Circuit breaker closing for service: ${this.serviceName}`);
    this.state = CircuitState.CLOSED;
    this.stateChangedAt = Date.now();
    this.consecutiveFailures = 0;
    this.halfOpenAttempts = 0;
    this.requestHistory = [];

    if (this.resetTimeoutHandle) {
      clearTimeout(this.resetTimeoutHandle);
      this.resetTimeoutHandle = undefined;
    }
  }

  private createCircuitOpenError(): Error {
    const error = new Error(
      `Circuit breaker is OPEN for ${this.serviceName}. Service temporarily unavailable.`
    );
    (error as any).code = 'CIRCUIT_BREAKER_OPEN';
    (error as any).details = {
      serviceName: this.serviceName,
      state: this.state,
      openedAt: this.stateChangedAt,
      resetAfterMs: this.config.resetTimeoutMs - (Date.now() - this.stateChangedAt),
    };
    return error;
  }

  public getState(): CircuitState {
    return this.state;
  }

  public getMetrics() {
    const totalRequests = this.requestHistory.length;
    const failures = this.requestHistory.filter(r => !r.success).length;
    const successes = totalRequests - failures;

    return {
      state: this.state,
      totalRequests,
      successes,
      failures,
      failureRate: totalRequests > 0 ? (failures / totalRequests) * 100 : 0,
      consecutiveFailures: this.consecutiveFailures,
      stateChangedAt: this.stateChangedAt,
    };
  }
}
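
In practice you keep one breaker instance per downstream dependency, held in a shared registry so every tool call sees the same failure state. A wiring sketch, with illustrative service names:

// circuit-breaker-registry.ts — illustrative wiring; service names are hypothetical
import { CircuitBreaker } from './circuit-breaker';

const breakers = new Map<string, CircuitBreaker>();

function getBreaker(serviceName: string): CircuitBreaker {
  let breaker = breakers.get(serviceName);
  if (!breaker) {
    breaker = new CircuitBreaker(serviceName);
    breakers.set(serviceName, breaker);
  }
  return breaker;
}

// Wrap any downstream call so repeated failures trip the shared breaker.
export async function callService<T>(serviceName: string, operation: () => Promise<T>): Promise<T> {
  return getBreaker(serviceName).execute(operation);
}

// Expose breaker health for monitoring endpoints.
export function breakerMetrics() {
  return Array.from(breakers.entries()).map(([name, breaker]) => ({
    service: name,
    ...breaker.getMetrics(),
  }));
}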

Graceful Degradation

Even with retries and circuit breakers, some requests will fail. Graceful degradation ensures your MCP server continues providing value during partial failures. Instead of complete failures, return partial results, cached data, or helpful fallback responses.

Fallback responses provide reasonable defaults when primary operations fail. For data retrieval failures, return cached results with staleness indicators. For write operations, queue requests for later processing. For complex workflows, complete partial steps and inform users about incomplete operations.

Partial results deliver available data even when some components fail. If fetching user profile fails but fetching user activity succeeds, return activity data with a note about unavailable profile. ChatGPT handles partial information well, often providing useful responses despite incomplete data.

Cached responses serve recent data when live APIs fail. Implement TTL-based caching for frequently accessed data, mark cached responses with freshness metadata, and automatically refresh stale cache entries. Users prefer slightly stale data over complete failures.

Here's graceful degradation middleware for MCP servers:

// graceful-degradation.ts
interface CacheEntry<T> {
  data: T;
  timestamp: number;
  ttlMs: number;
}

interface FallbackConfig<T> {
  enableCache: boolean;
  cacheTtlMs: number;
  fallbackData?: T;
}

export class GracefulDegradationMiddleware {
  private cache = new Map<string, CacheEntry<any>>();

  public async executeWithFallback<T>(
    cacheKey: string,
    operation: () => Promise<T>,
    config: FallbackConfig<T>
  ): Promise<{ data: T; metadata: any }> {
    try {
      // Attempt primary operation
      const result = await operation();

      // Cache successful result
      if (config.enableCache) {
        this.setCacheEntry(cacheKey, result, config.cacheTtlMs);
      }

      return {
        data: result,
        metadata: {
          source: 'primary',
          cached: false,
          degraded: false,
        },
      };
    } catch (error) {
      // Try cached data first
      if (config.enableCache) {
        const cached = this.getCacheEntry<T>(cacheKey);
        if (cached) {
          const ageMs = Date.now() - cached.timestamp;
          const isStale = ageMs > cached.ttlMs;

          return {
            data: cached.data,
            metadata: {
              source: 'cache',
              cached: true,
              degraded: true,
              cacheAgeMs: ageMs,
              isStale,
              originalError: (error as Error).message,
            },
          };
        }
      }

      // Fall back to default data
      if (config.fallbackData !== undefined) {
        return {
          data: config.fallbackData,
          metadata: {
            source: 'fallback',
            cached: false,
            degraded: true,
            originalError: (error as Error).message,
          },
        };
      }

      // No fallback available, throw original error
      throw error;
    }
  }

  public async executePartialOperation<T extends Record<string, any>>(
    operations: Record<keyof T, () => Promise<any>>,
    config: { allowPartialResults: boolean }
  ): Promise<{ data: Partial<T>; metadata: any }> {
    const results: Partial<T> = {};
    const errors: Record<string, string> = {};
    const successes: string[] = [];

    // Execute all operations in parallel
    const entries = Object.entries(operations) as [keyof T, () => Promise<any>][];
    await Promise.allSettled(
      entries.map(async ([key, operation]) => {
        try {
          results[key] = await operation();
          successes.push(String(key));
        } catch (error) {
          errors[String(key)] = (error as Error).message;
        }
      })
    );

    const totalOperations = entries.length;
    const successfulOperations = successes.length;
    const isPartial = successfulOperations < totalOperations;

    // If partial results not allowed and any operation failed, throw
    if (!config.allowPartialResults && isPartial) {
      const error = new Error('Partial operation failure');
      (error as any).code = 'PARTIAL_FAILURE';
      (error as any).details = { errors, successes };
      throw error;
    }

    return {
      data: results,
      metadata: {
        partial: isPartial,
        successCount: successfulOperations,
        totalCount: totalOperations,
        successRate: (successfulOperations / totalOperations) * 100,
        failures: errors,
        successes,
      },
    };
  }

  private setCacheEntry<T>(key: string, data: T, ttlMs: number): void {
    this.cache.set(key, {
      data,
      timestamp: Date.now(),
      ttlMs,
    });
  }

  private getCacheEntry<T>(key: string): CacheEntry<T> | undefined {
    return this.cache.get(key);
  }

  public clearCache(pattern?: RegExp): void {
    if (!pattern) {
      this.cache.clear();
      return;
    }

    for (const key of this.cache.keys()) {
      if (pattern.test(key)) {
        this.cache.delete(key);
      }
    }
  }

  public getCacheMetrics() {
    const entries = Array.from(this.cache.entries());
    const now = Date.now();

    return {
      totalEntries: entries.length,
      staleEntries: entries.filter(([_, e]) => now - e.timestamp > e.ttlMs).length,
      cacheKeys: entries.map(([key]) => key),
    };
  }
}
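
Here's a usage sketch combining a cached fallback for reads with partial results for a composite lookup; fetchJson, the endpoints, and the dashboard shape are hypothetical placeholders:

// degradation-usage.ts — illustrative only; fetchJson and its URLs are hypothetical
import { GracefulDegradationMiddleware } from './graceful-degradation';

const degradation = new GracefulDegradationMiddleware();

async function fetchJson(url: string): Promise<Record<string, unknown>> {
  const res = await fetch(url);
  if (!res.ok) throw Object.assign(new Error(`Request failed: ${res.status}`), { statusCode: res.status });
  return res.json() as Promise<Record<string, unknown>>;
}

// Read path: serve cached profile data if the live call fails.
export async function getProfile(userId: string) {
  return degradation.executeWithFallback(
    `profile:${userId}`,
    () => fetchJson(`https://api.example.com/users/${userId}/profile`),
    { enableCache: true, cacheTtlMs: 5 * 60 * 1000 }
  );
}

type DashboardData = {
  profile: Record<string, unknown>;
  activity: Record<string, unknown>;
};

// Composite path: return whatever parts succeed, flagged as partial.
export async function getDashboard(userId: string) {
  return degradation.executePartialOperation<DashboardData>(
    {
      profile: () => fetchJson(`https://api.example.com/users/${userId}/profile`),
      activity: () => fetchJson(`https://api.example.com/users/${userId}/activity`),
    },
    { allowPartialResults: true }
  );
}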

Error Reporting

User-facing error messages require careful crafting. ChatGPT amplifies the quality of your errors: clear, actionable messages let the model give helpful guidance, while cryptic ones produce confused or incorrect responses. Structure your MCP server errors for both technical debugging and user communication.

Structured errors separate technical details from user messages. Include error code (machine-readable identifier), user message (friendly explanation), internal message (technical details), context (request metadata), and remediation (suggested next steps). This structure supports both debugging and user communication.

Error context captures environmental information crucial for diagnosing production issues. Record request ID (for log correlation), timestamp, user ID, tool name, input parameters (sanitized), and stack traces. Balance detail against privacy—avoid logging sensitive data.

User-friendly messages transform technical errors into actionable guidance. Replace "ECONNREFUSED" with "Unable to connect to the service." Transform "401 Unauthorized" into "Please reconnect your account in settings." Suggest specific remediation steps: "Try again in a few moments" or "Contact support with request ID: abc123."

Here's an error context builder and formatter:

// error-context.ts
interface ErrorContext {
  requestId: string;
  timestamp: Date;
  userId?: string;
  toolName: string;
  parameters?: Record<string, any>;
  attemptNumber: number;
}

interface StructuredError {
  code: string;
  userMessage: string;
  internalMessage: string;
  context: ErrorContext;
  remediation?: string;
  severity: 'INFO' | 'WARNING' | 'CRITICAL';
  isRetryable: boolean;
}

export class ErrorContextBuilder {
  private context: Partial<ErrorContext> = {};

  public setRequestId(requestId: string): this {
    this.context.requestId = requestId;
    return this;
  }

  public setUserId(userId: string): this {
    this.context.userId = userId;
    return this;
  }

  public setToolName(toolName: string): this {
    this.context.toolName = toolName;
    return this;
  }

  public setParameters(parameters: Record<string, any>): this {
    this.context.parameters = this.sanitizeParameters(parameters);
    return this;
  }

  public setAttemptNumber(attemptNumber: number): this {
    this.context.attemptNumber = attemptNumber;
    return this;
  }

  public build(): ErrorContext {
    return {
      requestId: this.context.requestId || this.generateRequestId(),
      timestamp: new Date(),
      userId: this.context.userId,
      toolName: this.context.toolName || 'unknown',
      parameters: this.context.parameters,
      attemptNumber: this.context.attemptNumber || 1,
    };
  }

  private sanitizeParameters(params: Record<string, any>): Record<string, any> {
    const sanitized: Record<string, any> = {};
    const sensitiveKeys = new Set(['password', 'token', 'apikey', 'api_key', 'secret', 'credential']);

    for (const [key, value] of Object.entries(params)) {
      // Match secret-bearing keys case-insensitively
      if (sensitiveKeys.has(key.toLowerCase())) {
        sanitized[key] = '[REDACTED]';
      } else if (typeof value === 'string' && value.length > 200) {
        sanitized[key] = value.substring(0, 200) + '...[TRUNCATED]';
      } else {
        sanitized[key] = value;
      }
    }

    return sanitized;
  }

  private generateRequestId(): string {
    return `req_${Date.now()}_${Math.random().toString(36).substring(2, 9)}`;
  }
}

export class StructuredErrorFormatter {
  public static formatForChatGPT(error: StructuredError): string {
    let message = error.userMessage;

    // Add remediation if available
    if (error.remediation) {
      message += `\n\n${error.remediation}`;
    }

    // Add request ID for support
    if (error.severity === 'CRITICAL') {
      message += `\n\nRequest ID: ${error.context.requestId}`;
    }

    return message;
  }

  public static formatForLogging(error: StructuredError): string {
    return JSON.stringify({
      code: error.code,
      severity: error.severity,
      message: error.internalMessage,
      userMessage: error.userMessage,
      context: error.context,
      isRetryable: error.isRetryable,
    }, null, 2);
  }

  public static createStructuredError(
    error: Error,
    context: ErrorContext,
    classification: any
  ): StructuredError {
    return {
      code: this.extractErrorCode(error),
      userMessage: classification.userMessage,
      internalMessage: classification.internalMessage,
      context,
      remediation: this.generateRemediation(classification),
      severity: classification.severity,
      isRetryable: classification.isRetryable,
    };
  }

  private static extractErrorCode(error: any): string {
    return error.code || error.errorCode || error.name || 'UNKNOWN_ERROR';
  }

  private static generateRemediation(classification: any): string | undefined {
    switch (classification.category) {
      case 'AUTHENTICATION':
        return 'Please reconnect your account in the app settings.';
      case 'RATE_LIMIT':
        return 'The service is experiencing high demand. Please try again in a few moments.';
      case 'NOT_FOUND':
        return 'Please verify the resource exists and you have access to it.';
      case 'TRANSIENT':
        return classification.isRetryable
          ? 'We\'re automatically retrying this request.'
          : 'Please try your request again.';
      default:
        return undefined;
    }
  }
}
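
Wiring these pieces into a tool handler looks roughly like this. The { content, isError } result shape follows the MCP tool-result convention, and handleToolCall itself is a hypothetical wrapper rather than an SDK API; adapt it to whatever types your server framework provides:

// tool-error-response.ts — sketch of returning a structured error from a tool handler
import { ErrorClassifier } from './error-classifier';
import { ErrorContextBuilder, StructuredErrorFormatter } from './error-context';

export async function handleToolCall<T>(
  toolName: string,
  parameters: Record<string, any>,
  operation: () => Promise<T>
) {
  const context = new ErrorContextBuilder()
    .setToolName(toolName)
    .setParameters(parameters)
    .build();

  try {
    const result = await operation();
    return { content: [{ type: 'text', text: JSON.stringify(result) }] };
  } catch (error) {
    const classification = ErrorClassifier.classify(error as Error, {
      timestamp: context.timestamp,
      requestId: context.requestId,
      toolName: context.toolName,
      attemptNumber: context.attemptNumber,
      originalError: error as Error,
    });
    const structured = StructuredErrorFormatter.createStructuredError(
      error as Error,
      context,
      classification
    );
    // Log the technical details; return only the user-facing message to the model.
    console.error(StructuredErrorFormatter.formatForLogging(structured));
    return {
      content: [{ type: 'text', text: StructuredErrorFormatter.formatForChatGPT(structured) }],
      isError: true,
    };
  }
}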

Finally, here's an idempotency key manager for safe retries:

// idempotency-manager.ts
interface IdempotencyRecord<T> {
  key: string;
  result: T;
  timestamp: number;
  ttlMs: number;
}

export class IdempotencyManager {
  private records = new Map<string, IdempotencyRecord<any>>();
  private readonly DEFAULT_TTL_MS = 3600000; // 1 hour

  public async executeIdempotent<T>(
    idempotencyKey: string,
    operation: () => Promise<T>,
    ttlMs: number = this.DEFAULT_TTL_MS
  ): Promise<{ result: T; wasRetry: boolean }> {
    // Check for existing result
    const existing = this.getRecord<T>(idempotencyKey);
    if (existing) {
      return {
        result: existing.result,
        wasRetry: true,
      };
    }

    // Execute operation
    const result = await operation();

    // Store result
    this.storeRecord(idempotencyKey, result, ttlMs);

    return {
      result,
      wasRetry: false,
    };
  }

  public generateKey(
    toolName: string,
    parameters: Record<string, any>,
    userId?: string
  ): string {
    const components = [
      toolName,
      userId || 'anonymous',
      this.hashParameters(parameters),
    ];

    return components.join(':');
  }

  private getRecord<T>(key: string): IdempotencyRecord<T> | undefined {
    const record = this.records.get(key);
    if (!record) return undefined;

    // Check if record expired
    const age = Date.now() - record.timestamp;
    if (age > record.ttlMs) {
      this.records.delete(key);
      return undefined;
    }

    return record;
  }

  private storeRecord<T>(key: string, result: T, ttlMs: number): void {
    this.records.set(key, {
      key,
      result,
      timestamp: Date.now(),
      ttlMs,
    });
  }

  private hashParameters(params: Record<string, any>): string {
    const sorted = Object.keys(params)
      .sort()
      .reduce((acc, key) => {
        acc[key] = params[key];
        return acc;
      }, {} as Record<string, any>);

    return Buffer.from(JSON.stringify(sorted)).toString('base64');
  }

  public clearExpiredRecords(): void {
    const now = Date.now();
    for (const [key, record] of this.records.entries()) {
      if (now - record.timestamp > record.ttlMs) {
        this.records.delete(key);
      }
    }
  }

  public getMetrics() {
    const records = Array.from(this.records.values());
    const now = Date.now();

    return {
      totalRecords: records.length,
      activeRecords: records.filter(r => now - r.timestamp <= r.ttlMs).length,
      expiredRecords: records.filter(r => now - r.timestamp > r.ttlMs).length,
    };
  }
}
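
A usage sketch for a write-style tool; createOrder and its endpoint are hypothetical stand-ins for your own side-effecting operation:

// idempotency-usage.ts — illustrative only; createOrder and its endpoint are hypothetical
import { IdempotencyManager } from './idempotency-manager';

const idempotency = new IdempotencyManager();

async function createOrder(params: Record<string, any>): Promise<Record<string, unknown>> {
  const res = await fetch('https://api.example.com/orders', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(params),
  });
  if (!res.ok) throw Object.assign(new Error('Order creation failed'), { statusCode: res.status });
  return res.json() as Promise<Record<string, unknown>>;
}

// Retried calls with the same tool name, user, and parameters return the cached
// result instead of creating a duplicate order.
export async function createOrderTool(userId: string, params: Record<string, any>) {
  const key = idempotency.generateKey('create_order', params, userId);
  return idempotency.executeIdempotent(key, () => createOrder(params));
}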

Build Resilient ChatGPT Apps with MakeAIHQ

Implementing comprehensive error handling transforms your MCP servers from fragile prototypes into production-ready services. Error classification routes failures appropriately, retry strategies handle transient issues automatically, circuit breakers prevent cascading failures, graceful degradation maintains functionality during partial outages, and structured error reporting enables effective debugging and user communication.

These patterns work together: classify errors before retrying, open circuit breakers after retry budgets are exhausted, serve cached data when circuits open, and log structured context throughout. Production MCP servers combine all five patterns for true resilience.
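
As a sketch of how the pieces compose, the wrapper below nests the retry handler inside the circuit breaker, and both inside the degradation layer; the service name, cache TTL, and request-ID format are illustrative choices:

// resilient-tool-call.ts — illustrative composition of the patterns above
import { RetryHandler } from './retry-handler';
import { CircuitBreaker } from './circuit-breaker';
import { GracefulDegradationMiddleware } from './graceful-degradation';

const retryHandler = new RetryHandler();
const breaker = new CircuitBreaker('example-api'); // hypothetical downstream service
const degradation = new GracefulDegradationMiddleware();

export async function resilientToolCall<T>(
  toolName: string,
  cacheKey: string,
  operation: () => Promise<T>
) {
  return degradation.executeWithFallback(
    cacheKey,
    () =>
      breaker.execute(() =>
        retryHandler.executeWithRetry(operation, {
          requestId: `req_${Date.now()}`,
          toolName,
          startTime: new Date(),
        })
      ),
    { enableCache: true, cacheTtlMs: 5 * 60 * 1000 }
  );
}

With this ordering, the breaker records one failure per exhausted retry budget, and an open circuit surfaces as an error that the degradation layer answers from cache when a cached entry exists.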

MakeAIHQ handles error handling complexity automatically. Our no-code platform generates production-ready MCP servers with built-in retry logic, circuit breakers, graceful degradation, and comprehensive error reporting. Focus on your business logic while we manage reliability patterns, monitoring, and infrastructure.

Ready to build fault-tolerant ChatGPT apps without writing error handling code? Start your free trial and deploy your first resilient MCP server in minutes. Check out our Complete Guide to Building ChatGPT Applications for comprehensive coverage of ChatGPT app development patterns, or explore our MCP Server Debugging Guide for troubleshooting production issues.

For implementation details on specific patterns, see our guides on Circuit Breaker Pattern for ChatGPT Apps and Retry Strategies for ChatGPT Applications.

