Blue-Green Deployment for ChatGPT Apps: Zero-Downtime Strategy

Deploying ChatGPT apps to production requires a strategy that minimizes downtime and risk. Blue-green deployment is the gold standard for achieving zero-downtime releases, allowing you to switch between two identical production environments instantly. This comprehensive guide shows you how to implement blue-green deployment for ChatGPT apps using Kubernetes, with complete code examples for traffic switching, health checks, automated rollback, and database migrations.

Whether you're building a ChatGPT app for 1,000 users or 1 million, blue-green deployment ensures your updates are seamless, reversible, and production-safe. You'll learn the exact infrastructure configuration, deployment scripts, and monitoring strategies used by enterprise teams to ship ChatGPT apps with confidence.

If you're looking for a platform that handles deployment complexity automatically, MakeAIHQ's no-code ChatGPT builder includes built-in blue-green deployment, health checks, and rollback—no DevOps expertise required. But if you're building custom infrastructure, this guide gives you everything you need.

Blue-Green Architecture Fundamentals

Blue-green deployment maintains two identical production environments: Blue (currently serving traffic) and Green (staging the next release). The strategy is simple but powerful:

  1. Blue environment serves 100% of production traffic
  2. Deploy new version to Green environment (zero user impact)
  3. Run smoke tests and health checks on Green
  4. Switch load balancer from Blue to Green (instant cutover)
  5. Monitor Green; if issues arise, switch back to Blue (instant rollback)
  6. Keep Blue as standby for 24-48 hours, then update it with the new version
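Steps 4 and 5 are the same operation in opposite directions: a flip between two color labels. As a minimal sketch, that toggle can be captured in a few lines of bash (the `next_color` helper is illustrative, not part of any standard tooling):

```shell
# Given the color currently serving traffic, return the color to deploy
# to next (and to cut over to in step 4, or back to in step 5).
next_color() {
  case "$1" in
    blue)  echo "green" ;;
    green) echo "blue" ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

next_color blue   # prints "green"
```

The same helper drives rollback: calling it on the newly active color yields the standby to fall back to.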

Dual Environment Setup

For ChatGPT apps running on Kubernetes, you maintain two separate deployments sharing the same infrastructure:

  • Namespace isolation: chatgpt-blue and chatgpt-green namespaces
  • Shared resources: Database, Redis cache, storage buckets (read-only during switch)
  • Separate compute: Independent pods, resource quotas, horizontal autoscaling
  • Single ingress: Load balancer switches between blue/green services via labels

The key advantage: your new version is fully deployed and tested before any user sees it. Compare this to rolling updates (gradual pod replacement) or canary deployments (partial traffic shifting)—blue-green gives you binary control with instant rollback capability.

Load Balancer Configuration

Your load balancer is the traffic controller. It must support:

  • Service discovery: Dynamically route to blue or green backend based on labels
  • Health checks: Remove unhealthy pods from rotation automatically
  • Session affinity: Maintain user sessions during switches (if stateful)
  • Connection draining: Gracefully close existing connections before switching

For ChatGPT apps, this typically means:

  • Kubernetes Service with label selectors (version: blue or version: green)
  • Ingress controller (NGINX, Traefik, or cloud provider) with backend switching
  • External load balancer (AWS ALB, GCP Load Balancer) if multi-cluster
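Connection draining in particular is worth sketching: before flipping the selector, you wait until in-flight connections on the old color fall to zero or a timeout expires. A minimal bash sketch, where `count_connections` is a stand-in for a real query against your ingress controller or metrics endpoint:

```shell
# Stand-in for a real metric query (e.g. active connection count reported
# by the NGINX ingress controller); replace with your monitoring stack's call.
count_connections() { echo 0; }

# Wait up to $1 seconds for in-flight connections to drain to zero.
drain_connections() {
  local timeout="$1" elapsed=0
  while [ "$(count_connections)" -gt 0 ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "drain timed out after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "drained after ${elapsed}s"
}

drain_connections 30   # prints "drained after 0s" with the stub above
```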

Database Strategy Considerations

Database handling is the trickiest part of blue-green deployment. You have three options:

Option 1: Shared Database (Most Common)

  • Both blue and green use the same database
  • Requires backward-compatible schema migrations
  • New code must work with old schema (before switch) and new schema (after switch)
  • Example: Adding a column with a default value, then using it in Green

Option 2: Database Per Environment

  • Blue and Green each have separate databases
  • Sync data from Blue to Green before switch
  • Complete isolation, but complex data synchronization
  • Best for read-heavy apps with eventual consistency tolerance

Option 3: Database Blue-Green

  • Use database replication (e.g., PostgreSQL streaming replication)
  • Blue writes to primary; Green reads from replica
  • Promote replica to primary during switch
  • High complexity, best for database-critical apps

For most ChatGPT apps, Option 1 (shared database with backward-compatible migrations) is the right choice. We'll show you how to implement it safely below.

Kubernetes Implementation

Here's a production-ready Kubernetes configuration for blue-green ChatGPT app deployment. This example assumes your app runs as a Node.js MCP server with a React widget frontend.

Kubernetes Blue-Green Services (YAML)

# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatgpt-app-blue
  namespace: chatgpt-production
  labels:
    app: chatgpt-app
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: chatgpt-app
      version: blue
  template:
    metadata:
      labels:
        app: chatgpt-app
        version: blue
    spec:
      containers:
      - name: mcp-server
        image: your-registry/chatgpt-app:v1.2.3
        ports:
        - containerPort: 3000
          name: http
        env:
        - name: ENVIRONMENT
          value: "blue"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: openai-credentials
              key: api-key
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatgpt-app-green
  namespace: chatgpt-production
  labels:
    app: chatgpt-app
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: chatgpt-app
      version: green
  template:
    metadata:
      labels:
        app: chatgpt-app
        version: green
    spec:
      containers:
      - name: mcp-server
        image: your-registry/chatgpt-app:v1.3.0  # New version
        ports:
        - containerPort: 3000
          name: http
        env:
        - name: ENVIRONMENT
          value: "green"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: openai-credentials
              key: api-key
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2
---
# service.yaml (Active Service - switches between blue/green)
apiVersion: v1
kind: Service
metadata:
  name: chatgpt-app-service
  namespace: chatgpt-production
  labels:
    app: chatgpt-app
spec:
  selector:
    app: chatgpt-app
    version: blue  # CHANGE THIS TO 'green' TO SWITCH TRAFFIC
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
  type: ClusterIP
---
# blue-service.yaml (Direct access for testing)
apiVersion: v1
kind: Service
metadata:
  name: chatgpt-app-blue
  namespace: chatgpt-production
spec:
  selector:
    app: chatgpt-app
    version: blue
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
  type: ClusterIP
---
# green-service.yaml (Direct access for testing)
apiVersion: v1
kind: Service
metadata:
  name: chatgpt-app-green
  namespace: chatgpt-production
spec:
  selector:
    app: chatgpt-app
    version: green
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
  type: ClusterIP
---
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: chatgpt-app-ingress
  namespace: chatgpt-production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.yourdomain.com
    secretName: chatgpt-app-tls
  rules:
  - host: app.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: chatgpt-app-service  # Main service (blue or green)
            port:
              number: 80
      - path: /blue  # Direct blue access for smoke tests
        pathType: Prefix
        backend:
          service:
            name: chatgpt-app-blue
            port:
              number: 80
      - path: /green  # Direct green access for smoke tests
        pathType: Prefix
        backend:
          service:
            name: chatgpt-app-green
            port:
              number: 80

Key Features:

  • Label-based switching: Change version: blue to version: green in the main service
  • Separate blue/green services: Test each environment independently via /blue and /green paths
  • Health checks: Liveness (restart unhealthy pods) and readiness (remove from load balancer)
  • Resource limits: Prevent resource starvation during deployments

To deploy this configuration, you'll need kubectl access and the image already pushed to your container registry. Learn how to build ChatGPT app containers in our ChatGPT App Docker Guide.
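Under the hood, the label-based switch is a single JSON merge patch against the Service selector. A small helper (illustrative only) makes the exact payload explicit:

```shell
# Build the JSON merge-patch body that retargets the Service selector to
# the given color; this is the payload you would pass to `kubectl patch`.
selector_patch() {
  printf '{"spec":{"selector":{"version":"%s"}}}' "$1"
}

selector_patch green
# prints {"spec":{"selector":{"version":"green"}}}
```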

Traffic Switching Strategy

Switching traffic from blue to green is a multi-stage process that prioritizes safety over speed. Here's the production workflow:

Deployment Switcher Script (Bash)

#!/bin/bash
# blue-green-switch.sh - Safe traffic switching for ChatGPT apps

set -euo pipefail

NAMESPACE="chatgpt-production"
SERVICE_NAME="chatgpt-app-service"
CURRENT_VERSION=""
TARGET_VERSION=""
SMOKE_TEST_TIMEOUT=300  # 5 minutes

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

log_info() {
    echo -e "${GREEN}[INFO]${NC} $1"
}

log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Detect current active version
detect_current_version() {
    log_info "Detecting current active version..."
    CURRENT_VERSION=$(kubectl get service "$SERVICE_NAME" -n "$NAMESPACE" -o jsonpath='{.spec.selector.version}')

    if [[ "$CURRENT_VERSION" == "blue" ]]; then
        TARGET_VERSION="green"
    elif [[ "$CURRENT_VERSION" == "green" ]]; then
        TARGET_VERSION="blue"
    else
        log_error "Unknown current version: $CURRENT_VERSION"
        exit 1
    fi

    log_info "Current: $CURRENT_VERSION | Target: $TARGET_VERSION"
}

# Verify target deployment is ready
verify_target_ready() {
    log_info "Verifying $TARGET_VERSION deployment is ready..."

    local ready_replicas=$(kubectl get deployment "chatgpt-app-$TARGET_VERSION" -n "$NAMESPACE" -o jsonpath='{.status.readyReplicas}')
    local desired_replicas=$(kubectl get deployment "chatgpt-app-$TARGET_VERSION" -n "$NAMESPACE" -o jsonpath='{.spec.replicas}')

    if [[ "$ready_replicas" != "$desired_replicas" ]]; then
        log_error "$TARGET_VERSION deployment not ready: $ready_replicas/$desired_replicas pods ready"
        exit 1
    fi

    log_info "$TARGET_VERSION deployment ready: $ready_replicas/$desired_replicas pods"
}

# Run health checks on target environment
run_health_checks() {
    log_info "Running health checks on $TARGET_VERSION..."

    # Get a pod from target deployment
    local pod=$(kubectl get pods -n "$NAMESPACE" -l "app=chatgpt-app,version=$TARGET_VERSION" -o jsonpath='{.items[0].metadata.name}')

    if [[ -z "$pod" ]]; then
        log_error "No pods found for $TARGET_VERSION deployment"
        exit 1
    fi

    # Health check
    if ! kubectl exec "$pod" -n "$NAMESPACE" -- curl -sf http://localhost:3000/health > /dev/null; then
        log_error "Health check failed for $TARGET_VERSION"
        exit 1
    fi

    # Readiness check
    if ! kubectl exec "$pod" -n "$NAMESPACE" -- curl -sf http://localhost:3000/ready > /dev/null; then
        log_error "Readiness check failed for $TARGET_VERSION"
        exit 1
    fi

    log_info "Health checks passed for $TARGET_VERSION"
}

# Run smoke tests on target environment
run_smoke_tests() {
    log_info "Running smoke tests on $TARGET_VERSION (timeout: ${SMOKE_TEST_TIMEOUT}s)..."

    # Port-forward to green service for isolated testing
    kubectl port-forward -n "$NAMESPACE" "service/chatgpt-app-$TARGET_VERSION" 8080:80 &
    local port_forward_pid=$!

    sleep 5  # Wait for port-forward to establish

    # Run smoke test suite
    if timeout "$SMOKE_TEST_TIMEOUT" npm run test:smoke -- --base-url http://localhost:8080; then
        log_info "Smoke tests passed for $TARGET_VERSION"
    else
        log_error "Smoke tests failed for $TARGET_VERSION"
        kill "$port_forward_pid" 2>/dev/null || true
        exit 1
    fi

    # Clean up port-forward
    kill "$port_forward_pid" 2>/dev/null || true
}

# Switch traffic to target version
switch_traffic() {
    log_warn "Switching traffic from $CURRENT_VERSION to $TARGET_VERSION..."

    # Patch service selector
    kubectl patch service "$SERVICE_NAME" -n "$NAMESPACE" -p "{\"spec\":{\"selector\":{\"version\":\"$TARGET_VERSION\"}}}"

    log_info "Traffic switched to $TARGET_VERSION"
}

# Monitor target version for errors
monitor_target() {
    log_info "Monitoring $TARGET_VERSION for 60 seconds..."

    local start_time=$(date +%s)
    local error_count=0

    while [[ $(($(date +%s) - start_time)) -lt 60 ]]; do
        # Check pod restart count
        local restarts=$(kubectl get pods -n "$NAMESPACE" -l "app=chatgpt-app,version=$TARGET_VERSION" -o jsonpath='{.items[*].status.containerStatuses[0].restartCount}' | tr ' ' '\n' | awk '{s+=$1} END {print s}')

        if [[ "$restarts" -gt 0 ]]; then
            ((error_count++))
            log_warn "Detected $restarts pod restarts in $TARGET_VERSION"
        fi

        # Check for error logs
        if kubectl logs -n "$NAMESPACE" -l "app=chatgpt-app,version=$TARGET_VERSION" --tail=50 --since=10s | grep -i "error" > /dev/null; then
            ((error_count++))
            log_warn "Detected errors in $TARGET_VERSION logs"
        fi

        if [[ "$error_count" -gt 5 ]]; then
            log_error "Too many errors detected in $TARGET_VERSION. Initiating rollback..."
            rollback_traffic
            exit 1
        fi

        sleep 10
    done

    log_info "Monitoring complete. $TARGET_VERSION is stable."
}

# Rollback traffic to previous version
rollback_traffic() {
    log_error "Rolling back traffic to $CURRENT_VERSION..."

    kubectl patch service "$SERVICE_NAME" -n "$NAMESPACE" -p "{\"spec\":{\"selector\":{\"version\":\"$CURRENT_VERSION\"}}}"

    log_info "Traffic rolled back to $CURRENT_VERSION"
}

# Main execution
main() {
    log_info "Starting blue-green deployment switch..."

    detect_current_version
    verify_target_ready
    run_health_checks
    run_smoke_tests
    switch_traffic
    monitor_target

    log_info "Blue-green deployment switch complete!"
    log_info "Active version: $TARGET_VERSION"
    log_info "Standby version: $CURRENT_VERSION (keep for 24-48h before updating)"
}

main "$@"

This script automates the entire switching process with safety checks at every stage. If any check fails, deployment stops before traffic is affected. For a manual approach to deployment validation, see our ChatGPT App Testing Guide.

Health Check Prober (TypeScript)

Your ChatGPT app must expose health and readiness endpoints. Here's a production implementation:

// health-check.ts - Health and readiness endpoints for MCP server
import express, { Request, Response } from 'express';
import { Pool } from 'pg';  // PostgreSQL client
import Redis from 'ioredis';

interface HealthStatus {
  status: 'healthy' | 'unhealthy';
  timestamp: string;
  version: string;
  environment: string;
  checks: {
    database: boolean;
    redis: boolean;
    openai: boolean;
  };
  uptime: number;
}

const router = express.Router();
const startTime = Date.now();

// Database connection pool (a Pool acquires connections on demand; a bare
// Client would need an explicit connect() call before any query)
const dbClient = new Pool({
  connectionString: process.env.DATABASE_URL,
  connectionTimeoutMillis: 5000,
});

// Redis client
const redisClient = new Redis(process.env.REDIS_URL || 'redis://localhost:6379', {
  connectTimeout: 5000,
  maxRetriesPerRequest: 2,
});

// Health check (liveness probe) - Am I alive?
router.get('/health', async (req: Request, res: Response) => {
  const health: HealthStatus = {
    status: 'healthy',
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || 'unknown',
    environment: process.env.ENVIRONMENT || 'unknown',
    checks: {
      database: false,
      redis: false,
      openai: false,
    },
    uptime: Date.now() - startTime,
  };

  try {
    // Check database connectivity
    await dbClient.query('SELECT 1');
    health.checks.database = true;
  } catch (error) {
    console.error('Database health check failed:', error);
    health.status = 'unhealthy';
  }

  try {
    // Check Redis connectivity
    await redisClient.ping();
    health.checks.redis = true;
  } catch (error) {
    console.error('Redis health check failed:', error);
    health.status = 'unhealthy';
  }

  try {
    // Check OpenAI API availability (lightweight check)
    const response = await fetch('https://api.openai.com/v1/models', {
      method: 'GET',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      signal: AbortSignal.timeout(3000),
    });

    health.checks.openai = response.ok;
  } catch (error) {
    console.error('OpenAI health check failed:', error);
    // Don't mark unhealthy for OpenAI failures (might be temporary)
  }

  const statusCode = health.status === 'healthy' ? 200 : 503;
  res.status(statusCode).json(health);
});

// Readiness check (readiness probe) - Can I handle traffic?
router.get('/ready', async (req: Request, res: Response) => {
  // Stricter checks for readiness
  const checks = {
    database: false,
    redis: false,
    openai: false,
    dbMigrations: false,
  };

  try {
    // Database connectivity
    await dbClient.query('SELECT 1');
    checks.database = true;

    // Check that the migrations table exists (a cheap proxy for "migrations
    // have run"; compare applied migration IDs for a stricter check)
    const migrationCheck = await dbClient.query(
      "SELECT EXISTS (SELECT 1 FROM information_schema.tables WHERE table_name = 'schema_migrations')"
    );
    checks.dbMigrations = migrationCheck.rows[0].exists;
  } catch (error) {
    console.error('Database readiness check failed:', error);
    return res.status(503).json({ ready: false, reason: 'database_unavailable', checks });
  }

  try {
    // Redis connectivity
    await redisClient.ping();
    checks.redis = true;
  } catch (error) {
    console.error('Redis readiness check failed:', error);
    return res.status(503).json({ ready: false, reason: 'redis_unavailable', checks });
  }

  try {
    // OpenAI API availability
    const response = await fetch('https://api.openai.com/v1/models', {
      method: 'GET',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      signal: AbortSignal.timeout(3000),
    });

    checks.openai = response.ok;
  } catch (error) {
    console.error('OpenAI readiness check failed:', error);
    return res.status(503).json({ ready: false, reason: 'openai_unavailable', checks });
  }

  // All checks passed
  res.status(200).json({
    ready: true,
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION || 'unknown',
    checks,
  });
});

export default router;

Liveness vs Readiness:

  • Liveness: "Is my app running?" (Kubernetes restarts pod if this fails)
  • Readiness: "Can I handle traffic?" (Kubernetes removes from load balancer if this fails)

During blue-green switch, Kubernetes uses readiness checks to ensure Green pods are ready before switching traffic. For database migration patterns that work with blue-green deployments, see our ChatGPT App Database Migrations Guide.
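The two probes trigger two different remediation actions, which can be summarized in a tiny decision helper (a mnemonic sketch, not Kubernetes API code):

```shell
# What Kubernetes does when a probe fails: liveness failures restart the
# pod; readiness failures remove it from Service endpoints (the pod keeps
# running and can recover on its own).
probe_action() {
  local probe="$1" result="$2"
  case "$probe:$result" in
    liveness:fail)  echo "restart-pod" ;;
    readiness:fail) echo "remove-from-endpoints" ;;
    *:pass)         echo "no-action" ;;
    *)              return 1 ;;
  esac
}

probe_action readiness fail   # prints "remove-from-endpoints"
```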

Rollback Strategies

Even with thorough testing, production issues can emerge after traffic switches to Green. Your rollback strategy must be instant and reliable.

Automated Rollback (Bash)

#!/bin/bash
# auto-rollback.sh - Automated rollback based on error thresholds

set -euo pipefail

NAMESPACE="chatgpt-production"
SERVICE_NAME="chatgpt-app-service"
MONITOR_DURATION=600  # Monitor for 10 minutes after switch
ERROR_THRESHOLD=10    # Max acceptable errors before rollback
CHECK_INTERVAL=30     # Check every 30 seconds

ACTIVE_VERSION=""
PREVIOUS_VERSION=""
ERROR_COUNT=0

log_info() { echo -e "\033[0;32m[INFO]\033[0m $1"; }
log_error() { echo -e "\033[0;31m[ERROR]\033[0m $1"; }

# Get current active version
get_active_version() {
    ACTIVE_VERSION=$(kubectl get service "$SERVICE_NAME" -n "$NAMESPACE" -o jsonpath='{.spec.selector.version}')

    if [[ "$ACTIVE_VERSION" == "blue" ]]; then
        PREVIOUS_VERSION="green"
    else
        PREVIOUS_VERSION="blue"
    fi

    log_info "Monitoring active version: $ACTIVE_VERSION (previous: $PREVIOUS_VERSION)"
}

# Check for pod crashes
check_pod_crashes() {
    local crashes=$(kubectl get pods -n "$NAMESPACE" -l "app=chatgpt-app,version=$ACTIVE_VERSION" \
        -o jsonpath='{.items[*].status.containerStatuses[0].restartCount}' | \
        tr ' ' '\n' | awk '{s+=$1} END {print s}')

    if [[ "$crashes" -gt 0 ]]; then
        ((ERROR_COUNT += crashes))
        log_error "Detected $crashes pod crashes in $ACTIVE_VERSION (total errors: $ERROR_COUNT)"
    fi
}

# Check error logs
check_error_logs() {
    local errors=$(kubectl logs -n "$NAMESPACE" -l "app=chatgpt-app,version=$ACTIVE_VERSION" \
        --tail=100 --since="${CHECK_INTERVAL}s" 2>/dev/null | \
        grep -c "ERROR\|FATAL\|Exception" || true)

    if [[ "$errors" -gt 5 ]]; then
        ((ERROR_COUNT += errors))
        log_error "Detected $errors error log entries in $ACTIVE_VERSION (total errors: $ERROR_COUNT)"
    fi
}

# Check response times (requires metrics-server)
check_response_times() {
    # This requires Prometheus/metrics-server to be deployed
    # Simplified example - adapt to your monitoring stack

    local avg_response_time=$(kubectl exec -n "$NAMESPACE" \
        $(kubectl get pods -n "$NAMESPACE" -l "app=chatgpt-app,version=$ACTIVE_VERSION" -o jsonpath='{.items[0].metadata.name}') -- \
        curl -s http://localhost:3000/metrics | grep "http_request_duration_seconds" | awk '{print $2}' || echo "0")

    # If average response time > 5 seconds, increment error count
    if (( $(echo "$avg_response_time > 5" | bc -l) )); then
        ((ERROR_COUNT += 3))
        log_error "High response time detected: ${avg_response_time}s (total errors: $ERROR_COUNT)"
    fi
}

# Execute rollback
execute_rollback() {
    log_error "ERROR THRESHOLD EXCEEDED ($ERROR_COUNT errors). Rolling back to $PREVIOUS_VERSION..."

    kubectl patch service "$SERVICE_NAME" -n "$NAMESPACE" \
        -p "{\"spec\":{\"selector\":{\"version\":\"$PREVIOUS_VERSION\"}}}"

    log_info "Rollback complete. Active version: $PREVIOUS_VERSION"

    # Send alert (integrate with PagerDuty, Slack, etc.)
    curl -X POST "https://hooks.slack.com/services/YOUR/WEBHOOK/URL" \
        -H "Content-Type: application/json" \
        -d "{\"text\":\"🚨 Auto-rollback executed: $ACTIVE_VERSION → $PREVIOUS_VERSION (Errors: $ERROR_COUNT)\"}"

    exit 1
}

# Main monitoring loop
monitor() {
    local elapsed=0

    while [[ "$elapsed" -lt "$MONITOR_DURATION" ]]; do
        check_pod_crashes
        check_error_logs
        check_response_times

        if [[ "$ERROR_COUNT" -ge "$ERROR_THRESHOLD" ]]; then
            execute_rollback
        fi

        sleep "$CHECK_INTERVAL"
        ((elapsed += CHECK_INTERVAL))

        log_info "Monitoring progress: ${elapsed}s / ${MONITOR_DURATION}s (Errors: $ERROR_COUNT)"
    done

    log_info "Monitoring complete. $ACTIVE_VERSION is stable (Total errors: $ERROR_COUNT)"
}

# Entry point
main() {
    get_active_version
    monitor
}

main "$@"

Run this script immediately after switching to Green. It continuously monitors for errors and automatically rolls back if thresholds are exceeded. For production incident management, see our ChatGPT App Incident Response Guide.

Manual Rollback Procedure

If you need to roll back manually (e.g., during off-hours), it's a single command:

# Rollback from green to blue
kubectl patch service chatgpt-app-service -n chatgpt-production \
  -p '{"spec":{"selector":{"version":"blue"}}}'

# Verify switch
kubectl get service chatgpt-app-service -n chatgpt-production -o yaml | grep version

Rollback takes effect within seconds—users on existing connections finish their requests, new connections go to Blue immediately.

Database Migrations with Blue-Green

Database migrations are the Achilles' heel of blue-green deployment. You can't simply deploy breaking schema changes because Blue and Green share the same database.

Database Migration Manager (TypeScript)

// migration-manager.ts - Backward-compatible database migrations
import { Pool } from 'pg';

interface Migration {
  id: string;
  name: string;
  up: (db: Pool) => Promise<void>;
  down: (db: Pool) => Promise<void>;
}

class MigrationManager {
  private db: Pool;

  constructor(dbUrl: string) {
    this.db = new Pool({ connectionString: dbUrl });
  }

  // Run migration in backward-compatible way
  async runMigration(migration: Migration): Promise<void> {
    console.log(`Running migration: ${migration.name}`);

    try {
      await this.db.query('BEGIN');

      // Execute migration
      await migration.up(this.db);

      // Record migration
      await this.db.query(
        `INSERT INTO schema_migrations (id, name, applied_at) VALUES ($1, $2, NOW())`,
        [migration.id, migration.name]
      );

      await this.db.query('COMMIT');
      console.log(`Migration ${migration.name} completed successfully`);
    } catch (error) {
      await this.db.query('ROLLBACK');
      console.error(`Migration ${migration.name} failed:`, error);
      throw error;
    }
  }

  // Example: Add column with default value (backward compatible)
  async addColumnWithDefault(): Promise<void> {
    const migration: Migration = {
      id: '2026-12-25-001',
      name: 'add_user_tier_column',
      up: async (db) => {
        // Step 1: Add column with default value (safe for Blue version)
        await db.query(`
          ALTER TABLE users
          ADD COLUMN IF NOT EXISTS tier VARCHAR(50) DEFAULT 'free'
        `);

        // Step 2: Backfill existing rows (optional, depends on data volume)
        await db.query(`
          UPDATE users
          SET tier = 'free'
          WHERE tier IS NULL
        `);

        // Step 3: Add index (note: CREATE INDEX CONCURRENTLY cannot run inside
        // a transaction block, and runMigration wraps each migration in one;
        // use a plain CREATE INDEX here, or build the index concurrently in a
        // separate, non-transactional migration for large tables)
        await db.query(`
          CREATE INDEX IF NOT EXISTS idx_users_tier ON users(tier)
        `);
      },
      down: async (db) => {
        await db.query(`DROP INDEX IF EXISTS idx_users_tier`);
        await db.query(`ALTER TABLE users DROP COLUMN IF EXISTS tier`);
      },
    };

    await this.runMigration(migration);
  }

  // Example: Rename column (requires expand-contract pattern)
  async renameColumn(): Promise<void> {
    // PHASE 1: Add new column (run BEFORE deploying Green)
    const phase1: Migration = {
      id: '2026-12-25-002a',
      name: 'add_email_address_column',
      up: async (db) => {
        // Add new column
        await db.query(`
          ALTER TABLE users
          ADD COLUMN IF NOT EXISTS email_address VARCHAR(255)
        `);

        // Copy data from old column to new column
        await db.query(`
          UPDATE users
          SET email_address = email
          WHERE email_address IS NULL
        `);
      },
      down: async (db) => {
        await db.query(`ALTER TABLE users DROP COLUMN IF EXISTS email_address`);
      },
    };

    await this.runMigration(phase1);

    // PHASE 2: Update application code to use the new column (Green deployment)
    // (Green reads email_address but writes BOTH columns, so Blue's reads of
    // email stay correct while it remains on standby)

    // PHASE 3: Remove old column (run AFTER Blue is updated to match Green)
    // DO NOT RUN IMMEDIATELY - wait 24-48 hours after traffic switch
    const phase3: Migration = {
      id: '2026-12-25-002c',
      name: 'remove_email_column',
      up: async (db) => {
        await db.query(`ALTER TABLE users DROP COLUMN IF EXISTS email`);
      },
      down: async (db) => {
        // Irreversible - old column is gone
        throw new Error('Cannot rollback: old email column has been dropped');
      },
    };

    // Uncomment after 24-48 hours and Blue is updated
    // await this.runMigration(phase3);
  }

  async close(): Promise<void> {
    await this.db.end();
  }
}

// Usage example
async function main() {
  const migrator = new MigrationManager(process.env.DATABASE_URL!);

  try {
    // Run backward-compatible migrations BEFORE deploying Green
    await migrator.addColumnWithDefault();

    console.log('Migrations completed successfully');
  } catch (error) {
    console.error('Migration failed:', error);
    process.exit(1);
  } finally {
    await migrator.close();
  }
}

if (require.main === module) {
  main();
}

Three-Phase Migration Pattern

For breaking changes (like renaming columns), use the expand-contract pattern:

  1. Expand (before Green deployment): Add new column, backfill data
  2. Migrate (Green deployment): Application uses new column, writes to both old and new
  3. Contract (after Blue is updated): Remove old column

This ensures both Blue and Green work with the database at all times. For advanced database strategies, see our ChatGPT App Multi-Tenancy Guide.
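The dangerous step is Contract: dropping the old column is only safe once neither environment reads it. A small guard (purely illustrative, using integer release numbers as a stand-in for your own versioning) captures the invariant:

```shell
# The contract migration may run only when BOTH colors are on a release
# that no longer touches the old column (min_release is the first release
# that stopped reading it).
can_contract() {
  local blue_release="$1" green_release="$2" min_release="$3"
  [ "$blue_release" -ge "$min_release" ] && [ "$green_release" -ge "$min_release" ]
}

if can_contract 3 3 3; then echo "safe to drop old column"; fi
if ! can_contract 2 3 3; then echo "blue still reads old column"; fi
```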

Additional Production Tools

Traffic Controller (TypeScript)

For gradual traffic migration (hybrid blue-green + canary), you can implement weighted routing:

// traffic-controller.ts - Gradual traffic shifting
import * as k8s from '@kubernetes/client-node';

interface TrafficWeight {
  blue: number;
  green: number;
}

class TrafficController {
  private k8sApi: k8s.NetworkingV1Api;
  private namespace: string;

  constructor(namespace: string = 'chatgpt-production') {
    const kc = new k8s.KubeConfig();
    kc.loadFromDefault();
    // Ingress resources live in the networking.k8s.io API group
    this.k8sApi = kc.makeApiClient(k8s.NetworkingV1Api);
    this.namespace = namespace;
  }

  // Gradually shift traffic from blue to green
  async gradualShift(durationMinutes: number = 30): Promise<void> {
    const steps = 10;
    const intervalMs = (durationMinutes * 60 * 1000) / steps;

    for (let i = 0; i <= steps; i++) {
      const greenPercentage = (i / steps) * 100;
      const bluePercentage = 100 - greenPercentage;

      console.log(`Shifting traffic: Blue ${bluePercentage}% → Green ${greenPercentage}%`);

      await this.setTrafficWeights({
        blue: bluePercentage,
        green: greenPercentage,
      });

      // Monitor for errors between shifts
      await this.sleep(intervalMs);

      const healthCheck = await this.checkHealth('green');
      if (!healthCheck.healthy) {
        console.error('Green environment unhealthy. Aborting shift.');
        await this.setTrafficWeights({ blue: 100, green: 0 });
        throw new Error('Traffic shift aborted due to health check failure');
      }
    }

    console.log('Gradual traffic shift complete. Green: 100%');
  }

  private async setTrafficWeights(weights: TrafficWeight): Promise<void> {
    // NGINX Ingress canary routing requires a SECOND ingress resource that
    // points at the green service and carries the canary annotations; the
    // canary-weight annotation controls its share of traffic

    const canaryIngressName = 'chatgpt-app-ingress-canary';

    try {
      const { body: ingress } = await this.k8sApi.readNamespacedIngress(
        canaryIngressName,
        this.namespace
      );

      // Update annotations for canary routing
      ingress.metadata!.annotations = {
        ...ingress.metadata!.annotations,
        'nginx.ingress.kubernetes.io/canary': 'true',
        'nginx.ingress.kubernetes.io/canary-weight': weights.green.toString(),
      };

      await this.k8sApi.replaceNamespacedIngress(
        canaryIngressName,
        this.namespace,
        ingress
      );

      console.log(`Traffic weights updated: Blue ${weights.blue}%, Green ${weights.green}%`);
    } catch (error) {
      console.error('Failed to update traffic weights:', error);
      throw error;
    }
  }

  private async checkHealth(version: string): Promise<{ healthy: boolean }> {
    // Placeholder: in production, query the target Service's /health
    // endpoint (e.g. via port-forward or an in-cluster probe)
    return { healthy: true };
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Usage
async function main() {
  const controller = new TrafficController();

  try {
    await controller.gradualShift(30);  // Shift over 30 minutes
  } catch (error) {
    console.error('Traffic shift failed:', error);
    process.exit(1);
  }
}

if (require.main === module) {
  main();
}

Smoke Test Suite (TypeScript)

// smoke-tests.ts - Essential post-deployment tests
import { expect } from 'chai';
import fetch from 'node-fetch';

const BASE_URL = process.env.BASE_URL || 'http://localhost:8080';

describe('ChatGPT App Smoke Tests', () => {
  it('should return healthy status', async () => {
    const response = await fetch(`${BASE_URL}/health`);
    const data = await response.json();

    expect(response.status).to.equal(200);
    expect(data.status).to.equal('healthy');
    expect(data.checks.database).to.be.true;
    expect(data.checks.redis).to.be.true;
  });

  it('should serve MCP metadata', async () => {
    const response = await fetch(`${BASE_URL}/mcp`);
    const data = await response.json();

    expect(response.status).to.equal(200);
    expect(data.name).to.exist;
    expect(data.version).to.exist;
    expect(data.tools).to.be.an('array');
  });

  it('should execute MCP tool successfully', async () => {
    const response = await fetch(`${BASE_URL}/mcp`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        jsonrpc: '2.0',
        method: 'tools/call',
        params: {
          name: 'get_user_info',
          arguments: { userId: 'test-user-123' },
        },
        id: 1,
      }),
    });

    const data = await response.json();

    expect(response.status).to.equal(200);
    expect(data.result).to.exist;
  });

  it('should render widget template', async () => {
    const response = await fetch(`${BASE_URL}/widget`);
    const html = await response.text();

    expect(response.status).to.equal(200);
    expect(html).to.include('window.openai');
    expect(html).to.include('text/html+skybridge');
  });

  it('should handle database queries', async () => {
    const response = await fetch(`${BASE_URL}/api/users/test-user-123`);
    const data = await response.json();

    expect(response.status).to.be.oneOf([200, 404]);  // User may not exist in staging
    if (response.status === 200) {
      expect(data.id).to.equal('test-user-123');
    }
  });

  it('should have acceptable response times', async () => {
    const start = Date.now();
    await fetch(`${BASE_URL}/health`);
    const duration = Date.now() - start;

    expect(duration).to.be.lessThan(1000);  // < 1 second
  });
});

Run smoke tests against the Green environment before switching traffic. If any test fails, abort the deployment. For complete testing strategies, see our ChatGPT App E2E Testing Guide.

CI/CD Pipeline (GitHub Actions YAML)

# .github/workflows/blue-green-deploy.yml
name: Blue-Green Deployment

on:
  push:
    branches:
      - main

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  KUBE_NAMESPACE: chatgpt-production

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # A single tag rule keeps `steps.meta.outputs.tags` to one image
          # reference, which is later passed to `kubectl set image`
          tags: type=sha,prefix={{branch}}-

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy-green:
    needs: build
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Determine active version
        id: detect
        run: |
          ACTIVE=$(kubectl get service chatgpt-app-service -n ${{ env.KUBE_NAMESPACE }} -o jsonpath='{.spec.selector.version}')
          if [ "$ACTIVE" = "blue" ]; then
            echo "target=green" >> $GITHUB_OUTPUT
          else
            echo "target=blue" >> $GITHUB_OUTPUT
          fi

      - name: Update deployment image
        run: |
          kubectl set image deployment/chatgpt-app-${{ steps.detect.outputs.target }} \
            mcp-server=${{ needs.build.outputs.image-tag }} \
            -n ${{ env.KUBE_NAMESPACE }}

      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/chatgpt-app-${{ steps.detect.outputs.target }} \
            -n ${{ env.KUBE_NAMESPACE }} \
            --timeout=5m

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install test dependencies
        run: npm ci

      - name: Run smoke tests
        run: |
          kubectl port-forward -n ${{ env.KUBE_NAMESPACE }} \
            service/chatgpt-app-${{ steps.detect.outputs.target }} 8080:80 &
          sleep 5
          BASE_URL=http://localhost:8080 npm run test:smoke

  switch-traffic:
    needs: deploy-green
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://app.yourdomain.com

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Switch traffic
        run: |
          chmod +x ./scripts/blue-green-switch.sh
          ./scripts/blue-green-switch.sh

  monitor:
    needs: switch-traffic
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Monitor deployment
        run: |
          chmod +x ./scripts/auto-rollback.sh
          ./scripts/auto-rollback.sh

This GitHub Actions workflow automates the entire blue-green deployment: build image → deploy to Green → run smoke tests → switch traffic → monitor for errors. For more CI/CD patterns, see our ChatGPT App DevOps Guide.
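The monitor job's abort decision ultimately reduces to a threshold check on the error rate. A minimal sketch of that logic (the 5% threshold is an assumption, and the counts would come from your metrics backend, e.g. Prometheus):

```typescript
// Decide whether to roll back based on the observed error rate.
// errorCount and requestCount are plain numbers here; in production
// they would be queried from your metrics backend.
function shouldRollback(
  errorCount: number,
  requestCount: number,
  maxErrorRate: number = 0.05  // assumed 5% threshold
): boolean {
  if (requestCount === 0) return false;  // no traffic yet, nothing to judge
  return errorCount / requestCount > maxErrorRate;
}

console.log(shouldRollback(3, 1000));   // 0.3% error rate → false
console.log(shouldRollback(80, 1000));  // 8% error rate → true
```

Guarding the zero-traffic case matters: immediately after a switch, the first metrics scrape may report no requests at all, and dividing by zero would otherwise force a spurious rollback.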

Conclusion: Production-Ready Blue-Green Deployment

Blue-green deployment is the industry standard for zero-downtime releases. By maintaining two identical production environments and switching traffic instantly, you eliminate deployment risk and give yourself an instant rollback option.

The strategies in this guide—Kubernetes blue-green services, automated health checks, smoke testing, backward-compatible database migrations, and automated monitoring—are battle-tested patterns used by teams deploying ChatGPT apps at scale.

Key Takeaways:

  • Deploy to Green environment while Blue serves traffic (zero user impact)
  • Run comprehensive smoke tests before switching traffic
  • Use label-based Kubernetes services for instant traffic switching
  • Implement backward-compatible database migrations (expand-contract pattern)
  • Monitor Green environment continuously; rollback if errors exceed threshold
  • Keep Blue as standby for 24-48 hours after successful switch
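The label-based switching takeaway boils down to patching a single field on the Service, which is why the cutover is instant and involves no pod restarts. A sketch of the patch body (the service and label names are illustrative, matching the `version: blue|green` convention used above):

```typescript
// A label-based switch changes only the Service's pod selector;
// the ClusterIP, DNS name, and running pods are all untouched.
interface SelectorPatch {
  spec: { selector: { app: string; version: string } };
}

// Build the strategic-merge patch that repoints the Service at `target`
function buildSwitchPatch(target: 'blue' | 'green'): SelectorPatch {
  return { spec: { selector: { app: 'chatgpt-app', version: target } } };
}

// With @kubernetes/client-node, this body would be sent via
// CoreV1Api.patchNamespacedService against 'chatgpt-app-service'
console.log(JSON.stringify(buildSwitchPatch('green')));
```

Rollback is the same call with `target` set back to 'blue', which is what makes the switch symmetric and instantly reversible.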

If managing this infrastructure feels overwhelming, MakeAIHQ handles blue-green deployment automatically—you focus on building ChatGPT apps, we handle the DevOps complexity. Build, test, and deploy ChatGPT apps with built-in zero-downtime deployment, health monitoring, and instant rollback. Start your free trial today.

For more deployment strategies, explore our guides on canary deployments, feature flags, and disaster recovery.