Blue-Green Deployment for ChatGPT Apps: Zero-Downtime Strategy
Deploying ChatGPT apps to production requires a strategy that minimizes downtime and risk. Blue-green deployment is the gold standard for achieving zero-downtime releases, allowing you to switch between two identical production environments instantly. This comprehensive guide shows you how to implement blue-green deployment for ChatGPT apps using Kubernetes, with complete code examples for traffic switching, health checks, automated rollback, and database migrations.
Whether you're building a ChatGPT app for 1,000 users or 1 million, blue-green deployment ensures your updates are seamless, reversible, and production-safe. You'll learn the exact infrastructure configuration, deployment scripts, and monitoring strategies used by enterprise teams to ship ChatGPT apps with confidence.
If you're looking for a platform that handles deployment complexity automatically, MakeAIHQ's no-code ChatGPT builder includes built-in blue-green deployment, health checks, and rollback—no DevOps expertise required. But if you're building custom infrastructure, this guide gives you everything you need.
Blue-Green Architecture Fundamentals
Blue-green deployment maintains two identical production environments: Blue (currently serving traffic) and Green (staging the next release). The strategy is simple but powerful:
- Blue environment serves 100% of production traffic
- Deploy new version to Green environment (zero user impact)
- Run smoke tests and health checks on Green
- Switch load balancer from Blue to Green (instant cutover)
- Monitor Green; if issues arise, switch back to Blue (instant rollback)
- Keep Blue as standby for 24-48 hours, then update it with the new version
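The alternation in the steps above is mechanical enough to script: the cutover in step 4 reduces to rewriting one label selector. A minimal sketch of that logic (the function names are illustrative, not part of any standard tooling):

```shell
#!/bin/bash
# Given the currently active color, return the color to deploy next.
next_color() {
  case "$1" in
    blue)  echo "green" ;;
    green) echo "blue" ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Build the JSON patch that repoints the Service selector (step 4).
selector_patch() {
  printf '{"spec":{"selector":{"version":"%s"}}}' "$1"
}

current="blue"
target="$(next_color "$current")"
# prints the kubectl command for the green cutover
echo "Would run: kubectl patch service chatgpt-app-service -p '$(selector_patch "$target")'"
```

Because the patch is a pure function of the target color, the same helper serves both cutover and rollback.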
Dual Environment Setup
For ChatGPT apps running on Kubernetes, you maintain two separate deployments sharing the same infrastructure:
- Namespace isolation: chatgpt-blue and chatgpt-green namespaces
- Shared resources: Database, Redis cache, storage buckets (read-only during switch)
- Separate compute: Independent pods, resource quotas, horizontal autoscaling
- Single ingress: Load balancer switches between blue/green services via labels
The key advantage: your new version is fully deployed and tested before any user sees it. Compare this to rolling updates (gradual pod replacement) or canary deployments (partial traffic shifting)—blue-green gives you binary control with instant rollback capability.
Load Balancer Configuration
Your load balancer is the traffic controller. It must support:
- Service discovery: Dynamically route to blue or green backend based on labels
- Health checks: Remove unhealthy pods from rotation automatically
- Session affinity: Maintain user sessions during switches (if stateful)
- Connection draining: Gracefully close existing connections before switching
For ChatGPT apps, this typically means:
- Kubernetes Service with label selectors (version: blue or version: green)
- Ingress controller (NGINX, Traefik, or cloud provider) with backend switching
- External load balancer (AWS ALB, GCP Load Balancer) if multi-cluster
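The session-affinity requirement above can be met with cookie-based affinity on the NGINX ingress controller, which pins each user to one backend while connections drain. A sketch of the annotations (cookie name and lifetime are example values, tune them to your session length):

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "chatgpt-app-route"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"  # seconds
```

Note that affinity pins users to pods behind the current selector; after the blue-green switch, new requests route to the new color regardless of the cookie.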
Database Strategy Considerations
Database handling is the trickiest part of blue-green deployment. You have three options:
Option 1: Shared Database (Most Common)
- Both blue and green use the same database
- Requires backward-compatible schema migrations
- New code must work with old schema (before switch) and new schema (after switch)
- Example: Adding a column with a default value, then using it in Green
Option 2: Database Per Environment
- Blue and Green each have separate databases
- Sync data from Blue to Green before switch
- Complete isolation, but complex data synchronization
- Best for read-heavy apps with eventual consistency tolerance
Option 3: Database Blue-Green
- Use database replication (e.g., PostgreSQL streaming replication)
- Blue writes to primary; Green reads from replica
- Promote replica to primary during switch
- High complexity, best for database-critical apps
For most ChatGPT apps, Option 1 (shared database with backward-compatible migrations) is the right choice. We'll show you how to implement it safely below.
Kubernetes Implementation
Here's a production-ready Kubernetes configuration for blue-green ChatGPT app deployment. This example assumes your app runs as a Node.js MCP server with a React widget frontend.
Kubernetes Blue-Green Services (YAML)
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatgpt-app-blue
  namespace: chatgpt-production
  labels:
    app: chatgpt-app
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: chatgpt-app
      version: blue
  template:
    metadata:
      labels:
        app: chatgpt-app
        version: blue
    spec:
      containers:
        - name: mcp-server
          image: your-registry/chatgpt-app:v1.2.3
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: ENVIRONMENT
              value: "blue"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-credentials
                  key: api-key
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatgpt-app-green
  namespace: chatgpt-production
  labels:
    app: chatgpt-app
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: chatgpt-app
      version: green
  template:
    metadata:
      labels:
        app: chatgpt-app
        version: green
    spec:
      containers:
        - name: mcp-server
          image: your-registry/chatgpt-app:v1.3.0 # New version
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: ENVIRONMENT
              value: "green"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-credentials
                  key: api-key
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2
---
# service.yaml (Active Service - switches between blue/green)
apiVersion: v1
kind: Service
metadata:
  name: chatgpt-app-service
  namespace: chatgpt-production
  labels:
    app: chatgpt-app
spec:
  selector:
    app: chatgpt-app
    version: blue # CHANGE THIS TO 'green' TO SWITCH TRAFFIC
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP
---
# blue-service.yaml (Direct access for testing)
apiVersion: v1
kind: Service
metadata:
  name: chatgpt-app-blue
  namespace: chatgpt-production
spec:
  selector:
    app: chatgpt-app
    version: blue
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP
---
# green-service.yaml (Direct access for testing)
apiVersion: v1
kind: Service
metadata:
  name: chatgpt-app-green
  namespace: chatgpt-production
spec:
  selector:
    app: chatgpt-app
    version: green
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP
---
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: chatgpt-app-ingress
  namespace: chatgpt-production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.yourdomain.com
      secretName: chatgpt-app-tls
  rules:
    - host: app.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: chatgpt-app-service # Main service (blue or green)
                port:
                  number: 80
          - path: /blue # Direct blue access for smoke tests
            pathType: Prefix
            backend:
              service:
                name: chatgpt-app-blue
                port:
                  number: 80
          - path: /green # Direct green access for smoke tests
            pathType: Prefix
            backend:
              service:
                name: chatgpt-app-green
                port:
                  number: 80
Key Features:
- Label-based switching: Change version: blue to version: green in the main service
- Separate blue/green services: Test each environment independently via /blue and /green paths
- Health checks: Liveness (restart unhealthy pods) and readiness (remove from load balancer)
- Resource limits: Prevent resource starvation during deployments
To deploy this configuration, you'll need kubectl access and the image already pushed to your container registry. Learn how to build ChatGPT app containers in our ChatGPT App Docker Guide.
Traffic Switching Strategy
Switching traffic from blue to green is a multi-stage process that prioritizes safety over speed. Here's the production workflow:
Deployment Switcher Script (Bash)
#!/bin/bash
# blue-green-switch.sh - Safe traffic switching for ChatGPT apps
set -euo pipefail
NAMESPACE="chatgpt-production"
SERVICE_NAME="chatgpt-app-service"
CURRENT_VERSION=""
TARGET_VERSION=""
HEALTH_CHECK_URL=""
SMOKE_TEST_TIMEOUT=300 # 5 minutes
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
log_info() {
echo -e "${GREEN}[INFO]${NC} $1"
}
log_warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# Detect current active version
detect_current_version() {
log_info "Detecting current active version..."
CURRENT_VERSION=$(kubectl get service "$SERVICE_NAME" -n "$NAMESPACE" -o jsonpath='{.spec.selector.version}')
if [[ "$CURRENT_VERSION" == "blue" ]]; then
TARGET_VERSION="green"
elif [[ "$CURRENT_VERSION" == "green" ]]; then
TARGET_VERSION="blue"
else
log_error "Unknown current version: $CURRENT_VERSION"
exit 1
fi
log_info "Current: $CURRENT_VERSION | Target: $TARGET_VERSION"
}
# Verify target deployment is ready
verify_target_ready() {
log_info "Verifying $TARGET_VERSION deployment is ready..."
local ready_replicas=$(kubectl get deployment "chatgpt-app-$TARGET_VERSION" -n "$NAMESPACE" -o jsonpath='{.status.readyReplicas}')
local desired_replicas=$(kubectl get deployment "chatgpt-app-$TARGET_VERSION" -n "$NAMESPACE" -o jsonpath='{.spec.replicas}')
if [[ "$ready_replicas" != "$desired_replicas" ]]; then
log_error "$TARGET_VERSION deployment not ready: $ready_replicas/$desired_replicas pods ready"
exit 1
fi
log_info "$TARGET_VERSION deployment ready: $ready_replicas/$desired_replicas pods"
}
# Run health checks on target environment
run_health_checks() {
log_info "Running health checks on $TARGET_VERSION..."
# Get a pod from target deployment
local pod=$(kubectl get pods -n "$NAMESPACE" -l "app=chatgpt-app,version=$TARGET_VERSION" -o jsonpath='{.items[0].metadata.name}')
if [[ -z "$pod" ]]; then
log_error "No pods found for $TARGET_VERSION deployment"
exit 1
fi
# Health check
if ! kubectl exec "$pod" -n "$NAMESPACE" -- curl -sf http://localhost:3000/health > /dev/null; then
log_error "Health check failed for $TARGET_VERSION"
exit 1
fi
# Readiness check
if ! kubectl exec "$pod" -n "$NAMESPACE" -- curl -sf http://localhost:3000/ready > /dev/null; then
log_error "Readiness check failed for $TARGET_VERSION"
exit 1
fi
log_info "Health checks passed for $TARGET_VERSION"
}
# Run smoke tests on target environment
run_smoke_tests() {
log_info "Running smoke tests on $TARGET_VERSION (timeout: ${SMOKE_TEST_TIMEOUT}s)..."
# Port-forward to green service for isolated testing
kubectl port-forward -n "$NAMESPACE" "service/chatgpt-app-$TARGET_VERSION" 8080:80 &
local port_forward_pid=$!
sleep 5 # Wait for port-forward to establish
# Run smoke test suite
if timeout "$SMOKE_TEST_TIMEOUT" npm run test:smoke -- --base-url http://localhost:8080; then
log_info "Smoke tests passed for $TARGET_VERSION"
else
log_error "Smoke tests failed for $TARGET_VERSION"
kill "$port_forward_pid" 2>/dev/null || true
exit 1
fi
# Clean up port-forward
kill "$port_forward_pid" 2>/dev/null || true
}
# Switch traffic to target version
switch_traffic() {
log_warn "Switching traffic from $CURRENT_VERSION to $TARGET_VERSION..."
# Patch service selector
kubectl patch service "$SERVICE_NAME" -n "$NAMESPACE" -p "{\"spec\":{\"selector\":{\"version\":\"$TARGET_VERSION\"}}}"
log_info "Traffic switched to $TARGET_VERSION"
}
# Monitor target version for errors
monitor_target() {
log_info "Monitoring $TARGET_VERSION for 60 seconds..."
local start_time=$(date +%s)
local error_count=0
while [[ $(($(date +%s) - start_time)) -lt 60 ]]; do
# Check pod restart count
local restarts=$(kubectl get pods -n "$NAMESPACE" -l "app=chatgpt-app,version=$TARGET_VERSION" -o jsonpath='{.items[*].status.containerStatuses[0].restartCount}' | tr ' ' '\n' | awk '{s+=$1} END {print s}')
if [[ "$restarts" -gt 0 ]]; then
error_count=$((error_count + 1)) # note: ((error_count++)) would trip set -e when the count is 0
log_warn "Detected $restarts pod restarts in $TARGET_VERSION"
fi
# Check for error logs
if kubectl logs -n "$NAMESPACE" -l "app=chatgpt-app,version=$TARGET_VERSION" --tail=50 --since=10s | grep -i "error" > /dev/null; then
error_count=$((error_count + 1)) # note: ((error_count++)) would trip set -e when the count is 0
log_warn "Detected errors in $TARGET_VERSION logs"
fi
if [[ "$error_count" -gt 5 ]]; then
log_error "Too many errors detected in $TARGET_VERSION. Initiating rollback..."
rollback_traffic
exit 1
fi
sleep 10
done
log_info "Monitoring complete. $TARGET_VERSION is stable."
}
# Rollback traffic to previous version
rollback_traffic() {
log_error "Rolling back traffic to $CURRENT_VERSION..."
kubectl patch service "$SERVICE_NAME" -n "$NAMESPACE" -p "{\"spec\":{\"selector\":{\"version\":\"$CURRENT_VERSION\"}}}"
log_info "Traffic rolled back to $CURRENT_VERSION"
}
# Main execution
main() {
log_info "Starting blue-green deployment switch..."
detect_current_version
verify_target_ready
run_health_checks
run_smoke_tests
switch_traffic
monitor_target
log_info "Blue-green deployment switch complete!"
log_info "Active version: $TARGET_VERSION"
log_info "Standby version: $CURRENT_VERSION (keep for 24-48h before updating)"
}
main "$@"
This script automates the entire switching process with safety checks at every stage. If any check fails, deployment stops before traffic is affected. For a manual approach to deployment validation, see our ChatGPT App Testing Guide.
Health Check Prober (TypeScript)
Your ChatGPT app must expose health and readiness endpoints. Here's a production implementation:
// health-check.ts - Health and readiness endpoints for MCP server
import express, { Request, Response } from 'express';
import { Client } from 'pg'; // PostgreSQL client
import Redis from 'ioredis';
interface HealthStatus {
status: 'healthy' | 'unhealthy';
timestamp: string;
version: string;
environment: string;
checks: {
database: boolean;
redis: boolean;
openai: boolean;
};
uptime: number;
}
const router = express.Router();
const startTime = Date.now();
// Database client (pg's Client must connect once before queries; if the
// connection later drops, queries fail and the probes report it)
const dbClient = new Client({
connectionString: process.env.DATABASE_URL,
connectionTimeoutMillis: 5000,
});
dbClient.connect().catch((err) => console.error('Initial database connect failed:', err));
// Redis client
const redisClient = new Redis(process.env.REDIS_URL || 'redis://localhost:6379', {
connectTimeout: 5000,
maxRetriesPerRequest: 2,
});
// Health check (liveness probe) - Am I alive?
router.get('/health', async (req: Request, res: Response) => {
const health: HealthStatus = {
status: 'healthy',
timestamp: new Date().toISOString(),
version: process.env.APP_VERSION || 'unknown',
environment: process.env.ENVIRONMENT || 'unknown',
checks: {
database: false,
redis: false,
openai: false,
},
uptime: Date.now() - startTime,
};
try {
// Check database connectivity
await dbClient.query('SELECT 1');
health.checks.database = true;
} catch (error) {
console.error('Database health check failed:', error);
health.status = 'unhealthy';
}
try {
// Check Redis connectivity
await redisClient.ping();
health.checks.redis = true;
} catch (error) {
console.error('Redis health check failed:', error);
health.status = 'unhealthy';
}
try {
// Check OpenAI API availability (lightweight check)
const response = await fetch('https://api.openai.com/v1/models', {
method: 'GET',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
},
signal: AbortSignal.timeout(3000),
});
health.checks.openai = response.ok;
} catch (error) {
console.error('OpenAI health check failed:', error);
// Don't mark unhealthy for OpenAI failures (might be temporary)
}
const statusCode = health.status === 'healthy' ? 200 : 503;
res.status(statusCode).json(health);
});
// Readiness check (readiness probe) - Can I handle traffic?
router.get('/ready', async (req: Request, res: Response) => {
// Stricter checks for readiness
const checks = {
database: false,
redis: false,
openai: false,
dbMigrations: false,
};
try {
// Database connectivity
await dbClient.query('SELECT 1');
checks.database = true;
// Check if migrations are up to date
const migrationCheck = await dbClient.query(
"SELECT EXISTS (SELECT 1 FROM information_schema.tables WHERE table_name = 'schema_migrations')"
);
checks.dbMigrations = migrationCheck.rows[0].exists;
} catch (error) {
console.error('Database readiness check failed:', error);
return res.status(503).json({ ready: false, reason: 'database_unavailable', checks });
}
try {
// Redis connectivity
await redisClient.ping();
checks.redis = true;
} catch (error) {
console.error('Redis readiness check failed:', error);
return res.status(503).json({ ready: false, reason: 'redis_unavailable', checks });
}
try {
// OpenAI API availability
const response = await fetch('https://api.openai.com/v1/models', {
method: 'GET',
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
},
signal: AbortSignal.timeout(3000),
});
checks.openai = response.ok;
} catch (error) {
console.error('OpenAI readiness check failed:', error);
return res.status(503).json({ ready: false, reason: 'openai_unavailable', checks });
}
// All checks passed
res.status(200).json({
ready: true,
timestamp: new Date().toISOString(),
version: process.env.APP_VERSION || 'unknown',
checks,
});
});
export default router;
Liveness vs Readiness:
- Liveness: "Is my app running?" (Kubernetes restarts pod if this fails)
- Readiness: "Can I handle traffic?" (Kubernetes removes from load balancer if this fails)
During blue-green switch, Kubernetes uses readiness checks to ensure Green pods are ready before switching traffic. For database migration patterns that work with blue-green deployments, see our ChatGPT App Database Migrations Guide.
Rollback Strategies
Even with thorough testing, production issues can emerge after traffic switches to Green. Your rollback strategy must be instant and reliable.
Automated Rollback (Bash)
#!/bin/bash
# auto-rollback.sh - Automated rollback based on error thresholds
set -euo pipefail
NAMESPACE="chatgpt-production"
SERVICE_NAME="chatgpt-app-service"
MONITOR_DURATION=600 # Monitor for 10 minutes after switch
ERROR_THRESHOLD=10 # Max acceptable errors before rollback
CHECK_INTERVAL=30 # Check every 30 seconds
ACTIVE_VERSION=""
PREVIOUS_VERSION=""
ERROR_COUNT=0
log_info() { echo -e "\033[0;32m[INFO]\033[0m $1"; }
log_error() { echo -e "\033[0;31m[ERROR]\033[0m $1"; }
# Get current active version
get_active_version() {
ACTIVE_VERSION=$(kubectl get service "$SERVICE_NAME" -n "$NAMESPACE" -o jsonpath='{.spec.selector.version}')
if [[ "$ACTIVE_VERSION" == "blue" ]]; then
PREVIOUS_VERSION="green"
else
PREVIOUS_VERSION="blue"
fi
log_info "Monitoring active version: $ACTIVE_VERSION (previous: $PREVIOUS_VERSION)"
}
# Check for pod crashes
check_pod_crashes() {
local crashes=$(kubectl get pods -n "$NAMESPACE" -l "app=chatgpt-app,version=$ACTIVE_VERSION" \
-o jsonpath='{.items[*].status.containerStatuses[0].restartCount}' | \
tr ' ' '\n' | awk '{s+=$1} END {print s}')
if [[ "$crashes" -gt 0 ]]; then
((ERROR_COUNT += crashes))
log_error "Detected $crashes pod crashes in $ACTIVE_VERSION (total errors: $ERROR_COUNT)"
fi
}
# Check error logs
check_error_logs() {
local errors=$(kubectl logs -n "$NAMESPACE" -l "app=chatgpt-app,version=$ACTIVE_VERSION" \
--tail=100 --since="${CHECK_INTERVAL}s" 2>/dev/null | \
grep -c "ERROR\|FATAL\|Exception" || true)
if [[ "$errors" -gt 5 ]]; then
((ERROR_COUNT += errors))
log_error "Detected $errors error log entries in $ACTIVE_VERSION (total errors: $ERROR_COUNT)"
fi
}
# Check response times (requires metrics-server)
check_response_times() {
# This requires Prometheus/metrics-server to be deployed
# Simplified example - adapt to your monitoring stack
local avg_response_time=$(kubectl exec -n "$NAMESPACE" \
$(kubectl get pods -n "$NAMESPACE" -l "app=chatgpt-app,version=$ACTIVE_VERSION" -o jsonpath='{.items[0].metadata.name}') -- \
curl -s http://localhost:3000/metrics | grep "http_request_duration_seconds" | awk '{print $2}' || echo "0")
# If average response time > 5 seconds, increment error count
if (( $(echo "$avg_response_time > 5" | bc -l) )); then
((ERROR_COUNT += 3))
log_error "High response time detected: ${avg_response_time}s (total errors: $ERROR_COUNT)"
fi
}
# Execute rollback
execute_rollback() {
log_error "ERROR THRESHOLD EXCEEDED ($ERROR_COUNT errors). Rolling back to $PREVIOUS_VERSION..."
kubectl patch service "$SERVICE_NAME" -n "$NAMESPACE" \
-p "{\"spec\":{\"selector\":{\"version\":\"$PREVIOUS_VERSION\"}}}"
log_info "Rollback complete. Active version: $PREVIOUS_VERSION"
# Send alert (integrate with PagerDuty, Slack, etc.)
curl -X POST "https://hooks.slack.com/services/YOUR/WEBHOOK/URL" \
-H "Content-Type: application/json" \
-d "{\"text\":\"🚨 Auto-rollback executed: $ACTIVE_VERSION → $PREVIOUS_VERSION (Errors: $ERROR_COUNT)\"}"
exit 1
}
# Main monitoring loop
monitor() {
local elapsed=0
while [[ "$elapsed" -lt "$MONITOR_DURATION" ]]; do
check_pod_crashes
check_error_logs
check_response_times
if [[ "$ERROR_COUNT" -ge "$ERROR_THRESHOLD" ]]; then
execute_rollback
fi
sleep "$CHECK_INTERVAL"
((elapsed += CHECK_INTERVAL))
log_info "Monitoring progress: ${elapsed}s / ${MONITOR_DURATION}s (Errors: $ERROR_COUNT)"
done
log_info "Monitoring complete. $ACTIVE_VERSION is stable (Total errors: $ERROR_COUNT)"
}
# Entry point
main() {
get_active_version
monitor
}
main "$@"
Run this script immediately after switching to Green. It continuously monitors for errors and automatically rolls back if thresholds are exceeded. For production incident management, see our ChatGPT App Incident Response Guide.
Manual Rollback Procedure
If you need to manually rollback (e.g., during off-hours), it's a single command:
# Rollback from green to blue
kubectl patch service chatgpt-app-service -n chatgpt-production \
-p '{"spec":{"selector":{"version":"blue"}}}'
# Verify switch
kubectl get service chatgpt-app-service -n chatgpt-production -o yaml | grep version
Rollback takes effect within seconds: in-flight requests on existing connections complete, while new connections go to Blue immediately.
Database Migrations with Blue-Green
Database migrations are the Achilles' heel of blue-green deployment. You can't simply deploy breaking schema changes because Blue and Green share the same database.
Database Migration Manager (TypeScript)
// migration-manager.ts - Backward-compatible database migrations
import { Pool } from 'pg';
interface Migration {
id: string;
name: string;
up: (db: Pool) => Promise<void>;
down: (db: Pool) => Promise<void>;
}
class MigrationManager {
private db: Pool;
constructor(dbUrl: string) {
this.db = new Pool({ connectionString: dbUrl });
}
// Run migration in backward-compatible way
async runMigration(migration: Migration): Promise<void> {
console.log(`Running migration: ${migration.name}`);
try {
await this.db.query('BEGIN');
// Execute migration
await migration.up(this.db);
// Record migration
await this.db.query(
`INSERT INTO schema_migrations (id, name, applied_at) VALUES ($1, $2, NOW())`,
[migration.id, migration.name]
);
await this.db.query('COMMIT');
console.log(`Migration ${migration.name} completed successfully`);
} catch (error) {
await this.db.query('ROLLBACK');
console.error(`Migration ${migration.name} failed:`, error);
throw error;
}
}
// Example: Add column with default value (backward compatible)
async addColumnWithDefault(): Promise<void> {
const migration: Migration = {
id: '2026-12-25-001',
name: 'add_user_tier_column',
up: async (db) => {
// Step 1: Add column with default value (safe for Blue version)
await db.query(`
ALTER TABLE users
ADD COLUMN IF NOT EXISTS tier VARCHAR(50) DEFAULT 'free'
`);
// Step 2: Backfill existing rows (optional, depends on data volume)
await db.query(`
UPDATE users
SET tier = 'free'
WHERE tier IS NULL
`);
// Step 3: Add index (safe to do before Green deployment).
// Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction
// block, and runMigration wraps every migration in BEGIN/COMMIT, so
// use a plain CREATE INDEX here (brief write lock) or build the
// index concurrently outside the migration transaction.
await db.query(`
CREATE INDEX IF NOT EXISTS idx_users_tier ON users(tier)
`);
},
down: async (db) => {
await db.query(`DROP INDEX IF EXISTS idx_users_tier`);
await db.query(`ALTER TABLE users DROP COLUMN IF EXISTS tier`);
},
};
await this.runMigration(migration);
}
// Example: Rename column (requires expand-contract pattern)
async renameColumn(): Promise<void> {
// PHASE 1: Add new column (run BEFORE deploying Green)
const phase1: Migration = {
id: '2026-12-25-002a',
name: 'add_email_address_column',
up: async (db) => {
// Add new column
await db.query(`
ALTER TABLE users
ADD COLUMN IF NOT EXISTS email_address VARCHAR(255)
`);
// Copy data from old column to new column
await db.query(`
UPDATE users
SET email_address = email
WHERE email_address IS NULL
`);
},
down: async (db) => {
await db.query(`ALTER TABLE users DROP COLUMN IF EXISTS email_address`);
},
};
await this.runMigration(phase1);
// PHASE 2: Update application code to use new column (Green deployment)
// (Green reads/writes email_address, Blue still uses email)
// PHASE 3: Remove old column (run AFTER Blue is updated to match Green)
// DO NOT RUN IMMEDIATELY - wait 24-48 hours after traffic switch
const phase3: Migration = {
id: '2026-12-25-002c',
name: 'remove_email_column',
up: async (db) => {
await db.query(`ALTER TABLE users DROP COLUMN IF EXISTS email`);
},
down: async (db) => {
// Irreversible - old column is gone
throw new Error('Cannot rollback: old email column has been dropped');
},
};
// Uncomment after 24-48 hours and Blue is updated
// await this.runMigration(phase3);
}
async close(): Promise<void> {
await this.db.end();
}
}
// Usage example
async function main() {
const migrator = new MigrationManager(process.env.DATABASE_URL!);
try {
// Run backward-compatible migrations BEFORE deploying Green
await migrator.addColumnWithDefault();
console.log('Migrations completed successfully');
} catch (error) {
console.error('Migration failed:', error);
process.exit(1);
} finally {
await migrator.close();
}
}
if (require.main === module) {
main();
}
Three-Phase Migration Pattern
For breaking changes (like renaming columns), use the expand-contract pattern:
- Expand (before Green deployment): Add new column, backfill data
- Migrate (Green deployment): Application uses new column, writes to both old and new
- Contract (after Blue is updated): Remove old column
This ensures both Blue and Green work with the database at all times. For advanced database strategies, see our ChatGPT App Multi-Tenancy Guide.
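The "wait 24-48 hours" rule for the contract phase is easy to enforce in a deployment script. A minimal guard, assuming you record the traffic-switch time as a Unix timestamp (the function name and default threshold are illustrative):

```shell
#!/bin/bash
# Only allow the destructive contract-phase migration once enough time
# has passed since the traffic switch for Blue to have been updated.
# Usage: contract_allowed <switch_epoch_seconds> <now_epoch_seconds> [min_hours]
contract_allowed() {
  local switched=$1 now=$2 min_hours=${3:-24}
  local elapsed_hours=$(( (now - switched) / 3600 ))
  [ "$elapsed_hours" -ge "$min_hours" ]
}

# Example: switched 30 hours ago -> contract phase is allowed
if contract_allowed 0 $((30 * 3600)); then
  echo "safe to run contract migration"
fi
```

Wiring this guard in front of the phase-3 migration turns the 24-48 hour convention into an enforced precondition rather than a comment.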
Additional Production Tools
Traffic Controller (TypeScript)
For gradual traffic migration (hybrid blue-green + canary), you can implement weighted routing:
// traffic-controller.ts - Gradual traffic shifting
import * as k8s from '@kubernetes/client-node'; // the client exposes named exports, not a default export
interface TrafficWeight {
blue: number;
green: number;
}
class TrafficController {
private k8sApi: k8s.CoreV1Api;
private namespace: string;
constructor(namespace: string = 'chatgpt-production') {
const kc = new k8s.KubeConfig();
kc.loadFromDefault();
this.k8sApi = kc.makeApiClient(k8s.CoreV1Api);
this.namespace = namespace;
}
// Gradually shift traffic from blue to green
async gradualShift(durationMinutes: number = 30): Promise<void> {
const steps = 10;
const intervalMs = (durationMinutes * 60 * 1000) / steps;
for (let i = 0; i <= steps; i++) {
const greenPercentage = (i / steps) * 100;
const bluePercentage = 100 - greenPercentage;
console.log(`Shifting traffic: Blue ${bluePercentage}% → Green ${greenPercentage}%`);
await this.setTrafficWeights({
blue: bluePercentage,
green: greenPercentage,
});
// Monitor for errors between shifts
await this.sleep(intervalMs);
const healthCheck = await this.checkHealth('green');
if (!healthCheck.healthy) {
console.error('Green environment unhealthy. Aborting shift.');
await this.setTrafficWeights({ blue: 100, green: 0 });
throw new Error('Traffic shift aborted due to health check failure');
}
}
console.log('Gradual traffic shift complete. Green: 100%');
}
private async setTrafficWeights(weights: TrafficWeight): Promise<void> {
// This requires an ingress controller that supports weighted routing.
// With the NGINX ingress controller, canary annotations must live on a
// second ingress that routes to the green service; the weight below is
// the share of traffic sent to green. The ingress name is illustrative.
const ingressName = 'chatgpt-app-ingress-canary';
try {
// Ingresses belong to the networking.k8s.io API group, not core/v1
const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const netApi = kc.makeApiClient(k8s.NetworkingV1Api);
const { body: ingress } = await netApi.readNamespacedIngress(
ingressName,
this.namespace
);
// Update annotations for canary routing
ingress.metadata!.annotations = {
...ingress.metadata!.annotations,
'nginx.ingress.kubernetes.io/canary': 'true',
'nginx.ingress.kubernetes.io/canary-weight': weights.green.toString(),
};
await netApi.replaceNamespacedIngress(
ingressName,
this.namespace,
ingress
);
console.log(`Traffic weights updated: Blue ${weights.blue}%, Green ${weights.green}%`);
} catch (error) {
console.error('Failed to update traffic weights:', error);
throw error;
}
}
private async checkHealth(version: string): Promise<{ healthy: boolean }> {
// Port-forward to service and check /health endpoint
// Simplified example
return { healthy: true };
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
// Usage
async function main() {
const controller = new TrafficController();
try {
await controller.gradualShift(30); // Shift over 30 minutes
} catch (error) {
console.error('Traffic shift failed:', error);
process.exit(1);
}
}
if (require.main === module) {
main();
}
Smoke Test Suite (TypeScript)
// smoke-tests.ts - Essential post-deployment tests
import { expect } from 'chai';
import fetch from 'node-fetch';
const BASE_URL = process.env.BASE_URL || 'http://localhost:8080';
describe('ChatGPT App Smoke Tests', () => {
it('should return healthy status', async () => {
const response = await fetch(`${BASE_URL}/health`);
const data = await response.json();
expect(response.status).to.equal(200);
expect(data.status).to.equal('healthy');
expect(data.checks.database).to.be.true;
expect(data.checks.redis).to.be.true;
});
it('should serve MCP metadata', async () => {
const response = await fetch(`${BASE_URL}/mcp`);
const data = await response.json();
expect(response.status).to.equal(200);
expect(data.name).to.exist;
expect(data.version).to.exist;
expect(data.tools).to.be.an('array');
});
it('should execute MCP tool successfully', async () => {
const response = await fetch(`${BASE_URL}/mcp`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
jsonrpc: '2.0',
method: 'tools/call',
params: {
name: 'get_user_info',
arguments: { userId: 'test-user-123' },
},
id: 1,
}),
});
const data = await response.json();
expect(response.status).to.equal(200);
expect(data.result).to.exist;
});
it('should render widget template', async () => {
const response = await fetch(`${BASE_URL}/widget`);
const html = await response.text();
expect(response.status).to.equal(200);
expect(html).to.include('window.openai');
expect(html).to.include('text/html+skybridge');
});
it('should handle database queries', async () => {
const response = await fetch(`${BASE_URL}/api/users/test-user-123`);
const data = await response.json();
expect(response.status).to.be.oneOf([200, 404]); // User may not exist in staging
if (response.status === 200) {
expect(data.id).to.equal('test-user-123');
}
});
it('should have acceptable response times', async () => {
const start = Date.now();
await fetch(`${BASE_URL}/health`);
const duration = Date.now() - start;
expect(duration).to.be.lessThan(1000); // < 1 second
});
});
Run smoke tests against the Green environment before switching traffic. If any test fails, abort the deployment. For complete testing strategies, see our ChatGPT App E2E Testing Guide.
CI/CD Pipeline (GitHub Actions YAML)
# .github/workflows/blue-green-deploy.yml
name: Blue-Green Deployment

on:
  push:
    branches:
      - main

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  KUBE_NAMESPACE: chatgpt-production

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix={{branch}}-
            type=ref,event=branch
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy-green:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG }}
      - name: Determine active version
        id: detect
        run: |
          ACTIVE=$(kubectl get service chatgpt-app-service -n ${{ env.KUBE_NAMESPACE }} -o jsonpath='{.spec.selector.version}')
          if [ "$ACTIVE" = "blue" ]; then
            echo "target=green" >> $GITHUB_OUTPUT
          else
            echo "target=blue" >> $GITHUB_OUTPUT
          fi
      - name: Update deployment image
        run: |
          kubectl set image deployment/chatgpt-app-${{ steps.detect.outputs.target }} \
            mcp-server=${{ needs.build.outputs.image-tag }} \
            -n ${{ env.KUBE_NAMESPACE }}
      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/chatgpt-app-${{ steps.detect.outputs.target }} \
            -n ${{ env.KUBE_NAMESPACE }} \
            --timeout=5m
      - name: Run smoke tests
        run: |
          kubectl port-forward -n ${{ env.KUBE_NAMESPACE }} \
            service/chatgpt-app-${{ steps.detect.outputs.target }} 8080:80 &
          sleep 5
          npm run test:smoke -- --base-url http://localhost:8080

  switch-traffic:
    needs: deploy-green
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://app.yourdomain.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG }}
      - name: Switch traffic
        run: |
          chmod +x ./scripts/blue-green-switch.sh
          ./scripts/blue-green-switch.sh

  monitor:
    needs: switch-traffic
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG }}
      - name: Monitor deployment
        run: |
          chmod +x ./scripts/auto-rollback.sh
          ./scripts/auto-rollback.sh
This GitHub Actions workflow automates the entire blue-green deployment: build image → deploy to Green → run smoke tests → switch traffic → monitor for errors. For more CI/CD patterns, see our ChatGPT App DevOps Guide.
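At the heart of the traffic switch is a single color flip: read the active color from the Service selector, compute the idle target, patch the selector. A minimal sketch of that logic (the `next_color` helper is illustrative; the commented `kubectl` lines assume the `chatgpt-app-service` name used in the workflow and require cluster access):

```shell
#!/usr/bin/env bash
# Sketch of the traffic-switch logic. next_color is pure bash so the
# color-flip decision can be tested without a cluster.
next_color() {
  if [ "$1" = "blue" ]; then echo "green"; else echo "blue"; fi
}

# In a real switch script (requires kubectl and cluster credentials):
#   ACTIVE=$(kubectl get service chatgpt-app-service -n "$KUBE_NAMESPACE" \
#     -o jsonpath='{.spec.selector.version}')
#   TARGET=$(next_color "$ACTIVE")
#   kubectl patch service chatgpt-app-service -n "$KUBE_NAMESPACE" \
#     -p "{\"spec\":{\"selector\":{\"version\":\"$TARGET\"}}}"
next_color blue
next_color green
```

Patching the selector is what makes the cutover instant: no pods restart, the Service simply starts routing to the other color's endpoints.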
Conclusion: Production-Ready Blue-Green Deployment
Blue-green deployment is the industry standard for zero-downtime releases. By maintaining two identical production environments and switching traffic instantly, you eliminate deployment risk and give yourself an instant rollback option.
The strategies in this guide—Kubernetes blue-green services, automated health checks, smoke testing, backward-compatible database migrations, and automated monitoring—are battle-tested patterns used by teams deploying ChatGPT apps at scale.
Key Takeaways:
- Deploy to Green environment while Blue serves traffic (zero user impact)
- Run comprehensive smoke tests before switching traffic
- Use label-based Kubernetes services for instant traffic switching
- Implement backward-compatible database migrations (expand-contract pattern)
- Monitor Green environment continuously; rollback if errors exceed threshold
- Keep Blue as standby for 24-48 hours after successful switch
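The monitoring takeaway above ultimately reduces to one comparison: is the observed error rate above the threshold? A minimal sketch, assuming a 5% default threshold and an error rate already pulled from your metrics backend (both are assumptions, not values from this guide):

```shell
# should_rollback: succeeds when the observed error rate (percent) exceeds
# the allowed threshold. awk handles the floating-point comparison that
# plain bash cannot.
should_rollback() {
  awk -v rate="$1" -v threshold="${2:-5}" 'BEGIN { exit !(rate > threshold) }'
}

# A rollback script would gate the switch back to Blue on this check, e.g.:
#   if should_rollback "$ERROR_RATE" 5; then ./scripts/blue-green-switch.sh; fi
if should_rollback 7.2 5; then echo "error rate 7.2% > 5%: rolling back to Blue"; fi
if ! should_rollback 0.4 5; then echo "error rate 0.4% <= 5%: Green stays live"; fi
```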
If managing this infrastructure feels overwhelming, MakeAIHQ handles blue-green deployment automatically—you focus on building ChatGPT apps, we handle the DevOps complexity. Build, test, and deploy ChatGPT apps with built-in zero-downtime deployment, health monitoring, and instant rollback. Start your free trial today.
For more deployment strategies, explore our guides on canary deployments, feature flags, and disaster recovery.