Zero-Downtime Deployments for ChatGPT Apps: Complete Production Guide
Deploying ChatGPT applications to production requires careful orchestration to keep service uninterrupted during updates. A single failed deployment can disconnect thousands of active conversations, violate SLAs, and damage user trust. This guide provides production-ready implementations of zero-downtime deployment strategies optimized specifically for ChatGPT apps.
Modern ChatGPT applications face unique deployment challenges: long-lived WebSocket connections for streaming responses, stateful conversation contexts, database schema migrations that must remain backward-compatible with old application versions, and strict latency requirements from OpenAI's runtime. Traditional "stop-deploy-start" approaches are unacceptable for production systems serving real-time AI conversations.
Zero-downtime deployments eliminate service interruptions through sophisticated traffic management, health verification, and progressive rollout strategies. This guide covers three primary deployment patterns—rolling updates, blue-green deployments, and canary releases—with complete code examples for Kubernetes orchestration, comprehensive health checks, graceful connection draining, database migration coordination, and deployment monitoring. Whether you're running a single MCP server or a distributed ChatGPT application platform, these patterns ensure seamless updates without disrupting active users.
By implementing proper readiness probes, liveness checks, and graceful shutdown handlers, your ChatGPT apps can update multiple times per day while maintaining five-nines availability (99.999% uptime). Learn how to coordinate database migrations with application deployments, implement feature flags for phased rollouts, and monitor deployment health in real-time using Prometheus metrics.
Understanding Deployment Strategies
Rolling Updates: Progressive Pod Replacement
Rolling updates gradually replace old application pods with new versions, maintaining minimum replica counts throughout the process. Kubernetes manages the rollout automatically, ensuring new pods pass health checks before terminating old ones.
Advantages for ChatGPT Apps:
- No infrastructure duplication required (cost-effective)
- Automatic rollback on health check failures
- Gradual traffic shift allows early error detection
- Maintains conversation continuity during updates
Best Use Cases:
- MCP server updates with backward-compatible protocol changes
- Minor ChatGPT widget refinements
- Database schema migrations with dual-write strategies
- Incremental feature rollouts behind feature flags
Blue-Green Deployments: Complete Environment Swap
Blue-green deployments maintain two identical production environments ("blue" and "green"). Traffic routes to one environment while the other receives updates. After validation, traffic switches instantly to the updated environment.
Advantages for ChatGPT Apps:
- Instant rollback capability (switch back to previous environment)
- Full validation before production traffic exposure
- Zero-risk database migration testing
- Perfect for major OpenAI Apps SDK version upgrades
Best Use Cases:
- Major MCP protocol version changes
- ChatGPT app architecture refactors
- High-risk database schema migrations
- Compliance-sensitive deployments requiring full validation
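On Kubernetes, the instant switch can be implemented by repointing a Service selector. A minimal sketch, assuming two Deployments are already running and labeled with a hypothetical slot label of blue or green:

# Sketch: both environments run in parallel; the Service routes all
# traffic to whichever slot it currently selects
apiVersion: v1
kind: Service
metadata:
  name: chatgpt-mcp-active
  namespace: production
spec:
  selector:
    app: chatgpt-mcp
    slot: blue   # flip to "green" after validating the idle environment
  ports:
    - name: http
      port: 80
      targetPort: 3000

Cutover (and rollback) is then a single selector patch:

kubectl patch service chatgpt-mcp-active -n production \
  -p '{"spec":{"selector":{"slot":"green"}}}'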
Canary Releases: Gradual Traffic Shifting
Canary releases route a small percentage of production traffic to the new version, monitoring error rates and performance metrics before gradually increasing traffic exposure.
Advantages for ChatGPT Apps:
- Minimal blast radius for deployment issues
- Real production traffic validation
- Data-driven rollout decisions based on metrics
- Ideal for A/B testing widget designs
Best Use Cases:
- New AI model integrations
- Experimental ChatGPT widget features
- Performance optimization validation
- Third-party API integration changes
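Vanilla Kubernetes has no weighted-traffic primitive for this, so canary automation is usually layered on top. A minimal sketch assuming the Argo Rollouts controller is installed; without a service mesh, the weights below are approximated by scaling replica counts:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: chatgpt-mcp-canary
  namespace: production
spec:
  replicas: 6
  selector:
    matchLabels:
      app: chatgpt-mcp
  template:
    metadata:
      labels:
        app: chatgpt-mcp
    spec:
      containers:
        - name: mcp-server
          image: gcr.io/your-project/chatgpt-mcp:v2.1.0
          ports:
            - containerPort: 3000
  strategy:
    canary:
      steps:
        - setWeight: 5              # ~5% of traffic to the new version
        - pause: { duration: 10m }  # watch error rates before proceeding
        - setWeight: 25
        - pause: { duration: 10m }
        - setWeight: 50
        - pause: { duration: 10m }  # then Argo completes the rollout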
Kubernetes Rolling Updates Implementation
Production Deployment Configuration
This comprehensive Kubernetes deployment configuration implements zero-downtime rolling updates with sophisticated health checks and resource management:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatgpt-mcp-server
  namespace: production
  labels:
    app: chatgpt-mcp
    version: v2.1.0
    tier: backend
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # Create 2 extra pods during rollout
      maxUnavailable: 1  # Allow max 1 pod unavailable
  selector:
    matchLabels:
      app: chatgpt-mcp
  template:
    metadata:
      labels:
        app: chatgpt-mcp
        version: v2.1.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      terminationGracePeriodSeconds: 60  # Wait 60s for graceful shutdown
      containers:
        - name: mcp-server
          image: gcr.io/your-project/chatgpt-mcp:v2.1.0
          imagePullPolicy: Always
          ports:
            - containerPort: 3000
              name: http
              protocol: TCP
            - containerPort: 9090
              name: metrics
              protocol: TCP
          env:
            - name: NODE_ENV
              value: "production"
            - name: MCP_VERSION
              value: "2024-11-05"
            - name: GRACEFUL_SHUTDOWN_TIMEOUT
              value: "55000"  # 55s (less than terminationGracePeriodSeconds)
          # Resource limits prevent noisy-neighbor issues
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          # Readiness probe: verify pod is ready for traffic
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 10  # Wait 10s after start
            periodSeconds: 5         # Check every 5s
            timeoutSeconds: 3        # Fail if no response in 3s
            successThreshold: 2      # Require 2 consecutive successes
            failureThreshold: 3      # Remove from service after 3 failures
          # Liveness probe: detect crashed/hung pods
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 30  # Grace period for startup
            periodSeconds: 10        # Check every 10s
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3      # Restart after 3 consecutive failures
          # Startup probe: handle slow initialization
          startupProbe:
            httpGet:
              path: /health/startup
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 30  # Allow 150s total startup time
          # Graceful shutdown hook: delay briefly so endpoint removal can
          # propagate before draining begins. Kubernetes sends SIGTERM to
          # the container after this hook returns, so the hook itself must
          # not signal the process (kill -SIGTERM 1 here would double-signal
          # it). Keep this sleep plus GRACEFUL_SHUTDOWN_TIMEOUT below
          # terminationGracePeriodSeconds.
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
---
apiVersion: v1
kind: Service
metadata:
  name: chatgpt-mcp-service
  namespace: production
spec:
  type: ClusterIP
  sessionAffinity: ClientIP  # Route same client to same pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600  # 1 hour session stickiness
  selector:
    app: chatgpt-mcp
  ports:
    - name: http
      port: 80
      targetPort: 3000
      protocol: TCP
    - name: metrics
      port: 9090
      targetPort: 9090
      protocol: TCP
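Rolling the new version out and watching it from the command line uses the standard kubectl rollout commands against the manifest above:

# Apply the updated Deployment (e.g. a new image tag)
kubectl apply -f deployment.yaml

# Block until all new pods pass readiness or the rollout stalls
kubectl rollout status deployment/chatgpt-mcp-server -n production

# If health checks regress, inspect history and roll back
kubectl rollout history deployment/chatgpt-mcp-server -n production
kubectl rollout undo deployment/chatgpt-mcp-server -n production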
Readiness Probe Implementation
The readiness probe determines when a pod can receive production traffic. This comprehensive implementation validates all critical dependencies:
// src/health/readiness.ts
import express from 'express';
import { MCPProtocol } from '../mcp/protocol.js';
import { FirestoreClient } from '../database/firestore.js';
import { RedisClient } from '../cache/redis.js';

interface HealthStatus {
  ready: boolean;
  checks: {
    [key: string]: {
      status: 'pass' | 'fail' | 'warn';
      time: number;
      message?: string;
    };
  };
  timestamp: string;
}

export class ReadinessChecker {
  private mcpProtocol: MCPProtocol;
  private firestoreClient: FirestoreClient;
  private redisClient: RedisClient;
  private isShuttingDown = false;

  constructor(
    mcpProtocol: MCPProtocol,
    firestoreClient: FirestoreClient,
    redisClient: RedisClient
  ) {
    this.mcpProtocol = mcpProtocol;
    this.firestoreClient = firestoreClient;
    this.redisClient = redisClient;
  }

  /**
   * Express handler for readiness probe endpoint
   */
  async handler(
    req: express.Request,
    res: express.Response
  ): Promise<void> {
    const status = await this.check();
    const httpStatus = status.ready ? 200 : 503;
    res.status(httpStatus).json(status);
  }

  /**
   * Comprehensive readiness check
   */
  async check(): Promise<HealthStatus> {
    // If shutting down, immediately report not ready so the endpoint
    // controller removes this pod from the Service
    if (this.isShuttingDown) {
      return {
        ready: false,
        checks: {
          shutdown: {
            status: 'fail',
            time: 0,
            message: 'Pod is draining connections for shutdown'
          }
        },
        timestamp: new Date().toISOString()
      };
    }

    const checks = await Promise.all([
      this.checkMCPProtocol(),
      this.checkFirestore(),
      this.checkRedis(),
      this.checkMemory(),
      this.checkActiveConnections()
    ]);

    const checkResults = Object.fromEntries(checks);
    // 'warn' means degraded but still serviceable; only hard failures
    // should pull the pod out of rotation
    const allPassed = Object.values(checkResults)
      .every(c => c.status !== 'fail');

    return {
      ready: allPassed,
      checks: checkResults,
      timestamp: new Date().toISOString()
    };
  }

  private async checkMCPProtocol(): Promise<[string, any]> {
    const start = Date.now();
    try {
      const isInitialized = this.mcpProtocol.isInitialized();
      return ['mcp_protocol', {
        status: isInitialized ? 'pass' : 'fail',
        time: Date.now() - start,
        message: isInitialized ? 'MCP protocol initialized' : 'MCP protocol not ready'
      }];
    } catch (error) {
      return ['mcp_protocol', {
        status: 'fail',
        time: Date.now() - start,
        message: `MCP error: ${error.message}`
      }];
    }
  }

  private async checkFirestore(): Promise<[string, any]> {
    const start = Date.now();
    try {
      // Lightweight query to verify Firestore connectivity
      await this.firestoreClient.collection('_health')
        .limit(1)
        .get();
      return ['firestore', {
        status: 'pass',
        time: Date.now() - start
      }];
    } catch (error) {
      return ['firestore', {
        status: 'fail',
        time: Date.now() - start,
        message: `Firestore unavailable: ${error.message}`
      }];
    }
  }

  private async checkRedis(): Promise<[string, any]> {
    const start = Date.now();
    try {
      await this.redisClient.ping();
      return ['redis', {
        status: 'pass',
        time: Date.now() - start
      }];
    } catch (error) {
      return ['redis', {
        status: 'warn', // Redis failure means a degraded cache, not fatal
        time: Date.now() - start,
        message: `Redis degraded: ${error.message}`
      }];
    }
  }

  private async checkMemory(): Promise<[string, any]> {
    const start = Date.now();
    const memUsage = process.memoryUsage();
    const heapUsedPercent = (memUsage.heapUsed / memUsage.heapTotal) * 100;
    return ['memory', {
      status: heapUsedPercent < 90 ? 'pass' : 'fail',
      time: Date.now() - start,
      message: `Heap usage: ${heapUsedPercent.toFixed(1)}%`
    }];
  }

  private async checkActiveConnections(): Promise<[string, any]> {
    const start = Date.now();
    const activeCount = this.mcpProtocol.getActiveConnectionCount();
    const maxConnections = 1000;
    return ['active_connections', {
      status: activeCount < maxConnections ? 'pass' : 'fail',
      time: Date.now() - start,
      message: `${activeCount}/${maxConnections} connections`
    }];
  }

  /**
   * Mark pod as shutting down (removes it from the load balancer)
   */
  markShuttingDown(): void {
    this.isShuttingDown = true;
  }
}
Liveness Probe Implementation
The liveness probe detects crashed or deadlocked pods. Unlike readiness, liveness failures trigger pod restarts:
// src/health/liveness.ts
import express from 'express';
import { EventEmitter } from 'events';
import { monitorEventLoopDelay } from 'perf_hooks';

interface LivenessStatus {
  alive: boolean;
  uptime: number;
  checks: {
    [key: string]: {
      status: 'pass' | 'fail';
      message?: string;
    };
  };
  timestamp: string;
}

export class LivenessChecker extends EventEmitter {
  private startTime: number;
  private lastSuccessfulRequest: number;
  private requestTimeoutMs = 30000; // 30 seconds
  // Samples timer latency in the background; sustained lag means the
  // event loop is blocked
  private loopDelay = monitorEventLoopDelay({ resolution: 20 });

  constructor() {
    super();
    this.startTime = Date.now();
    this.lastSuccessfulRequest = Date.now();
    this.loopDelay.enable();
  }

  /**
   * Express handler for liveness probe endpoint
   */
  async handler(
    req: express.Request,
    res: express.Response
  ): Promise<void> {
    const status = await this.check();
    const httpStatus = status.alive ? 200 : 503;
    res.status(httpStatus).json(status);
  }

  /**
   * Liveness check: detect crashed/deadlocked state
   */
  async check(): Promise<LivenessStatus> {
    const uptime = Date.now() - this.startTime;
    const checks = {
      process: this.checkProcess(),
      event_loop: this.checkEventLoop(),
      requests: this.checkRecentRequests()
    };
    const allPassed = Object.values(checks)
      .every(c => c.status === 'pass');
    return {
      alive: allPassed,
      uptime,
      checks,
      timestamp: new Date().toISOString()
    };
  }

  private checkProcess(): { status: 'pass' | 'fail'; message?: string } {
    try {
      // Verify the process can still report memory usage
      const memUsage = process.memoryUsage();
      return {
        status: 'pass',
        message: `RSS: ${(memUsage.rss / 1024 / 1024).toFixed(0)}MB`
      };
    } catch (error) {
      return {
        status: 'fail',
        message: `Process check failed: ${error.message}`
      };
    }
  }

  private checkEventLoop(): { status: 'pass' | 'fail'; message?: string } {
    // perf_hooks reports timer lag in nanoseconds; a mean above 100ms
    // since the last probe indicates a blocked event loop. (A synchronous
    // loop timed inline cannot detect this: it would never run while the
    // loop is blocked.)
    const lagMs = this.loopDelay.mean / 1e6;
    this.loopDelay.reset();
    return {
      status: lagMs < 100 ? 'pass' : 'fail',
      message: `Event loop lag (mean): ${lagMs.toFixed(1)}ms`
    };
  }

  private checkRecentRequests(): { status: 'pass' | 'fail'; message?: string } {
    // Caveat: this assumes steady traffic. On genuinely idle services,
    // raise requestTimeoutMs (or drop this check) so idle pods are not
    // restarted.
    const timeSinceLastRequest = Date.now() - this.lastSuccessfulRequest;
    return {
      status: timeSinceLastRequest < this.requestTimeoutMs ? 'pass' : 'fail',
      message: `Last request: ${(timeSinceLastRequest / 1000).toFixed(0)}s ago`
    };
  }

  /**
   * Call this on every successful request
   */
  recordSuccessfulRequest(): void {
    this.lastSuccessfulRequest = Date.now();
  }
}
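The requests check above only works if the application reports activity; otherwise every pod looks hung. A minimal wiring sketch, assuming an Express app constructed alongside the checker:

// Hypothetical wiring: mark activity on every completed response so the
// liveness probe can tell "idle" apart from "hung"
import express from 'express';
import { LivenessChecker } from './health/liveness.js';

const app = express();
const livenessChecker = new LivenessChecker();

app.use((req, res, next) => {
  res.on('finish', () => {
    if (res.statusCode < 500) {
      livenessChecker.recordSuccessfulRequest();
    }
  });
  next();
});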
Comprehensive Health Check Endpoint
This production-grade health check endpoint provides detailed diagnostics for monitoring systems and load balancers:
// src/routes/health.ts
import express from 'express';
import { ReadinessChecker } from '../health/readiness.js';
import { LivenessChecker } from '../health/liveness.js';
import { DependencyHealthChecker } from '../health/dependencies.js';

export function createHealthRouter(
  readinessChecker: ReadinessChecker,
  livenessChecker: LivenessChecker,
  dependencyChecker: DependencyHealthChecker
): express.Router {
  const router = express.Router();

  /**
   * Kubernetes readiness probe
   * Returns 200 when pod is ready for traffic
   */
  router.get('/health/ready', async (req, res) => {
    await readinessChecker.handler(req, res);
  });

  /**
   * Kubernetes liveness probe
   * Returns 200 when pod is alive (not deadlocked)
   */
  router.get('/health/live', async (req, res) => {
    await livenessChecker.handler(req, res);
  });

  /**
   * Kubernetes startup probe
   * Returns 200 when initialization is complete
   */
  router.get('/health/startup', async (req, res) => {
    // Simple check: if we can respond, startup succeeded
    res.status(200).json({
      status: 'started',
      timestamp: new Date().toISOString()
    });
  });

  /**
   * Comprehensive health check with all dependencies
   * Used by monitoring systems (not Kubernetes probes)
   */
  router.get('/health', async (req, res) => {
    const [readiness, liveness, dependencies] = await Promise.all([
      readinessChecker.check(),
      livenessChecker.check(),
      dependencyChecker.checkAll()
    ]);
    const overallHealthy = readiness.ready && liveness.alive;
    res.status(overallHealthy ? 200 : 503).json({
      status: overallHealthy ? 'healthy' : 'unhealthy',
      readiness,
      liveness,
      dependencies,
      version: process.env.APP_VERSION || 'unknown',
      timestamp: new Date().toISOString()
    });
  });

  return router;
}
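Mounting the router is the last wiring step. A short sketch, assuming the checker instances from the previous sections have already been constructed:

import express from 'express';
import { createHealthRouter } from './routes/health.js';

const app = express();
// readinessChecker, livenessChecker, and dependencyChecker are the
// instances built in the sections above
app.use(createHealthRouter(readinessChecker, livenessChecker, dependencyChecker));
app.listen(3000);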
Dependency Health Checker
This module validates external service connectivity and classifies each dependency as healthy, degraded, or unavailable based on latency thresholds (a circuit-breaker wrapper for these probes is sketched after the module):
// src/health/dependencies.ts
import { FirestoreClient } from '../database/firestore.js';
import { RedisClient } from '../cache/redis.js';
import axios from 'axios';

interface DependencyCheck {
  status: 'healthy' | 'degraded' | 'unavailable';
  latency: number;
  message?: string;
  lastChecked: string;
}

interface DependencyStatus {
  [key: string]: DependencyCheck;
}

export class DependencyHealthChecker {
  private firestoreClient: FirestoreClient;
  private redisClient: RedisClient;
  private openaiApiKey: string;

  constructor(
    firestoreClient: FirestoreClient,
    redisClient: RedisClient,
    openaiApiKey: string
  ) {
    this.firestoreClient = firestoreClient;
    this.redisClient = redisClient;
    this.openaiApiKey = openaiApiKey;
  }

  async checkAll(): Promise<DependencyStatus> {
    // allSettled: one failing probe must not mask the others
    const checks = await Promise.allSettled([
      this.checkFirestore(),
      this.checkRedis(),
      this.checkOpenAI()
    ]);
    return {
      firestore: checks[0].status === 'fulfilled'
        ? checks[0].value
        : this.createFailedCheck('Firestore check threw exception'),
      redis: checks[1].status === 'fulfilled'
        ? checks[1].value
        : this.createFailedCheck('Redis check threw exception'),
      openai: checks[2].status === 'fulfilled'
        ? checks[2].value
        : this.createFailedCheck('OpenAI check threw exception')
    };
  }

  private async checkFirestore(): Promise<DependencyCheck> {
    const start = Date.now();
    try {
      await this.firestoreClient
        .collection('_health')
        .limit(1)
        .get();
      const latency = Date.now() - start;
      return {
        status: latency < 500 ? 'healthy' : 'degraded',
        latency,
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      return {
        status: 'unavailable',
        latency: Date.now() - start,
        message: error.message,
        lastChecked: new Date().toISOString()
      };
    }
  }

  private async checkRedis(): Promise<DependencyCheck> {
    const start = Date.now();
    try {
      await this.redisClient.ping();
      const latency = Date.now() - start;
      return {
        status: latency < 100 ? 'healthy' : 'degraded',
        latency,
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      return {
        status: 'unavailable',
        latency: Date.now() - start,
        message: error.message,
        lastChecked: new Date().toISOString()
      };
    }
  }

  private async checkOpenAI(): Promise<DependencyCheck> {
    const start = Date.now();
    try {
      // Lightweight API call to verify OpenAI connectivity
      const response = await axios.get('https://api.openai.com/v1/models', {
        headers: {
          'Authorization': `Bearer ${this.openaiApiKey}`
        },
        timeout: 5000
      });
      const latency = Date.now() - start;
      return {
        status: response.status === 200 ? 'healthy' : 'degraded',
        latency,
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      return {
        status: 'unavailable',
        latency: Date.now() - start,
        message: error.message,
        lastChecked: new Date().toISOString()
      };
    }
  }

  private createFailedCheck(message: string): DependencyCheck {
    return {
      status: 'unavailable',
      latency: 0,
      message,
      lastChecked: new Date().toISOString()
    };
  }
}
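Because these probes run on every /health scrape, a flapping dependency can be hammered while it is already struggling. A minimal circuit-breaker wrapper that could guard each probe; the threshold and cool-down values are illustrative assumptions:

// Opens after `threshold` consecutive failures, short-circuits calls
// until `coolDownMs` elapses, then allows a trial request through
export class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private threshold = 3,
    private coolDownMs = 30000
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.threshold &&
        Date.now() - this.openedAt < this.coolDownMs) {
      throw new Error('Circuit open: skipping dependency check');
    }
    try {
      const result = await fn();
      this.failures = 0; // Close the circuit on success
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw error;
    }
  }
}

// Usage: wrap a dependency probe
// const redisBreaker = new CircuitBreaker();
// await redisBreaker.call(() => redisClient.ping());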
Graceful Shutdown Handler
This implementation drains connections cleanly before pod termination:
// src/graceful-shutdown.ts
import { Server } from 'http';
import { MCPProtocol } from './mcp/protocol.js';
import { ReadinessChecker } from './health/readiness.js';

export class GracefulShutdownHandler {
  private server: Server;
  private mcpProtocol: MCPProtocol;
  private readinessChecker: ReadinessChecker;
  private shutdownTimeout: number;
  private isShuttingDown = false;

  constructor(
    server: Server,
    mcpProtocol: MCPProtocol,
    readinessChecker: ReadinessChecker,
    shutdownTimeout = 55000 // 55 seconds
  ) {
    this.server = server;
    this.mcpProtocol = mcpProtocol;
    this.readinessChecker = readinessChecker;
    this.shutdownTimeout = shutdownTimeout;
    this.registerSignalHandlers();
  }

  private registerSignalHandlers(): void {
    // Kubernetes sends SIGTERM for graceful shutdown
    process.on('SIGTERM', () => this.shutdown('SIGTERM'));
    process.on('SIGINT', () => this.shutdown('SIGINT'));
  }

  private async shutdown(signal: string): Promise<void> {
    if (this.isShuttingDown) {
      console.log('Shutdown already in progress, ignoring signal');
      return;
    }
    this.isShuttingDown = true;
    console.log(`Received ${signal}, starting graceful shutdown...`);

    // Step 1: Fail the readiness probe (removes pod from load balancer)
    this.readinessChecker.markShuttingDown();
    console.log('✓ Removed from load balancer (readiness = false)');

    // Step 2: Wait for endpoint removal to propagate (typically 5-10s)
    await this.sleep(10000);
    console.log('✓ Load balancer propagation complete');

    // Step 3: Stop accepting new connections
    this.server.close(() => {
      console.log('✓ HTTP server closed (no new connections)');
    });

    // Step 4: Drain existing MCP connections
    const drainStart = Date.now();
    const drainTimeout = this.shutdownTimeout - 15000; // Reserve 15s for final cleanup
    console.log(`Draining ${this.mcpProtocol.getActiveConnectionCount()} active connections...`);

    // Keep a handle on the timer so a successful drain does not leave a
    // pending rejection behind
    let drainTimer: NodeJS.Timeout | undefined;
    const timeoutPromise = new Promise<never>((_, reject) => {
      drainTimer = setTimeout(
        () => reject(new Error('Drain timeout exceeded')),
        drainTimeout
      );
    });

    try {
      await Promise.race([
        this.mcpProtocol.drainConnections(drainTimeout),
        timeoutPromise
      ]);
      console.log(`✓ All connections drained (${Date.now() - drainStart}ms)`);
    } catch (error) {
      const remaining = this.mcpProtocol.getActiveConnectionCount();
      console.warn(`⚠ Drain timeout: ${remaining} connections remain, force closing`);
      await this.mcpProtocol.forceCloseConnections();
    } finally {
      clearTimeout(drainTimer);
    }

    // Step 5: Final cleanup
    console.log('Performing final cleanup...');
    await this.cleanup();
    console.log('✓ Graceful shutdown complete');
    process.exit(0);
  }

  private async cleanup(): Promise<void> {
    // Close database connections, flush logs, etc.
    await Promise.all([
      this.mcpProtocol.close()
      // Add other cleanup tasks here
    ]);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
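Construction at startup is all the wiring the handler needs; a sketch, assuming the server and protocol objects created during application boot:

import { GracefulShutdownHandler } from './graceful-shutdown.js';

// server, mcpProtocol, and readinessChecker come from application startup;
// the timeout must fit inside terminationGracePeriodSeconds (60s above)
const shutdownHandler = new GracefulShutdownHandler(
  server,
  mcpProtocol,
  readinessChecker,
  55000
);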
Database Migration Coordination
Zero-downtime deployments require careful database migration coordination to maintain backward compatibility during rolling updates.
Migration Lock Manager
This distributed lock prevents simultaneous migrations from multiple pods:
// src/database/migration-lock.ts
import { FirestoreClient } from './firestore.js';

interface MigrationLock {
  migrationId: string;
  podName: string;
  acquiredAt: Date;
  expiresAt: Date;
}

export class MigrationLockManager {
  private firestore: FirestoreClient;
  private lockCollection = '_migration_locks';
  private lockTimeout = 300000; // 5 minutes

  constructor(firestore: FirestoreClient) {
    this.firestore = firestore;
  }

  /**
   * Acquire distributed lock for a migration.
   * Returns true if acquired, false if another pod holds the lock.
   */
  async acquireLock(migrationId: string): Promise<boolean> {
    const podName = process.env.HOSTNAME || 'unknown-pod';
    const lockDoc = this.firestore
      .collection(this.lockCollection)
      .doc(migrationId);

    try {
      // Atomic transaction to prevent race conditions between pods
      await this.firestore.runTransaction(async (transaction) => {
        const doc = await transaction.get(lockDoc);
        if (doc.exists) {
          const lock = doc.data() as MigrationLock;
          const now = new Date();
          // Check whether the existing lock has expired
          if (new Date(lock.expiresAt) > now) {
            throw new Error(`Migration locked by ${lock.podName}`);
          }
          // Lock expired (holder likely crashed); take it over
          console.log(`Taking over expired lock from ${lock.podName}`);
        }
        // Acquire/renew the lock
        const expiresAt = new Date(Date.now() + this.lockTimeout);
        transaction.set(lockDoc, {
          migrationId,
          podName,
          acquiredAt: new Date(),
          expiresAt
        });
      });
      console.log(`✓ Acquired migration lock: ${migrationId}`);
      return true;
    } catch (error) {
      console.log(`Failed to acquire migration lock: ${error.message}`);
      return false;
    }
  }

  /**
   * Release migration lock
   */
  async releaseLock(migrationId: string): Promise<void> {
    const lockDoc = this.firestore
      .collection(this.lockCollection)
      .doc(migrationId);
    await lockDoc.delete();
    console.log(`✓ Released migration lock: ${migrationId}`);
  }

  /**
   * Wait for migration lock with exponential backoff
   */
  async waitForLock(
    migrationId: string,
    maxWaitTime = 300000 // 5 minutes
  ): Promise<boolean> {
    const startTime = Date.now();
    let backoff = 1000; // Start with 1 second

    while (Date.now() - startTime < maxWaitTime) {
      const acquired = await this.acquireLock(migrationId);
      if (acquired) return true;
      // Exponential backoff (capped at 30s)
      await this.sleep(Math.min(backoff, 30000));
      backoff *= 2;
    }
    return false;
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
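A typical call site takes the lock before applying migrations and always releases it, even on failure. A sketch, where runMigration is a hypothetical function that applies the SQL in the next section:

const lockManager = new MigrationLockManager(firestoreClient);
const migrationId = '2026-01-15-001';

if (await lockManager.waitForLock(migrationId)) {
  try {
    await runMigration(migrationId); // hypothetical: applies the SQL below
  } finally {
    await lockManager.releaseLock(migrationId);
  }
} else {
  // Another pod completed (or is still running) the migration
  console.log('Skipping migration: lock held elsewhere');
}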
Backward-Compatible Schema Migration
This SQL migration adds new columns while maintaining compatibility with old application versions:
-- Migration: add_conversation_metadata.sql
-- Version: 2026-01-15-001
-- Strategy: Expand-Contract pattern for zero-downtime (MySQL syntax)

-- Phase 1: EXPAND (safe with old app version)
-- Add new nullable columns (old app ignores them).
-- Note: MySQL does not allow literal DEFAULTs on TEXT columns, so
-- metadata_json stays NULL until the backfill runs.
ALTER TABLE conversations
  ADD COLUMN metadata_json TEXT NULL,
  ADD COLUMN model_version VARCHAR(50) DEFAULT 'gpt-4o',
  ADD COLUMN token_count INTEGER DEFAULT 0,
  ADD COLUMN created_at_v2 TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  ADD INDEX idx_model_version (model_version),
  ADD INDEX idx_created_at_v2 (created_at_v2);

-- Phase 2: BACKFILL (run after old pods terminated)
-- Populate new columns from existing data.
-- Run this as a separate job, NOT during deployment:
-- UPDATE conversations
-- SET
--   metadata_json = COALESCE(old_metadata_column, '{}'),
--   token_count = COALESCE(old_token_column, 0),
--   created_at_v2 = COALESCE(created_at, CURRENT_TIMESTAMP)
-- WHERE metadata_json IS NULL;

-- Phase 3: CONTRACT (after new app version deployed)
-- Remove old columns in a future migration,
-- ONLY after 100% of pods run the new version:
-- ALTER TABLE conversations
--   DROP COLUMN old_metadata_column,
--   DROP COLUMN old_token_column;

-- Verification query (run after migration)
SELECT
  COUNT(*) AS total_conversations,
  COUNT(metadata_json) AS with_metadata,
  AVG(token_count) AS avg_tokens
FROM conversations;
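The expand phase only stays backward-compatible if the application writes both the old and the new columns until the contract phase removes the old ones (the dual-write strategy mentioned earlier). A sketch with hypothetical column names and a generic query interface:

// Hypothetical dual-write: keep old and new columns in sync so pods on
// either application version read consistent data
async function saveConversationMetadata(
  db: { query: (sql: string, params: unknown[]) => Promise<unknown> },
  conversationId: string,
  metadata: Record<string, unknown>,
  tokenCount: number
): Promise<void> {
  await db.query(
    `UPDATE conversations
        SET old_metadata_column = ?,  -- read by v1 pods
            metadata_json       = ?,  -- read by v2 pods
            old_token_column    = ?,
            token_count         = ?
      WHERE id = ?`,
    [JSON.stringify(metadata), JSON.stringify(metadata),
     tokenCount, tokenCount, conversationId]
  );
}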
Feature Flag Implementation
Feature flags decouple deployments from feature releases, enabling gradual rollouts:
// src/feature-flags/manager.ts
import { FirestoreClient } from '../database/firestore.js';

interface FeatureFlag {
  name: string;
  enabled: boolean;
  rolloutPercentage: number; // 0-100
  enabledForUsers?: string[]; // Specific user IDs
  disabledForUsers?: string[];
  metadata?: Record<string, any>;
}

export class FeatureFlagManager {
  private firestore: FirestoreClient;
  private cache = new Map<string, FeatureFlag>();
  private cacheTTL = 60000; // 1 minute
  private lastCacheUpdate = 0;

  constructor(firestore: FirestoreClient) {
    this.firestore = firestore;
  }

  /**
   * Check whether a feature is enabled for a specific user
   */
  async isEnabled(featureName: string, userId: string): Promise<boolean> {
    const flag = await this.getFlag(featureName);
    if (!flag) return false;
    // Global disable
    if (!flag.enabled) return false;
    // Explicit user disable
    if (flag.disabledForUsers?.includes(userId)) return false;
    // Explicit user enable
    if (flag.enabledForUsers?.includes(userId)) return true;
    // Percentage-based rollout (deterministic hash)
    const userHash = this.hashUserId(userId);
    return userHash < flag.rolloutPercentage;
  }

  private async getFlag(featureName: string): Promise<FeatureFlag | null> {
    // Serve from cache while fresh
    if (Date.now() - this.lastCacheUpdate < this.cacheTTL) {
      return this.cache.get(featureName) || null;
    }
    // Refresh cache
    await this.refreshCache();
    return this.cache.get(featureName) || null;
  }

  private async refreshCache(): Promise<void> {
    const snapshot = await this.firestore
      .collection('feature_flags')
      .get();
    this.cache.clear();
    snapshot.forEach(doc => {
      const flag = doc.data() as FeatureFlag;
      this.cache.set(flag.name, flag);
    });
    this.lastCacheUpdate = Date.now();
  }

  /**
   * Hash user ID to a bucket (0-99).
   * Ensures consistent assignment across requests.
   */
  private hashUserId(userId: string): number {
    let hash = 0;
    for (let i = 0; i < userId.length; i++) {
      hash = ((hash << 5) - hash) + userId.charCodeAt(i);
      hash = hash & hash; // Convert to 32-bit integer
    }
    return Math.abs(hash % 100);
  }
}

// Usage example
const featureFlags = new FeatureFlagManager(firestoreClient);

// In your MCP server endpoint
app.post('/mcp/tools/call', async (req, res) => {
  const { userId, toolName } = req.body;
  // Check if the new tool version is enabled for this user
  const useNewVersion = await featureFlags.isEnabled(
    'new_tool_implementation',
    userId
  );
  if (useNewVersion) {
    return handleToolCallV2(req, res);
  } else {
    return handleToolCallV1(req, res);
  }
});
Monitoring Deployments with Prometheus
Real-time deployment monitoring detects issues before they impact users.
Deployment Metrics Instrumentation
// src/metrics/deployment.ts
import promClient from 'prom-client';

export class DeploymentMetrics {
  private static deploymentInfo = new promClient.Gauge({
    name: 'app_deployment_info',
    help: 'Deployment metadata (version, timestamp)',
    labelNames: ['version', 'pod_name', 'deployment_time']
  });

  private static requestDuration = new promClient.Histogram({
    name: 'http_request_duration_seconds',
    help: 'HTTP request duration in seconds',
    labelNames: ['method', 'route', 'status_code', 'app_version'],
    buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10]
  });

  private static activeConnections = new promClient.Gauge({
    name: 'mcp_active_connections',
    help: 'Number of active MCP connections',
    labelNames: ['app_version']
  });

  private static errorRate = new promClient.Counter({
    name: 'http_errors_total',
    help: 'Total HTTP errors',
    labelNames: ['method', 'route', 'status_code', 'app_version']
  });

  static recordDeployment(version: string): void {
    const podName = process.env.HOSTNAME || 'unknown';
    const deploymentTime = new Date().toISOString();
    this.deploymentInfo.set(
      { version, pod_name: podName, deployment_time: deploymentTime },
      1
    );
  }

  static recordRequest(
    method: string,
    route: string,
    statusCode: number,
    durationSeconds: number
  ): void {
    const version = process.env.APP_VERSION || 'unknown';
    this.requestDuration.observe(
      { method, route, status_code: statusCode.toString(), app_version: version },
      durationSeconds
    );
    if (statusCode >= 500) {
      this.errorRate.inc({
        method,
        route,
        status_code: statusCode.toString(),
        app_version: version
      });
    }
  }

  static recordActiveConnections(count: number): void {
    const version = process.env.APP_VERSION || 'unknown';
    this.activeConnections.set({ app_version: version }, count);
  }

  static getRegistry(): promClient.Registry {
    return promClient.register;
  }
}
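A sketch of the wiring: middleware that times every request, plus the /metrics endpoint the Deployment's Prometheus annotations point at. Assumes prom-client v13 or later, where metrics() returns a promise:

import express from 'express';
import { DeploymentMetrics } from './metrics/deployment.js';

const app = express();
DeploymentMetrics.recordDeployment(process.env.APP_VERSION || 'unknown');

// Time every request and feed the histogram/error counter
app.use((req, res, next) => {
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const seconds = Number(process.hrtime.bigint() - start) / 1e9;
    // Prefer the route pattern over the raw path to keep label cardinality low
    DeploymentMetrics.recordRequest(
      req.method,
      req.route?.path ?? req.path,
      res.statusCode,
      seconds
    );
  });
  next();
});

// Scrape endpoint (expose on the metrics port in production)
app.get('/metrics', async (_req, res) => {
  const registry = DeploymentMetrics.getRegistry();
  res.set('Content-Type', registry.contentType);
  res.end(await registry.metrics());
});

During a rollout, comparing sum(rate(http_errors_total[5m])) by (app_version) across versions shows immediately whether the new pods are erroring more than the old ones.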
Learn more about deployment best practices in our ChatGPT App DevOps Guide and explore Blue-Green Deployment Strategies. For database migration patterns, see Database Schema Migrations for ChatGPT Apps.
Production Deployment Checklist
Before deploying ChatGPT apps to production, verify these critical requirements:
Pre-Deployment:
- All health check endpoints return 200 OK
- Database migrations tested in staging environment
- Feature flags configured for gradual rollout
- Prometheus metrics instrumentation validated
- Load testing completed (minimum 2x expected traffic)
- Rollback plan documented and tested
During Deployment:
- Monitor Kubernetes pod rollout status (kubectl rollout status)
- Watch Prometheus dashboards for error rate spikes
- Verify new pods pass readiness probes before old pods terminate
- Check application logs for startup errors
- Validate active connection counts remain stable
Post-Deployment:
- Smoke test critical MCP tools on production
- Verify database migration completed successfully
- Confirm zero increase in error rates
- Check average response latency unchanged
- Run full regression test suite
- Monitor for 24 hours before declaring deployment successful
Explore our Enterprise ChatGPT App Deployment solutions for managed deployment pipelines with automated rollback and compliance validation. For comprehensive monitoring setup, see Production Monitoring for ChatGPT Apps.
Conclusion: Achieving Five-Nines Reliability
Zero-downtime deployments transform ChatGPT applications from fragile prototypes to enterprise-grade production systems. By implementing rolling updates with comprehensive health checks, graceful connection draining, and backward-compatible database migrations, your MCP servers can update continuously while maintaining 99.999% uptime.
The key to successful zero-downtime deployments lies in three principles: verify before promoting (readiness probes validate pods before traffic exposure), fail fast and rollback (automated health checks detect issues within seconds), and coordinate across layers (application code, database schema, and infrastructure must align during transitions).
Start with rolling updates for low-risk deployments, graduate to blue-green deployments for high-stakes releases, and implement canary releases for data-driven rollout decisions. With proper monitoring, feature flags, and graceful shutdown handlers, you'll deploy ChatGPT apps with confidence—multiple times per day, without disrupting a single active conversation.
Ready to deploy ChatGPT apps with enterprise-grade reliability? Start building with MakeAIHQ and get production-ready Kubernetes configurations, health check templates, and deployment automation included. From your first MCP server to scaling to millions of ChatGPT users, our platform ensures zero-downtime deployments at every stage.