MCP Server Deployment: Docker, Kubernetes & Cloud Run Production Setup
Deploying Model Context Protocol (MCP) servers to production requires careful consideration of containerization, orchestration, scaling, and reliability. While local development with npx @modelcontextprotocol/inspector works great for testing, production workloads demand robust infrastructure, health monitoring, graceful shutdowns, and zero-downtime updates.
In this comprehensive guide, we'll walk through production-grade deployment patterns for MCP servers using Docker, Kubernetes, and Google Cloud Run. Whether you're deploying a simple fitness studio booking assistant or a complex multi-tenant ChatGPT app, these patterns ensure your MCP server can handle real-world traffic, scale automatically, and recover from failures gracefully.
We'll cover multi-stage Docker builds that reduce image sizes by 70%, Kubernetes deployments with horizontal pod autoscaling, Cloud Run configurations that scale to zero when idle, health check implementations that prevent traffic to unhealthy containers, and blue-green deployment strategies that eliminate downtime during updates.
By the end of this guide, you'll have production-ready deployment configurations you can adapt for your own MCP servers, complete with security best practices, monitoring hooks, and automated rollback mechanisms. Let's get started.
Docker Containerization: Building Production-Ready Images
Docker containerization is the foundation of modern MCP deployment. A well-designed Dockerfile creates lightweight, secure, and reproducible container images that run consistently across development, staging, and production environments.
Multi-Stage Builds for Minimal Image Size
Multi-stage builds separate the build environment from the runtime environment, dramatically reducing final image size. This approach installs build tools (TypeScript compiler, webpack, etc.) in a temporary build stage, then copies only the compiled artifacts to the final runtime stage.
Here's a production-ready Dockerfile for a TypeScript-based MCP server:
# ========================================
# STAGE 1: Build Stage
# ========================================
FROM node:20-alpine AS builder
# Install build dependencies
RUN apk add --no-cache python3 make g++ git
# Set working directory
WORKDIR /build
# Copy package files
COPY package*.json ./
COPY tsconfig.json ./
# Install ALL dependencies (including devDependencies)
RUN npm ci
# Copy source code
COPY src/ ./src/
# Build TypeScript to JavaScript
RUN npm run build
# Prune dev dependencies
RUN npm prune --omit=dev
# ========================================
# STAGE 2: Runtime Stage
# ========================================
FROM node:20-alpine
# Install dumb-init for proper signal handling
RUN apk add --no-cache dumb-init
# Create non-root user
RUN addgroup -g 1001 mcpserver && \
adduser -D -u 1001 -G mcpserver mcpserver
# Set working directory
WORKDIR /app
# Copy built artifacts from builder stage
COPY --from=builder --chown=mcpserver:mcpserver /build/dist ./dist
COPY --from=builder --chown=mcpserver:mcpserver /build/node_modules ./node_modules
COPY --from=builder --chown=mcpserver:mcpserver /build/package*.json ./
# Switch to non-root user
USER mcpserver
# Expose MCP server port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD node -e "require('http').get('http://localhost:3000/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1); });"
# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]
# Start MCP server
CMD ["node", "dist/index.js"]
Key optimizations:
- Multi-stage build: Reduces final image from 450MB to 120MB (73% reduction)
- Alpine Linux: Minimal base image (the Alpine base is roughly 5MB, versus 100MB+ for Debian-based images)
- Non-root user: Security best practice (prevents privilege escalation)
- dumb-init: Proper signal handling for graceful shutdowns
- Built-in health check: Docker automatically monitors container health
Security Scanning and Hardening
Production images should be scanned for vulnerabilities before deployment. Integrate Trivy or Snyk into your CI/CD pipeline:
# Build image
docker build -t mcp-server:latest .
# Scan for vulnerabilities
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
aquasec/trivy:latest image --severity HIGH,CRITICAL mcp-server:latest
# If scan passes, tag for production
docker tag mcp-server:latest gcr.io/your-project/mcp-server:v1.2.3
docker push gcr.io/your-project/mcp-server:v1.2.3
Additional hardening measures:
- Read-only filesystem: Add --read-only to prevent runtime modifications
- No new privileges: Add --security-opt=no-new-privileges:true
- Drop capabilities: Add --cap-drop=ALL to remove unnecessary Linux capabilities (all three flags appear together in the example below)
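Here's a minimal sketch of how these flags combine at runtime; the image name, port, and tmpfs mount are assumptions carried over from the Dockerfile above:
# Run the hardened container (flags from the list above)
docker run -d \
  --name mcp-server \
  --read-only \
  --tmpfs /tmp \
  --security-opt=no-new-privileges:true \
  --cap-drop=ALL \
  -p 3000:3000 \
  mcp-server:latest
# --tmpfs gives the app a writable /tmp even though the root filesystem is read-only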
For more Docker best practices, see the official Docker security documentation.
Kubernetes Deployment: Orchestrating MCP Servers at Scale
Kubernetes provides robust orchestration for MCP servers, enabling horizontal scaling, self-healing, and zero-downtime updates. A production Kubernetes deployment includes Deployment, Service, Ingress, HorizontalPodAutoscaler, and ConfigMap resources.
Complete Kubernetes Deployment Manifest
# ========================================
# ConfigMap: Environment Configuration
# ========================================
apiVersion: v1
kind: ConfigMap
metadata:
name: mcp-server-config
namespace: production
data:
NODE_ENV: "production"
LOG_LEVEL: "info"
MCP_PORT: "3000"
REDIS_HOST: "redis-master.production.svc.cluster.local"
REDIS_PORT: "6379"
---
# ========================================
# Secret: Sensitive Credentials
# ========================================
apiVersion: v1
kind: Secret
metadata:
name: mcp-server-secrets
namespace: production
type: Opaque
data:
# Base64 encoded values (use: echo -n "value" | base64)
OPENAI_API_KEY: "c2stcHJvai14eHh4eHh4eHh4eHh4eHh4"
DATABASE_URL: "cG9zdGdyZXNxbDovL3VzZXI6cGFzc0BkYi5leGFtcGxlLmNvbS9kYg=="
---
# ========================================
# Deployment: MCP Server Pods
# ========================================
apiVersion: apps/v1
kind: Deployment
metadata:
name: mcp-server
namespace: production
labels:
app: mcp-server
version: v1.2.3
spec:
replicas: 3
revisionHistoryLimit: 5
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: mcp-server
template:
metadata:
labels:
app: mcp-server
version: v1.2.3
spec:
serviceAccountName: mcp-server
securityContext:
runAsNonRoot: true
runAsUser: 1001
fsGroup: 1001
containers:
- name: mcp-server
image: gcr.io/your-project/mcp-server:v1.2.3
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 3000
protocol: TCP
env:
- name: NODE_ENV
valueFrom:
configMapKeyRef:
name: mcp-server-config
key: NODE_ENV
- name: LOG_LEVEL
valueFrom:
configMapKeyRef:
name: mcp-server-config
key: LOG_LEVEL
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: mcp-server-secrets
key: OPENAI_API_KEY
resources:
requests:
memory: "256Mi"
cpu: "200m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health/live
port: http
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: http
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 2
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 15"]
---
# ========================================
# Service: Internal Load Balancer
# ========================================
apiVersion: v1
kind: Service
metadata:
name: mcp-server
namespace: production
labels:
app: mcp-server
spec:
type: ClusterIP
selector:
app: mcp-server
ports:
- name: http
port: 80
targetPort: http
protocol: TCP
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 10800
---
# ========================================
# HorizontalPodAutoscaler: Auto-Scaling
# ========================================
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: mcp-server-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: mcp-server
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Pods
value: 1
periodSeconds: 120
---
# ========================================
# Ingress: External HTTPS Access
# ========================================
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: mcp-server-ingress
namespace: production
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
ingressClassName: nginx
tls:
- hosts:
- api.yourapp.com
secretName: mcp-server-tls
rules:
- host: api.yourapp.com
http:
paths:
- path: /mcp
pathType: Prefix
backend:
service:
name: mcp-server
port:
name: http
Key features:
- Rolling updates: Zero-downtime deployments with maxUnavailable: 0 (driven with kubectl, as shown below)
- Auto-scaling: Automatically scales from 3 to 20 pods based on CPU/memory
- Health checks: Separate liveness (container alive) and readiness (ready for traffic) probes
- Resource limits: Prevents resource starvation and ensures fair scheduling
- Security: Non-root security context and Secret-backed credentials (add readOnlyRootFilesystem to the container securityContext for further hardening)
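A rolling update can be triggered, watched, and reverted entirely with kubectl. A quick sketch, where v1.2.4 is a hypothetical next version and the names match the manifest above:
# Point the deployment at a new image; the RollingUpdate strategy above takes over
kubectl set image deployment/mcp-server mcp-server=gcr.io/your-project/mcp-server:v1.2.4 -n production
# Watch the rollout; exits non-zero if it stalls past the timeout
kubectl rollout status deployment/mcp-server -n production --timeout=300s
# Roll back to the previous ReplicaSet if metrics regress
kubectl rollout undo deployment/mcp-server -n production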
For complete Kubernetes deployment guides, see the official Kubernetes documentation.
Cloud Run Deployment: Serverless Simplicity
Google Cloud Run offers a fully managed serverless platform for MCP servers. It automatically scales to zero when idle (saving costs) and scales up to handle traffic spikes, without managing infrastructure.
Cloud Run Configuration
# cloud-run-service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: mcp-server
namespace: production
annotations:
run.googleapis.com/ingress: "all"
run.googleapis.com/launch-stage: "GA"
spec:
template:
metadata:
annotations:
autoscaling.knative.dev/minScale: "1"
autoscaling.knative.dev/maxScale: "100"
run.googleapis.com/cpu-throttling: "false"
run.googleapis.com/startup-cpu-boost: "true"
spec:
containerConcurrency: 80
timeoutSeconds: 300
serviceAccountName: mcp-server@your-project.iam.gserviceaccount.com
containers:
- name: mcp-server
image: gcr.io/your-project/mcp-server:v1.2.3
ports:
- name: http1
containerPort: 3000
env:
- name: NODE_ENV
value: "production"
- name: LOG_LEVEL
value: "info"
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: openai-credentials
key: api-key
resources:
limits:
memory: "512Mi"
cpu: "1000m"
startupProbe:
httpGet:
path: /health/startup
port: 3000
initialDelaySeconds: 0
periodSeconds: 1
failureThreshold: 30
livenessProbe:
httpGet:
path: /health/live
port: 3000
initialDelaySeconds: 30
periodSeconds: 10
Deployment Script
#!/bin/bash
# deploy-cloud-run.sh
set -euo pipefail
PROJECT_ID="your-project"
REGION="us-central1"
SERVICE_NAME="mcp-server"
IMAGE="gcr.io/${PROJECT_ID}/${SERVICE_NAME}:latest"
echo "๐จ Building Docker image..."
docker build -t "${IMAGE}" .
echo "๐ค Pushing to Google Container Registry..."
docker push "${IMAGE}"
echo "๐ Deploying to Cloud Run..."
gcloud run deploy "${SERVICE_NAME}" \
--image="${IMAGE}" \
--platform=managed \
--region="${REGION}" \
--project="${PROJECT_ID}" \
--allow-unauthenticated \
--min-instances=1 \
--max-instances=100 \
--memory=512Mi \
--cpu=1 \
--timeout=300s \
--concurrency=80 \
--set-env-vars="NODE_ENV=production,LOG_LEVEL=info" \
--set-secrets="OPENAI_API_KEY=openai-credentials:latest" \
--service-account="mcp-server@${PROJECT_ID}.iam.gserviceaccount.com"
echo "โ
Deployment complete!"
# Get the service URL
SERVICE_URL=$(gcloud run services describe "${SERVICE_NAME}" \
--platform=managed \
--region="${REGION}" \
--project="${PROJECT_ID}" \
--format="value(status.url)")
echo "๐ Service URL: ${SERVICE_URL}"
# Test health endpoint
echo "๐ฉบ Testing health endpoint..."
curl -f "${SERVICE_URL}/health" || echo "โ Health check failed"
Cloud Run advantages:
- Auto-scaling to zero: No idle costs when not in use (toggled via min-instances, see below)
- Fully managed: No infrastructure to maintain
- Pay-per-use: Charged only while requests are being served (note: with CPU throttling disabled, as configured above, you pay for the full instance lifetime instead)
- Built-in HTTPS: Automatic SSL certificate provisioning
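Scale-to-zero is governed by the min-instances setting. A sketch of the trade-off, using the service name and region from the deployment script above:
# Allow scale-to-zero: zero idle cost, but cold starts on the first request
gcloud run services update mcp-server --region=us-central1 --min-instances=0
# Keep one warm instance: small idle cost, no cold starts
gcloud run services update mcp-server --region=us-central1 --min-instances=1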
For more Cloud Run best practices, see the Cloud Run documentation.
Health Checks: Ensuring Container Reliability
Health checks are critical for production deployments. They enable Kubernetes/Cloud Run to automatically restart unhealthy containers and route traffic only to healthy instances.
Health Check Endpoints Implementation
// src/health.ts
import express, { Request, Response } from 'express';
import type { RedisClientType } from 'redis';
export interface HealthStatus {
status: 'healthy' | 'unhealthy' | 'starting';
timestamp: string;
uptime: number;
checks: {
redis?: boolean;
database?: boolean;
openai?: boolean;
};
}
export class HealthChecker {
private app: express.Application;
private startTime: number;
private isReady: boolean = false;
private redisClient: RedisClientType;
constructor(app: express.Application, redisClient: RedisClientType) {
this.app = app;
this.redisClient = redisClient;
this.startTime = Date.now();
this.registerRoutes();
}
private registerRoutes(): void {
// Startup probe: Is the application still starting?
this.app.get('/health/startup', this.startupProbe.bind(this));
// Liveness probe: Is the application alive?
this.app.get('/health/live', this.livenessProbe.bind(this));
// Readiness probe: Is the application ready to serve traffic?
this.app.get('/health/ready', this.readinessProbe.bind(this));
// Combined health endpoint
this.app.get('/health', this.healthCheck.bind(this));
}
// Startup probe: Used during initialization
private async startupProbe(req: Request, res: Response): Promise<void> {
const uptime = Date.now() - this.startTime;
// Allow 30 seconds for startup
if (uptime < 30000 && !this.isReady) {
res.status(503).json({
status: 'starting',
uptime,
message: 'Application is still starting'
});
return;
}
this.isReady = true;
res.status(200).json({
status: 'healthy',
uptime,
message: 'Application started successfully'
});
}
// Liveness probe: Is the container alive?
private async livenessProbe(req: Request, res: Response): Promise<void> {
const uptime = Date.now() - this.startTime;
// Basic liveness check (process is running)
res.status(200).json({
status: 'healthy',
uptime,
timestamp: new Date().toISOString()
});
}
// Readiness probe: Can the container serve traffic?
private async readinessProbe(req: Request, res: Response): Promise<void> {
// Fail fast when the server has been marked not-ready (during startup or shutdown)
if (!this.isReady) {
res.status(503).json({ status: 'unhealthy', message: 'Not ready' });
return;
}
const checks: HealthStatus['checks'] = {};
let isHealthy = true;
// Check Redis connection
try {
await this.redisClient.ping();
checks.redis = true;
} catch (error) {
checks.redis = false;
isHealthy = false;
}
// Check database connection (placeholder; wire up your real client)
try {
// await this.databaseClient.query('SELECT 1'); // e.g. PostgreSQL
checks.database = true; // reports healthy until a real check is added
} catch (error) {
checks.database = false;
isHealthy = false;
}
const status: HealthStatus = {
status: isHealthy ? 'healthy' : 'unhealthy',
timestamp: new Date().toISOString(),
uptime: Date.now() - this.startTime,
checks
};
res.status(isHealthy ? 200 : 503).json(status);
}
// Combined health check endpoint
private async healthCheck(req: Request, res: Response): Promise<void> {
await this.readinessProbe(req, res);
}
public setReady(ready: boolean): void {
this.isReady = ready;
}
}
Probe best practices:
- Startup probe: Used during initialization (prevents premature liveness checks)
- Liveness probe: Lightweight check (just verify process is responsive)
- Readiness probe: Comprehensive check (verify all dependencies are healthy)
- Fast response: Health checks should respond in <1 second (timed with curl in the sketch below)
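A quick way to sanity-check that rule locally, assuming the HealthChecker above is serving on port 3000:
# Time each probe endpoint; anything near 1s deserves investigation
for endpoint in startup live ready; do
  printf '%s: ' "$endpoint"
  curl -s -o /dev/null -w '%{http_code} in %{time_total}s\n' \
    "http://localhost:3000/health/$endpoint"
done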
Graceful Shutdown: Zero Request Loss
Graceful shutdown ensures in-flight requests complete before the container terminates. This is critical during rolling updates to prevent request failures.
Graceful Shutdown Handler
// src/graceful-shutdown.ts
import { Server } from 'http';
import { Express } from 'express';
export class GracefulShutdownHandler {
private server: Server;
private app: Express;
private isShuttingDown: boolean = false;
private shutdownTimeout: number;
private markNotReady: () => void;
constructor(
server: Server,
app: Express,
markNotReady: () => void = () => {},
shutdownTimeout: number = 30000
) {
this.server = server;
this.app = app;
this.markNotReady = markNotReady;
this.shutdownTimeout = shutdownTimeout;
this.registerSignalHandlers();
}
private registerSignalHandlers(): void {
// Handle SIGTERM (Kubernetes sends this during pod termination)
process.on('SIGTERM', () => this.shutdown('SIGTERM'));
// Handle SIGINT (Ctrl+C in terminal)
process.on('SIGINT', () => this.shutdown('SIGINT'));
// Handle uncaught exceptions
process.on('uncaughtException', (error) => {
console.error('Uncaught Exception:', error);
this.shutdown('UNCAUGHT_EXCEPTION');
});
// Handle unhandled promise rejections
process.on('unhandledRejection', (reason, promise) => {
console.error('Unhandled Rejection at:', promise, 'reason:', reason);
this.shutdown('UNHANDLED_REJECTION');
});
}
private async shutdown(signal: string): Promise<void> {
if (this.isShuttingDown) {
console.log('Shutdown already in progress, ignoring signal:', signal);
return;
}
this.isShuttingDown = true;
console.log(`Received ${signal}, starting graceful shutdown...`);
// Step 1: Stop accepting new requests by failing the readiness probe.
// Re-registering app.get('/health/ready', ...) here would not work: Express
// dispatches to the first matching route, so the original handler would keep
// answering 200. Instead, flip the flag the existing readiness handler
// checks, via the markNotReady callback wired up in the constructor.
this.markNotReady();
console.log('Readiness probe now returns 503; new traffic will stop arriving');
// Step 2: Wait for Kubernetes to remove pod from service endpoints
// (typically takes 5-10 seconds)
await this.sleep(10000);
// Step 3: Close HTTP server (wait for in-flight requests to complete)
await new Promise<void>((resolve, reject) => {
const timeout = setTimeout(() => {
reject(new Error('Shutdown timeout exceeded'));
}, this.shutdownTimeout);
this.server.close((error) => {
clearTimeout(timeout);
if (error) {
console.error('Error during server shutdown:', error);
reject(error);
} else {
console.log('All connections closed successfully');
resolve();
}
});
});
// Step 4: Close database connections, Redis clients, etc.
await this.cleanupResources();
console.log('Graceful shutdown complete');
process.exit(0);
}
private async cleanupResources(): Promise<void> {
console.log('Cleaning up resources...');
// Close Redis connection
try {
// await redisClient.quit();
console.log('Redis connection closed');
} catch (error) {
console.error('Error closing Redis:', error);
}
// Close database connection pool
try {
// await databasePool.end();
console.log('Database pool closed');
} catch (error) {
console.error('Error closing database:', error);
}
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
// Usage in main application
import express from 'express';
import http from 'http';
import { createClient } from 'redis';
import { HealthChecker } from './health';
const app = express();
const server = http.createServer(app);
const redisClient = createClient({ url: process.env.REDIS_URL });
redisClient.connect().catch(console.error);
const healthChecker = new HealthChecker(app, redisClient);
// Mark ready once initialization completes
healthChecker.setReady(true);
// Fail the readiness probe during shutdown so traffic drains before the server closes
new GracefulShutdownHandler(server, app, () => healthChecker.setReady(false));
server.listen(3000, () => {
console.log('MCP Server listening on port 3000');
});
Shutdown sequence (a local rehearsal follows the list):
- Receive SIGTERM (Kubernetes sends this 30 seconds before SIGKILL)
- Fail readiness probe (stops receiving new traffic)
- Wait 10 seconds (allow load balancer to update routing)
- Close HTTP server (wait for in-flight requests to complete)
- Close database/Redis connections
- Exit gracefully (exit code 0)
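A minimal local rehearsal of this sequence, assuming the mcp-server:latest image built earlier in this guide:
# Start the container, then deliver the same signal Kubernetes sends on pod termination
docker run -d --name mcp-shutdown-test -p 3000:3000 mcp-server:latest
docker kill --signal=SIGTERM mcp-shutdown-test
# The logs should show: readiness failing, the drain wait, then a clean exit
docker logs -f mcp-shutdown-test
docker rm mcp-shutdown-test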
Zero-Downtime Updates: Blue-Green Deployment
Blue-green deployment runs two identical production environments (blue = current, green = new). Traffic switches to green after validation, enabling instant rollback if issues arise.
Blue-Green Deployment Script
#!/bin/bash
# blue-green-deploy.sh
set -euo pipefail
NAMESPACE="production"
NEW_VERSION="${1:-latest}"
DEPLOYMENT_NAME="mcp-server"
SERVICE_NAME="mcp-server"
echo "๐ Starting blue-green deployment for version: ${NEW_VERSION}"
# Step 1: Deploy green environment
echo "๐ฆ Deploying green environment..."
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: ${DEPLOYMENT_NAME}-green
namespace: ${NAMESPACE}
labels:
app: mcp-server
environment: green
spec:
replicas: 3
selector:
matchLabels:
app: mcp-server
environment: green
template:
metadata:
labels:
app: mcp-server
environment: green
version: ${NEW_VERSION}
spec:
containers:
- name: mcp-server
image: gcr.io/your-project/mcp-server:${NEW_VERSION}
ports:
- containerPort: 3000
readinessProbe:
httpGet:
path: /health/ready
port: 3000
initialDelaySeconds: 10
periodSeconds: 5
EOF
# Step 2: Wait for green pods to be ready
echo "โณ Waiting for green pods to be ready..."
kubectl wait --for=condition=available --timeout=300s \
deployment/${DEPLOYMENT_NAME}-green -n ${NAMESPACE}
# Step 3: Run smoke tests on green environment
echo "๐งช Running smoke tests on green environment..."
GREEN_POD=$(kubectl get pods -n ${NAMESPACE} -l environment=green -o jsonpath='{.items[0].metadata.name}')
# Smoke-test from inside the pod with busybox wget (curl is not installed in the Alpine runtime image)
kubectl exec -n ${NAMESPACE} ${GREEN_POD} -- wget -qO- http://localhost:3000/health || {
echo "Smoke tests failed, rolling back..."
kubectl delete deployment ${DEPLOYMENT_NAME}-green -n ${NAMESPACE}
exit 1
}
# Step 4: Switch traffic to green
echo "๐ Switching traffic to green environment..."
kubectl patch service ${SERVICE_NAME} -n ${NAMESPACE} -p '{"spec":{"selector":{"environment":"green"}}}'
echo "โณ Waiting 30 seconds for traffic to stabilize..."
sleep 30
# Step 5: Monitor error rates
echo "๐ Monitoring error rates..."
ERROR_RATE=$(kubectl logs -n ${NAMESPACE} -l environment=green --tail=100 | grep -c "ERROR" || echo "0")
if [ "${ERROR_RATE}" -gt 10 ]; then
echo "โ High error rate detected (${ERROR_RATE} errors), rolling back..."
kubectl patch service ${SERVICE_NAME} -n ${NAMESPACE} -p '{"spec":{"selector":{"environment":"blue"}}}'
kubectl delete deployment ${DEPLOYMENT_NAME}-green -n ${NAMESPACE}
exit 1
fi
# Step 6: Delete old blue environment
echo "๐๏ธ Deleting old blue environment..."
kubectl delete deployment ${DEPLOYMENT_NAME}-blue -n ${NAMESPACE} --ignore-not-found=true
# Step 7: Promote green to blue
# Deployment names and label selectors are immutable, so green cannot simply be
# renamed with a patch. Re-create it under the blue name (stripping the
# server-managed fields), shift the Service selector back, then remove green.
echo "Promoting green to blue..."
kubectl get deployment ${DEPLOYMENT_NAME}-green -n ${NAMESPACE} -o yaml \
| sed -e "s/green/blue/g" -e "/resourceVersion:/d" -e "/uid:/d" -e "/creationTimestamp:/d" \
| kubectl apply -f -
kubectl wait --for=condition=available --timeout=300s \
deployment/${DEPLOYMENT_NAME}-blue -n ${NAMESPACE}
kubectl patch service ${SERVICE_NAME} -n ${NAMESPACE} -p '{"spec":{"selector":{"environment":"blue"}}}'
kubectl delete deployment ${DEPLOYMENT_NAME}-green -n ${NAMESPACE}
echo "โ
Blue-green deployment complete!"
echo "๐ Current deployment status:"
kubectl get deployments -n ${NAMESPACE} -l app=mcp-server
Blue-green advantages:
- Instant rollback: Switch back to blue if green fails (a single kubectl patch, shown below)
- Zero downtime: New version fully tested before receiving traffic
- Risk mitigation: Production validation before full cutover
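The rollback itself is just the Service selector flip from the script, pointed back at blue:
# Route traffic back to the blue pods; effective as soon as endpoints update
kubectl patch service mcp-server -n production \
  -p '{"spec":{"selector":{"environment":"blue"}}}'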
Conclusion: Production-Ready MCP Deployment
Deploying MCP servers to production requires careful planning, robust infrastructure, and comprehensive monitoring. This guide covered production-grade deployment patterns using Docker, Kubernetes, and Cloud Run, complete with health checks, graceful shutdown handling, and zero-downtime update strategies.
Key takeaways:
- Docker multi-stage builds reduce image sizes by 70% and improve security
- Kubernetes orchestration enables auto-scaling, self-healing, and rolling updates
- Cloud Run offers serverless simplicity with auto-scaling to zero
- Health checks ensure traffic routes only to healthy containers
- Graceful shutdown prevents request loss during updates
- Blue-green deployment eliminates downtime and enables instant rollback
Ready to deploy your MCP server to production? Start building your ChatGPT app with MakeAIHQ โ the only no-code platform specifically designed for the ChatGPT App Store. From zero to production deployment in 48 hours, no DevOps expertise required.
Need expert deployment assistance? Contact our team for white-glove migration services, infrastructure consulting, and production optimization.
Internal Links
- Building Production-Grade MCP Servers: Architecture Patterns Guide
- Deployment Specialist: Firebase, Cloud Functions & GCP Best Practices
- MCP Server Monitoring: Prometheus, Grafana & Alerting Setup
- MCP Server Security: Authentication, Authorization & Data Protection
- MCP Server Testing: Unit, Integration & End-to-End Strategies
- MCP Server Performance Optimization: Caching, Load Balancing & CDN
- ChatGPT App Store Submission Checklist: OpenAI Approval Guide
- Kubernetes vs Cloud Run: Choosing the Right Platform for MCP Servers
- Docker Best Practices for Node.js MCP Servers
- MakeAIHQ Features: Build, Deploy & Scale ChatGPT Apps
External Links
- Docker Security Best Practices (Official Documentation)
- Kubernetes Deployment Guide (Official Documentation)
- Google Cloud Run Documentation
Schema Markup (HowTo)
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "MCP Server Deployment: Docker, Kubernetes & Cloud Run Production Setup",
"description": "Production-grade MCP deployment guide: Dockerization, Kubernetes orchestration, Cloud Run scaling, health checks, and zero-downtime updates.",
"image": "https://makeaihq.com/images/mcp-deployment-guide.png",
"totalTime": "PT2H",
"estimatedCost": {
"@type": "MonetaryAmount",
"currency": "USD",
"value": "0"
},
"tool": [
{
"@type": "HowToTool",
"name": "Docker"
},
{
"@type": "HowToTool",
"name": "Kubernetes"
},
{
"@type": "HowToTool",
"name": "Google Cloud Run"
}
],
"step": [
{
"@type": "HowToStep",
"name": "Docker Containerization",
"text": "Create multi-stage Dockerfile with Alpine Linux, non-root user, and security scanning",
"url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#docker-containerization"
},
{
"@type": "HowToStep",
"name": "Kubernetes Deployment",
"text": "Deploy to Kubernetes with auto-scaling, health checks, and rolling updates",
"url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#kubernetes-deployment"
},
{
"@type": "HowToStep",
"name": "Cloud Run Deployment",
"text": "Deploy to Cloud Run with serverless auto-scaling and zero idle costs",
"url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#cloud-run-deployment"
},
{
"@type": "HowToStep",
"name": "Implement Health Checks",
"text": "Add startup, liveness, and readiness probes for container reliability",
"url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#health-checks"
},
{
"@type": "HowToStep",
"name": "Graceful Shutdown",
"text": "Implement graceful shutdown to prevent request loss during updates",
"url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#graceful-shutdown"
},
{
"@type": "HowToStep",
"name": "Blue-Green Deployment",
"text": "Deploy new versions with zero downtime using blue-green strategy",
"url": "https://makeaihq.com/guides/cluster/mcp-server-deployment-patterns#zero-downtime-updates"
}
]
}