MCP Server Rate Limiting: Prevent Abuse & DDoS Attacks 2026
Rate limiting is a critical security mechanism for Model Context Protocol (MCP) servers exposed to ChatGPT's 800 million weekly users. Without proper rate limiting, your MCP server becomes vulnerable to abuse, resource exhaustion, and distributed denial-of-service (DDoS) attacks. A single malicious actor can overwhelm your infrastructure, causing service degradation for legitimate users.
Effective rate limiting balances two competing priorities: fair usage and security. Fair usage ensures paying customers receive guaranteed API capacity, while security protects against brute-force attacks, credential stuffing, and automated scraping. The challenge is implementing rate limits that block attackers without impacting real users—especially critical for MCP servers handling ChatGPT conversations where model retries can trigger false positives.
In this guide, we'll implement production-grade rate limiting using token bucket algorithms, Redis-backed distributed limiting, and adaptive strategies that scale with your MCP server's growth. You'll learn battle-tested patterns used by API-first companies like Stripe and GitHub.
Rate Limiting Algorithms
Choosing the right rate limiting algorithm determines how your MCP server handles traffic spikes and enforces quotas. Each algorithm offers different tradeoffs between accuracy, performance, and memory usage.
Token Bucket Algorithm
The token bucket algorithm is the gold standard for API rate limiting. It works like a physical bucket that fills with tokens at a fixed rate (e.g., 10 tokens per second). Each API request consumes one token. When the bucket is empty, requests are rejected until tokens refill.
Advantages:
- Allows controlled bursts (bucket size > refill rate)
- Smooth traffic distribution over time
- Simple to implement and understand
- Memory-efficient (stores only 2 values: tokens and timestamp)
Implementation:
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens (e.g., 100)
    this.refillRate = refillRate; // Tokens per second (e.g., 10)
    this.tokens = capacity;       // Start full
    this.lastRefill = Date.now();
  }

  tryConsume(tokensNeeded = 1) {
    this.refill();
    if (this.tokens >= tokensNeeded) {
      this.tokens -= tokensNeeded;
      return true;
    }
    return false;
  }

  refill() {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    const tokensToAdd = elapsedSeconds * this.refillRate;
    this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
    this.lastRefill = now;
  }

  getRetryAfter() {
    this.refill();
    // Seconds until the next token is available (0 if one is available now)
    const tokensNeeded = Math.max(0, 1 - this.tokens);
    return Math.ceil(tokensNeeded / this.refillRate);
  }
}
// Usage in MCP server
// Note: an in-memory Map is per-process only and grows unbounded;
// use the Redis-backed limiter shown later for multi-instance deployments.
const userBuckets = new Map();

function rateLimitMiddleware(req, res, next) {
  const userId = req.auth.userId;
  if (!userBuckets.has(userId)) {
    // Professional tier: 50 requests/minute (burst capacity of 100)
    userBuckets.set(userId, new TokenBucket(100, 50 / 60));
  }
  const bucket = userBuckets.get(userId);
  if (bucket.tryConsume()) {
    res.set('X-RateLimit-Remaining', String(Math.floor(bucket.tokens)));
    next();
  } else {
    res.set('Retry-After', String(bucket.getRetryAfter()));
    res.status(429).json({
      error: 'Too many requests',
      retryAfter: bucket.getRetryAfter()
    });
  }
}
Sliding Window Log
The sliding window log algorithm tracks exact timestamps of recent requests in a time-ordered list. It provides the most accurate rate limiting but requires more memory.
Use case: High-value enterprise endpoints where precision matters more than performance.
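For illustration, here is a minimal in-memory sketch (the class shape is ours, not a library API); a production version would keep the log in a Redis sorted set so it works across instances:

// Illustrative sliding window log: one timestamp per accepted request
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;       // Max requests per window
    this.windowMs = windowMs; // Window length in milliseconds
    this.requests = [];       // Timestamps of accepted requests
  }

  tryConsume() {
    const now = Date.now();
    const cutoff = now - this.windowMs;
    // Evict timestamps that have aged out of the sliding window
    this.requests = this.requests.filter((ts) => ts > cutoff);
    if (this.requests.length < this.limit) {
      this.requests.push(now);
      return true;
    }
    return false;
  }
}

The memory cost is visible here: one timestamp per request, per user, which is exactly the tradeoff that makes this algorithm a poor fit for high-traffic endpoints.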
Fixed Window Counter
The fixed window counter increments a counter for each time window (e.g., 00:00-00:59). It's fast but suffers from the "boundary problem": users can send up to twice the limit around a window boundary, as the sketch below illustrates.
Use case: Non-critical endpoints where approximate limits are acceptable.
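A sketch of the counter and its weakness (again an illustrative shape of our own, not a library API):

class FixedWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowId = null; // Identifies the current clock-aligned window
    this.count = 0;
  }

  tryConsume() {
    // Windows are aligned to clock boundaries (e.g., 00:00-00:59)
    const windowId = Math.floor(Date.now() / this.windowMs);
    if (windowId !== this.windowId) {
      // Boundary problem: the counter resets instantly, so `limit`
      // requests at 00:59 plus `limit` more at 01:00 all succeed
      this.windowId = windowId;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count += 1;
      return true;
    }
    return false;
  }
}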
Leaky Bucket
The leaky bucket algorithm processes requests at a constant rate, queueing excess requests. Unlike token bucket, it enforces strict output rate (no bursts).
Use case: Rate limiting for downstream APIs with strict throughput limits.
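Here is a sketch of the meter variant (an illustrative shape of our own): the bucket drains at a constant rate, and a request is accepted only if it fits without overflowing, so sustained throughput never exceeds the leak rate.

class LeakyBucket {
  constructor(capacity, leakRate) {
    this.capacity = capacity; // Max requests the bucket can hold
    this.leakRate = leakRate; // Requests drained per second
    this.level = 0;           // Current fill level
    this.lastLeak = Date.now();
  }

  tryConsume() {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastLeak) / 1000;
    // Drain the bucket at the constant leak rate
    this.level = Math.max(0, this.level - elapsedSeconds * this.leakRate);
    this.lastLeak = now;
    if (this.level + 1 <= this.capacity) {
      this.level += 1; // Accept: the request "fills" the bucket
      return true;
    }
    return false; // Overflow: reject (or queue, in the queue variant)
  }
}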
For most MCP servers, the token bucket is the optimal choice: it absorbs ChatGPT's bursty retry behavior while preventing sustained abuse.
Implementation with Express.js and Redis
Production MCP servers require distributed rate limiting across multiple instances. Using Redis as a shared state store ensures consistent limits regardless of which server handles the request.
Express-Rate-Limit Middleware
The express-rate-limit package provides production-ready rate limiting with minimal configuration:
import rateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';
import { createClient } from 'redis';
// Redis client for distributed rate limiting
const redisClient = createClient({
  socket: {
    host: process.env.REDIS_HOST,
    port: 6379
  },
  password: process.env.REDIS_PASSWORD
});
await redisClient.connect();

// Per-user rate limiter (requires authentication)
const userLimiter = rateLimit({
  // rate-limit-redis v3+ takes a sendCommand callback instead of a client
  store: new RedisStore({
    sendCommand: (...args) => redisClient.sendCommand(args),
    prefix: 'ratelimit:user:'
  }),
  windowMs: 60 * 1000, // 1 minute window
  max: async (req) => {
    // Tier-based limits
    const tier = req.user?.subscriptionTier || 'free';
    const limits = {
      free: 10,
      starter: 50,
      professional: 200,
      business: 1000
    };
    return limits[tier] ?? limits.free; // Unknown tiers fall back to free
  },
  keyGenerator: (req) => req.user?.id || req.ip,
  handler: (req, res) => {
    res.status(429).json({
      error: 'Rate limit exceeded',
      // standardHeaders sets the draft RateLimit-* names, not X-RateLimit-*
      message: `Your ${req.user?.subscriptionTier || 'free'} tier allows ${res.getHeader('RateLimit-Limit')} requests per minute`,
      upgradeUrl: 'https://makeaihq.com/pricing'
    });
  },
  standardHeaders: true, // Draft IETF RateLimit-* headers
  legacyHeaders: false   // Disable legacy X-RateLimit-* headers
});

// Apply to MCP tool routes
app.use('/mcp/tools', userLimiter);
Redis-Backed Distributed Limiter
For fine-grained control, implement a custom Redis-backed limiter using token bucket:
class RedisTokenBucket {
  constructor(redisClient, capacity, refillRate) {
    this.redis = redisClient;
    this.capacity = capacity;
    this.refillRate = refillRate;
  }

  async tryConsume(key, tokensNeeded = 1) {
    const now = Date.now();
    const bucketKey = `bucket:${key}`;
    // Lua script for an atomic token bucket operation
    const script = `
      local capacity = tonumber(ARGV[1])
      local refillRate = tonumber(ARGV[2])
      local tokensNeeded = tonumber(ARGV[3])
      local now = tonumber(ARGV[4])

      local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'lastRefill')
      local tokens = tonumber(bucket[1]) or capacity
      local lastRefill = tonumber(bucket[2]) or now

      -- Refill tokens
      local elapsedSeconds = (now - lastRefill) / 1000
      local tokensToAdd = elapsedSeconds * refillRate
      tokens = math.min(capacity, tokens + tokensToAdd)

      -- Try to consume
      if tokens >= tokensNeeded then
        tokens = tokens - tokensNeeded
        redis.call('HSET', KEYS[1], 'tokens', tokens, 'lastRefill', now)
        redis.call('EXPIRE', KEYS[1], 3600)
        return {1, tokens}
      else
        return {0, tokens}
      end
    `;
    const result = await this.redis.eval(script, {
      keys: [bucketKey],
      arguments: [
        this.capacity.toString(),
        this.refillRate.toString(),
        tokensNeeded.toString(),
        now.toString()
      ]
    });
    return {
      allowed: result[0] === 1,
      // Redis truncates Lua numbers to integers in replies
      remaining: Math.floor(result[1])
    };
  }
}
// Usage as route-level middleware (instantiate the limiter once, not per request)
const searchLimiter = new RedisTokenBucket(redisClient, 100, 50 / 60);

app.post('/mcp/tools/search', async (req, res, next) => {
  const result = await searchLimiter.tryConsume(req.user.id);
  if (!result.allowed) {
    return res.status(429).json({
      error: 'Rate limit exceeded',
      remaining: result.remaining
    });
  }
  res.set('X-RateLimit-Remaining', String(result.remaining));
  next(); // Hand off to the next handler registered for this route
});
Per-User vs Per-IP Limits
Implement layered rate limiting for defense in depth:
// Layer 1: Global IP-based protection (DDoS prevention)
const globalLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 500, // 500 req/min per IP
  keyGenerator: (req) => req.ip,
  message: 'Too many requests from this IP'
});

// Layer 2: Per-user limits (authenticated endpoints)
const userLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: (req) => getTierLimit(req.user?.tier), // See helper below
  keyGenerator: (req) => req.user?.id,
  skip: (req) => !req.user // Skip if unauthenticated
});

// Apply both
app.use('/mcp', globalLimiter);
app.use('/mcp/tools', userLimiter);
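The getTierLimit helper referenced above is not part of express-rate-limit; here is a minimal sketch, reusing the tier limits from the earlier example:

// Hypothetical helper mapping subscription tiers to per-minute limits
function getTierLimit(tier) {
  const limits = {
    free: 10,
    starter: 50,
    professional: 200,
    business: 1000
  };
  return limits[tier] ?? limits.free; // Unknown tiers fall back to free
}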
Advanced Rate Limiting Patterns
Adaptive Rate Limiting
Dynamically adjust limits based on server load and user behavior:
import os from 'os';
async function adaptiveRateLimit(req, res, next) {
  // 1-minute load average vs. 70% of available cores
  const cpuLoad = os.loadavg()[0];
  const baseLimit = 100;
  const cpuThreshold = os.cpus().length * 0.7;

  // Reduce limits when load exceeds the threshold
  const multiplier = cpuLoad > cpuThreshold
    ? 0.5 // Cut limits in half
    : 1.0;

  // Stored under a custom key to avoid clashing with the req.rateLimit
  // property that express-rate-limit itself attaches to requests
  req.adaptiveLimit = {
    max: Math.floor(baseLimit * multiplier)
  };
  next();
}
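The middleware above only computes the adjusted ceiling; something downstream still has to enforce it. One way to wire it up, sketched with the express-rate-limit dynamic max callback from earlier (the fallback of 100 matches baseLimit):

// Sketch: enforce the adaptive ceiling via the dynamic `max` callback
const adaptiveLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: (req) => req.adaptiveLimit?.max ?? 100, // Fall back to the base limit
  keyGenerator: (req) => req.user?.id || req.ip
});

app.use('/mcp/tools', adaptiveRateLimit, adaptiveLimiter);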
Priority Queues for Enterprise Users
Give enterprise customers guaranteed capacity:
const priorityLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: async (req) => {
    const tier = req.user?.tier;
    // Enterprise gets reserved capacity
    if (tier === 'enterprise') {
      return 10000; // Effectively unlimited
    }
    return getTierLimit(tier);
  },
  // Assumes upstream middleware sets req.priority for flagged requests
  skip: (req) => req.user?.tier === 'enterprise' && req.priority === 'high'
});
Rate Limit Headers
Expose rate limit headers so clients can implement retry logic. The X-RateLimit-* names are the long-standing de facto convention; RateLimit-Policy follows the IETF draft standard:

function setRateLimitHeaders(req, res, next) {
  const limit = res.locals.rateLimit;
  res.set({
    'X-RateLimit-Limit': String(limit.limit),
    'X-RateLimit-Remaining': String(limit.remaining),
    'X-RateLimit-Reset': String(limit.resetTime), // Unix epoch seconds
    'RateLimit-Policy': `${limit.limit};w=60` // IETF draft: <limit>;w=<window seconds>
  });
  next();
}
Graceful Degradation
Instead of hard rejections, queue requests during spikes:
import Queue from 'bull';
const requestQueue = new Queue('mcp-requests', {
  redis: { host: 'localhost', port: 6379 }
});

async function queuedRateLimiter(req, res, next) {
  // getUserBucket is assumed to return the per-user TokenBucket from earlier
  const bucket = getUserBucket(req.user.id);
  if (bucket.tryConsume()) {
    return next();
  }

  // Queue instead of rejecting
  const job = await requestQueue.add({
    userId: req.user.id,
    endpoint: req.path,
    payload: req.body
  }, {
    delay: bucket.getRetryAfter() * 1000 // Delay until tokens refill
  });

  res.status(202).json({
    message: 'Request queued',
    jobId: job.id,
    estimatedProcessing: bucket.getRetryAfter() // Seconds
  });
}
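Queued jobs do nothing until a worker drains them. A minimal Bull processor sketch, where executeTool is a placeholder for your actual MCP tool logic:

// Worker: processes queued requests once their delay elapses.
// executeTool is hypothetical; substitute your real tool dispatch.
requestQueue.process(async (job) => {
  const { userId, endpoint, payload } = job.data;
  return executeTool(endpoint, payload, userId);
});

Since the client received a 202 rather than a result, you will also need a job status endpoint (or webhook) so callers can retrieve the outcome.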
Monitoring and Alerting
Track rate limit metrics to detect abuse and optimize quotas:
import prometheus from 'prom-client';
// Metrics
const rateLimitCounter = new prometheus.Counter({
  name: 'mcp_rate_limit_exceeded_total',
  help: 'Total rate limit violations',
  // Note: per-user labels create high-cardinality series; at scale,
  // consider dropping userId and aggregating by tier instead
  labelNames: ['userId', 'tier', 'endpoint']
});

const quotaUsageGauge = new prometheus.Gauge({
  name: 'mcp_quota_usage_percent',
  help: 'Quota usage percentage',
  labelNames: ['userId', 'tier']
});

// Monitor middleware
function metricsMiddleware(req, res, next) {
  const originalJson = res.json.bind(res);
  res.json = function (data) {
    if (res.statusCode === 429) {
      rateLimitCounter.inc({
        userId: req.user?.id || 'anonymous',
        tier: req.user?.tier || 'free',
        endpoint: req.path
      });
    }

    // Track quota usage (headers set by our custom middleware above)
    const limit = parseInt(res.get('X-RateLimit-Limit') || '0', 10);
    const remaining = parseInt(res.get('X-RateLimit-Remaining') || '0', 10);
    if (limit > 0 && req.user) {
      const usagePercent = ((limit - remaining) / limit) * 100;
      quotaUsageGauge.set({
        userId: req.user.id,
        tier: req.user.tier
      }, usagePercent);
    }
    return originalJson(data);
  };
  next();
}
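These metrics are only useful if Prometheus can scrape them. prom-client's default registry can be exposed on a /metrics endpoint:

// Expose the default prom-client registry for Prometheus scraping
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', prometheus.register.contentType);
  res.end(await prometheus.register.metrics());
});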
// Alert configuration (PagerDuty/Slack); sendAlert is a placeholder
// for your notification integration
let lastViolationTotal = 0;

setInterval(async () => {
  const violations = await rateLimitCounter.get();
  // Counters are cumulative, so diff against the previous sample
  const total = violations.values.reduce((sum, v) => sum + v.value, 0);
  const recent = total - lastViolationTotal;
  lastViolationTotal = total;

  // Alert if >100 violations in the last 5 minutes
  if (recent > 100) {
    await sendAlert({
      severity: 'warning',
      message: `High rate limit violations: ${recent} in the last 5 minutes`,
      details: violations.values.slice(0, 10)
    });
  }
}, 5 * 60 * 1000);
Conclusion
Rate limiting is your MCP server's first line of defense against abuse and resource exhaustion. By implementing token bucket algorithms with Redis-backed distributed state, you ensure fair usage for legitimate users while blocking malicious actors. Start with conservative limits (10-50 requests/minute for free tier) and increase gradually based on metrics. Monitor quota usage dashboards to identify abuse patterns and adjust limits before they impact service quality.
For a complete MCP server security implementation, read our ChatGPT App Security Complete Guide covering authentication, authorization, and data protection. To deploy your rate-limited MCP server, follow our MCP Server Development Complete Guide.
Internal Links:
- MCP Server Development Complete Guide
- ChatGPT App Security Complete Guide
- MCP Server Authentication with OAuth 2.1
- Redis Caching for MCP Server Performance
- MCP Server Monitoring with Prometheus
- Express.js Middleware Best Practices
- Build ChatGPT Apps Without Code - MakeAIHQ Platform
Related Topics:
- Token bucket algorithm implementation
- Redis distributed rate limiting
- DDoS protection strategies
- API quota management
- Express.js security middleware
- Prometheus metrics for APIs
- ChatGPT model retry handling