Rate Limiting & Quota Management for ChatGPT Apps

Rate limiting and quota management are critical components of production-ready ChatGPT applications. Without proper rate limiting, your app can exceed OpenAI API quotas, incur unexpected costs, or provide a poor user experience. This guide provides production-ready code for implementing robust rate limiting, quota tracking, burst handling, and graceful degradation.

Table of Contents

  1. Understanding OpenAI Rate Limits
  2. Token Bucket Algorithm Implementation
  3. Leaky Bucket Pattern
  4. Quota Tracking System
  5. Burst Handling Strategies
  6. Graceful Degradation Controller
  7. User Tier Management
  8. Production Best Practices

Understanding OpenAI Rate Limits

OpenAI enforces multiple types of rate limits on API requests:

  • Requests Per Minute (RPM): Maximum number of API calls per minute
  • Tokens Per Minute (TPM): Maximum tokens processed per minute
  • Tokens Per Day (TPD): Daily token quota
  • Concurrent Requests: Maximum simultaneous requests

Limits vary by model and by your account's usage tier, and OpenAI adjusts them over time. As a rough illustration, an entry-level paid tier has historically allowed on the order of 500 RPM and 30,000 TPM for GPT-4, versus 3,500 RPM and 90,000 TPM for GPT-3.5-turbo, with enterprise accounts receiving significantly higher quotas. Always read your current limits from the dashboard or from response headers rather than hard-coding them.

Understanding these limits is essential for building production ChatGPT apps that scale reliably. Learn more about OpenAI API rate limits and best practices in the official documentation.

Token Bucket Algorithm Implementation

The token bucket algorithm is the most widely used approach to rate limiting: it permits short bursts while still enforcing an average rate over time.

How Token Bucket Works

  1. Bucket Capacity: Define maximum tokens (requests) the bucket can hold
  2. Refill Rate: Tokens are added to the bucket at a constant rate
  3. Token Consumption: Each request consumes one or more tokens
  4. Overflow Protection: Tokens don't accumulate beyond bucket capacity

Production Token Bucket Implementation

/**
 * Token Bucket Rate Limiter
 * Implements a token bucket algorithm with Redis backing
 *
 * Features:
 * - Distributed rate limiting across multiple instances
 * - Configurable refill rates and bucket capacities
 * - Support for different user tiers
 * - Atomic operations for thread safety
 *
 * @class TokenBucketRateLimiter
 */
class TokenBucketRateLimiter {
  constructor(redisClient, config = {}) {
    this.redis = redisClient;
    this.config = {
      bucketCapacity: config.bucketCapacity || 100, // Maximum tokens
      refillRate: config.refillRate || 10, // Tokens per second
      refillInterval: config.refillInterval || 1000, // Milliseconds
      keyPrefix: config.keyPrefix || 'rate_limit:',
      ...config
    };
  }

  /**
   * Get bucket key for user
   * @param {string} userId - Unique user identifier
   * @param {string} endpoint - API endpoint being rate limited
   * @returns {string} Redis key
   */
  getBucketKey(userId, endpoint = 'default') {
    return `${this.config.keyPrefix}${userId}:${endpoint}`;
  }

  /**
   * Check if request is allowed and consume tokens
   * @param {string} userId - User identifier
   * @param {number} tokensRequired - Tokens needed for this request
   * @param {string} endpoint - API endpoint
   * @returns {Promise<Object>} { allowed: boolean, remainingTokens: number, retryAfter: number }
   */
  async consume(userId, tokensRequired = 1, endpoint = 'default') {
    const key = this.getBucketKey(userId, endpoint);
    const now = Date.now();

    // Lua script for atomic token bucket operation
    const luaScript = `
      local key = KEYS[1]
      local capacity = tonumber(ARGV[1])
      local refill_rate = tonumber(ARGV[2])
      local refill_interval = tonumber(ARGV[3])
      local tokens_required = tonumber(ARGV[4])
      local now = tonumber(ARGV[5])

      -- Get current bucket state
      local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
      local current_tokens = tonumber(bucket[1]) or capacity
      local last_refill = tonumber(bucket[2]) or now

      -- Calculate tokens to add based on time elapsed
      local time_elapsed = now - last_refill
      local refill_cycles = math.floor(time_elapsed / refill_interval)
      local tokens_to_add = refill_cycles * refill_rate

      -- Refill tokens (up to capacity)
      current_tokens = math.min(capacity, current_tokens + tokens_to_add)

      -- Update last refill time
      local new_last_refill = last_refill + (refill_cycles * refill_interval)

      -- Check if enough tokens available
      if current_tokens >= tokens_required then
        -- Consume tokens
        current_tokens = current_tokens - tokens_required

        -- Update bucket state
        redis.call('HMSET', key, 'tokens', current_tokens, 'last_refill', new_last_refill)
        redis.call('EXPIRE', key, 3600) -- 1 hour TTL

        return {1, current_tokens, 0} -- allowed, remaining, retryAfter
      else
        -- Not enough tokens - calculate retry time
        local tokens_needed = tokens_required - current_tokens
        local refills_needed = math.ceil(tokens_needed / refill_rate)
        local retry_after = refills_needed * refill_interval

        return {0, current_tokens, retry_after} -- not allowed, remaining, retryAfter
      end
    `;

    try {
      const result = await this.redis.eval(
        luaScript,
        1, // Number of keys
        key,
        this.config.bucketCapacity,
        this.config.refillRate,
        this.config.refillInterval,
        tokensRequired,
        now
      );

      return {
        allowed: result[0] === 1,
        remainingTokens: result[1],
        retryAfter: result[2]
      };
    } catch (error) {
      console.error('Token bucket error:', error);
      // Fail open to prevent blocking users on Redis errors
      return { allowed: true, remainingTokens: 0, retryAfter: 0 };
    }
  }

  /**
   * Get current bucket status without consuming tokens
   * @param {string} userId - User identifier
   * @param {string} endpoint - API endpoint
   * @returns {Promise<Object>} { tokens: number, capacity: number, nextRefill: number }
   */
  async getStatus(userId, endpoint = 'default') {
    const key = this.getBucketKey(userId, endpoint);
    const bucket = await this.redis.hmget(key, 'tokens', 'last_refill');

    // Distinguish a missing field from a legitimate zero: "|| capacity"
    // would wrongly report a full bucket whenever tokens hits 0
    const currentTokens = bucket[0] !== null ? parseInt(bucket[0], 10) : this.config.bucketCapacity;
    const lastRefill = bucket[1] !== null ? parseInt(bucket[1], 10) : Date.now();
    const nextRefill = lastRefill + this.config.refillInterval;

    return {
      tokens: currentTokens,
      capacity: this.config.bucketCapacity,
      nextRefill: nextRefill - Date.now()
    };
  }

  /**
   * Reset bucket for user (admin operation)
   * @param {string} userId - User identifier
   * @param {string} endpoint - API endpoint
   */
  async reset(userId, endpoint = 'default') {
    const key = this.getBucketKey(userId, endpoint);
    await this.redis.del(key);
  }
}

module.exports = TokenBucketRateLimiter;

This implementation provides distributed rate limiting with Redis-backed state management for multi-instance deployments.
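
A minimal usage sketch follows, assuming an ioredis client and an Express app; the module path, the x-user-id header, and the tuning values are placeholders for your own setup.

const Redis = require('ioredis');
const express = require('express');
const TokenBucketRateLimiter = require('./token-bucket-rate-limiter'); // hypothetical path

const redis = new Redis(process.env.REDIS_URL);
const limiter = new TokenBucketRateLimiter(redis, {
  bucketCapacity: 60, // allow bursts of up to 60 requests
  refillRate: 1,      // 1 token per refill interval
  refillInterval: 1000
});

const app = express();

// Consume one token per request; reject with 429 when the bucket is empty
app.use(async (req, res, next) => {
  const userId = req.headers['x-user-id'] || req.ip; // placeholder identity
  const result = await limiter.consume(userId, 1, req.path);

  res.set('X-RateLimit-Remaining', String(result.remainingTokens));

  if (!result.allowed) {
    res.set('Retry-After', String(Math.ceil(result.retryAfter / 1000)));
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }

  next();
});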

Leaky Bucket Pattern

The leaky bucket algorithm is ideal for smoothing traffic and preventing sudden bursts. Where a token bucket lets clients burst up to the bucket's capacity, a leaky bucket releases requests at a constant rate no matter how quickly they arrive.
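
A minimal in-memory sketch of the pattern is shown below; requests enter a bounded queue and "leak" out at a fixed interval, so downstream traffic stays constant no matter how bursty the arrivals are. A production version would persist the queue (for example in Redis) instead of holding it in process memory.

/**
 * Minimal leaky bucket sketch: a bounded queue drained at a fixed rate.
 */
class LeakyBucket {
  constructor({ capacity = 100, leakIntervalMs = 100 } = {}) {
    this.capacity = capacity; // maximum queued requests
    this.queue = [];
    // Drain one request per interval, regardless of arrival rate
    setInterval(() => this.leak(), leakIntervalMs).unref();
  }

  /**
   * Submit a request; rejects immediately if the bucket is full.
   * @param {Function} requestFn - Async function to execute
   * @returns {Promise} Resolves with the request's result
   */
  submit(requestFn) {
    return new Promise((resolve, reject) => {
      if (this.queue.length >= this.capacity) {
        return reject(new Error('Bucket full - request rejected'));
      }
      this.queue.push({ requestFn, resolve, reject });
    });
  }

  leak() {
    const item = this.queue.shift();
    if (!item) return;
    item.requestFn().then(item.resolve, item.reject);
  }
}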

Quota Tracking System

Track usage across multiple dimensions (requests, tokens, costs) to prevent quota overruns and enable accurate billing.

/**
 * Quota Tracking System
 * Monitors and enforces quota limits across multiple dimensions
 *
 * Features:
 * - Multi-dimensional tracking (requests, tokens, cost)
 * - Rolling window calculations
 * - Real-time quota monitoring
 * - Automatic reset on period boundaries
 *
 * @class QuotaTracker
 */
class QuotaTracker {
  constructor(redisClient, config = {}) {
    this.redis = redisClient;
    this.config = {
      keyPrefix: config.keyPrefix || 'quota:',
      periods: config.periods || ['minute', 'hour', 'day', 'month'],
      ...config
    };
  }

  /**
   * Get period boundaries
   * @param {string} period - Time period (minute, hour, day, month)
   * @returns {Object} { start: timestamp, end: timestamp, ttl: seconds }
   */
  getPeriodBoundaries(period) {
    const now = new Date();
    let start, end, ttl;

    switch (period) {
      case 'minute':
        start = new Date(now.getFullYear(), now.getMonth(), now.getDate(),
                         now.getHours(), now.getMinutes(), 0, 0);
        end = new Date(start.getTime() + 60000);
        ttl = 120; // 2 minutes
        break;

      case 'hour':
        start = new Date(now.getFullYear(), now.getMonth(), now.getDate(),
                         now.getHours(), 0, 0, 0);
        end = new Date(start.getTime() + 3600000);
        ttl = 7200; // 2 hours
        break;

      case 'day':
        start = new Date(now.getFullYear(), now.getMonth(), now.getDate(), 0, 0, 0, 0);
        end = new Date(start.getTime() + 86400000);
        ttl = 172800; // 2 days
        break;

      case 'month':
        start = new Date(now.getFullYear(), now.getMonth(), 1, 0, 0, 0, 0);
        end = new Date(now.getFullYear(), now.getMonth() + 1, 1, 0, 0, 0, 0);
        ttl = 5184000; // 60 days
        break;

      default:
        throw new Error(`Invalid period: ${period}`);
    }

    return {
      start: start.getTime(),
      end: end.getTime(),
      ttl,
      key: `${start.getFullYear()}-${String(start.getMonth() + 1).padStart(2, '0')}-${String(start.getDate()).padStart(2, '0')}-${String(start.getHours()).padStart(2, '0')}-${String(start.getMinutes()).padStart(2, '0')}`
    };
  }

  /**
   * Record usage
   * @param {string} userId - User identifier
   * @param {Object} usage - { requests: number, tokens: number, cost: number }
   * @returns {Promise<Object>} Current usage across all periods
   */
  async recordUsage(userId, usage = {}) {
    const { requests = 0, tokens = 0, cost = 0 } = usage;
    const updates = {};

    for (const period of this.config.periods) {
      const boundary = this.getPeriodBoundaries(period);
      const key = `${this.config.keyPrefix}${userId}:${period}:${boundary.key}`;

      // Increment counters atomically
      const pipeline = this.redis.pipeline();

      if (requests > 0) pipeline.hincrby(key, 'requests', requests);
      if (tokens > 0) pipeline.hincrby(key, 'tokens', tokens);
      if (cost > 0) pipeline.hincrbyfloat(key, 'cost', cost);

      pipeline.expire(key, boundary.ttl);

      await pipeline.exec();

      // Get current values
      const current = await this.redis.hgetall(key);
      updates[period] = {
        requests: parseInt(current.requests) || 0,
        tokens: parseInt(current.tokens) || 0,
        cost: parseFloat(current.cost) || 0,
        resetAt: boundary.end
      };
    }

    return updates;
  }

  /**
   * Check quota limits
   * @param {string} userId - User identifier
   * @param {Object} limits - { minute: {...}, hour: {...}, day: {...}, month: {...} }
   * @returns {Promise<Object>} { allowed: boolean, exceeded: [], usage: {} }
   */
  async checkQuota(userId, limits) {
    const usage = {};
    const exceeded = [];

    for (const period of this.config.periods) {
      if (!limits[period]) continue;

      const boundary = this.getPeriodBoundaries(period);
      const key = `${this.config.keyPrefix}${userId}:${period}:${boundary.key}`;
      const current = await this.redis.hgetall(key);

      const periodUsage = {
        requests: parseInt(current.requests) || 0,
        tokens: parseInt(current.tokens) || 0,
        cost: parseFloat(current.cost) || 0,
        resetAt: boundary.end
      };

      usage[period] = periodUsage;

      // Check each limit dimension
      const periodLimits = limits[period];

      if (periodLimits.requests && periodUsage.requests >= periodLimits.requests) {
        exceeded.push({ period, dimension: 'requests', limit: periodLimits.requests, current: periodUsage.requests });
      }

      if (periodLimits.tokens && periodUsage.tokens >= periodLimits.tokens) {
        exceeded.push({ period, dimension: 'tokens', limit: periodLimits.tokens, current: periodUsage.tokens });
      }

      if (periodLimits.cost && periodUsage.cost >= periodLimits.cost) {
        exceeded.push({ period, dimension: 'cost', limit: periodLimits.cost, current: periodUsage.cost });
      }
    }

    return {
      allowed: exceeded.length === 0,
      exceeded,
      usage
    };
  }

  /**
   * Get usage report
   * @param {string} userId - User identifier
   * @returns {Promise<Object>} Usage across all periods
   */
  async getUsageReport(userId) {
    const report = {};

    for (const period of this.config.periods) {
      const boundary = this.getPeriodBoundaries(period);
      const key = `${this.config.keyPrefix}${userId}:${period}:${boundary.key}`;
      const current = await this.redis.hgetall(key);

      report[period] = {
        requests: parseInt(current.requests) || 0,
        tokens: parseInt(current.tokens) || 0,
        cost: parseFloat(current.cost) || 0,
        resetAt: boundary.end
      };
    }

    return report;
  }
}

module.exports = QuotaTracker;

Integrate quota tracking with analytics and monitoring systems for comprehensive usage insights.
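
A hedged usage sketch, assuming the Redis client from earlier and illustrative limits; estimateCost is a placeholder for your own pricing table.

const tracker = new QuotaTracker(redis);

const limits = {
  minute: { requests: 20, tokens: 5000 },
  day: { requests: 2000, tokens: 500000, cost: 10 }
};

async function handleChatRequest(userId, callOpenAI) {
  // Reject before spending money if any quota dimension is exhausted
  const quota = await tracker.checkQuota(userId, limits);
  if (!quota.allowed) {
    const first = quota.exceeded[0];
    throw new Error(`Quota exceeded: ${first.dimension} per ${first.period}`);
  }

  const response = await callOpenAI();

  // Record actual usage after the call completes
  await tracker.recordUsage(userId, {
    requests: 1,
    tokens: response.usage.total_tokens,
    cost: estimateCost(response.usage) // placeholder pricing function
  });

  return response;
}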

Burst Handling Strategies

Handle traffic bursts gracefully while protecting backend services from overload.

/**
 * Burst Handler
 * Manages traffic bursts with queue-based smoothing
 *
 * Features:
 * - Request queuing during bursts
 * - Priority-based processing
 * - Automatic queue overflow protection
 * - Graceful degradation on overload
 *
 * @class BurstHandler
 */
class BurstHandler {
  constructor(config = {}) {
    this.config = {
      maxQueueSize: config.maxQueueSize || 1000,
      maxConcurrent: config.maxConcurrent || 10,
      processingRate: config.processingRate || 100, // ms to wait between dequeues
      priorityLevels: config.priorityLevels || 3,
      queueTimeout: config.queueTimeout || 30000, // 30 seconds
      ...config
    };

    this.queues = new Map(); // Priority queues
    this.activeRequests = 0;
    this.processing = false;
  }

  /**
   * Enqueue request for processing
   * @param {Function} requestFn - Async function to execute
   * @param {number} priority - Priority level (0 = highest)
   * @param {Object} metadata - Request metadata
   * @returns {Promise} Resolves when request completes
   */
  async enqueue(requestFn, priority = 1, metadata = {}) {
    return new Promise((resolve, reject) => {
      const queueItem = {
        requestFn,
        priority,
        metadata,
        resolve,
        reject,
        enqueuedAt: Date.now(),
        timeout: setTimeout(() => {
          this.removeFromQueue(queueItem);
          reject(new Error('Queue timeout exceeded'));
        }, this.config.queueTimeout)
      };

      // Get or create priority queue
      if (!this.queues.has(priority)) {
        this.queues.set(priority, []);
      }

      const queue = this.queues.get(priority);

      // Check queue overflow
      const totalQueued = Array.from(this.queues.values()).reduce((sum, q) => sum + q.length, 0);

      if (totalQueued >= this.config.maxQueueSize) {
        clearTimeout(queueItem.timeout);
        reject(new Error('Queue overflow - try again later'));
        return;
      }

      // Add to queue
      queue.push(queueItem);

      // Start processing if not already running
      if (!this.processing) {
        this.startProcessing();
      }
    });
  }

  /**
   * Start processing queue
   */
  async startProcessing() {
    this.processing = true;

    while (this.hasQueuedRequests() || this.activeRequests > 0) {
      // Wait if at concurrency limit
      if (this.activeRequests >= this.config.maxConcurrent) {
        await this.sleep(this.config.processingRate);
        continue;
      }

      // Get next request from highest priority queue
      const queueItem = this.dequeue();

      if (!queueItem) {
        await this.sleep(this.config.processingRate);
        continue;
      }

      // Process request
      this.activeRequests++;

      this.processRequest(queueItem)
        .then(result => {
          clearTimeout(queueItem.timeout);
          queueItem.resolve(result);
        })
        .catch(error => {
          clearTimeout(queueItem.timeout);
          queueItem.reject(error);
        })
        .finally(() => {
          this.activeRequests--;
        });

      // Rate limiting between requests
      await this.sleep(this.config.processingRate);
    }

    this.processing = false;
  }

  /**
   * Dequeue next request (highest priority first)
   * @returns {Object|null} Queue item
   */
  dequeue() {
    // Iterate through priority levels (0 = highest)
    for (let p = 0; p < this.config.priorityLevels; p++) {
      const queue = this.queues.get(p);
      if (queue && queue.length > 0) {
        return queue.shift();
      }
    }
    return null;
  }

  /**
   * Process individual request
   * @param {Object} queueItem - Queue item to process
   */
  async processRequest(queueItem) {
    const { requestFn, metadata } = queueItem;

    try {
      const result = await requestFn();

      // Track metrics
      const waitTime = Date.now() - queueItem.enqueuedAt;
      this.recordMetrics({
        waitTime,
        priority: queueItem.priority,
        success: true,
        ...metadata
      });

      return result;
    } catch (error) {
      this.recordMetrics({
        waitTime: Date.now() - queueItem.enqueuedAt,
        priority: queueItem.priority,
        success: false,
        error: error.message,
        ...metadata
      });

      throw error;
    }
  }

  /**
   * Check if any requests are queued
   */
  hasQueuedRequests() {
    for (const queue of this.queues.values()) {
      if (queue.length > 0) return true;
    }
    return false;
  }

  /**
   * Remove item from queue
   */
  removeFromQueue(queueItem) {
    const queue = this.queues.get(queueItem.priority);
    if (queue) {
      const index = queue.indexOf(queueItem);
      if (index > -1) queue.splice(index, 1);
    }
  }

  /**
   * Record metrics (implement based on your metrics system)
   */
  recordMetrics(metrics) {
    // Integrate with your monitoring system
    console.log('Burst metrics:', metrics);
  }

  /**
   * Sleep utility
   */
  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  /**
   * Get queue statistics
   */
  getStats() {
    const stats = {
      activeRequests: this.activeRequests,
      totalQueued: 0,
      byPriority: {}
    };

    for (const [priority, queue] of this.queues.entries()) {
      stats.byPriority[priority] = queue.length;
      stats.totalQueued += queue.length;
    }

    return stats;
  }
}

module.exports = BurstHandler;
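
Usage is a matter of wrapping each outbound call in enqueue. The sketch below assumes an initialized openai client; the model name and metadata are illustrative.

const handler = new BurstHandler({ maxConcurrent: 5, processingRate: 200 });

// Interactive chat gets priority 0; background jobs get priority 2
async function chat(messages, priority = 0) {
  return handler.enqueue(
    () => openai.chat.completions.create({
      model: 'gpt-3.5-turbo',
      messages
    }),
    priority,
    { endpoint: 'chat' }
  );
}

// Inspect queue pressure, e.g. from a health endpoint
console.log(handler.getStats()); // { activeRequests, totalQueued, byPriority }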

Graceful Degradation Controller

Implement graceful degradation to maintain service availability when quotas are exceeded.

/**
 * Graceful Degradation Controller
 * Manages service degradation based on quota status
 *
 * Features:
 * - Tiered degradation levels
 * - Feature toggling based on quota
 * - Automatic recovery when quota available
 * - User experience optimization during limits
 *
 * @class DegradationController
 */
class DegradationController {
  constructor(quotaTracker, config = {}) {
    this.quotaTracker = quotaTracker;
    this.config = {
      degradationLevels: config.degradationLevels || [
        { threshold: 0.9, level: 'warning', actions: ['reduce_quality'] },
        { threshold: 0.95, level: 'critical', actions: ['reduce_quality', 'disable_features'] },
        { threshold: 1.0, level: 'blocked', actions: ['queue_requests', 'show_limits'] }
      ],
      ...config
    };

    this.currentLevel = 'normal';
    this.disabledFeatures = new Set();
  }

  /**
   * Evaluate degradation level based on quota usage
   * @param {string} userId - User identifier
   * @param {Object} limits - User quota limits
   * @returns {Promise<Object>} { level: string, actions: [], usage: {} }
   */
  async evaluateDegradation(userId, limits) {
    const quotaStatus = await this.quotaTracker.checkQuota(userId, limits);

    // Calculate maximum usage percentage across all dimensions
    let maxUsagePercent = 0;

    for (const [period, periodUsage] of Object.entries(quotaStatus.usage)) {
      if (!limits[period]) continue;

      const periodLimits = limits[period];

      if (periodLimits.requests) {
        const percent = periodUsage.requests / periodLimits.requests;
        maxUsagePercent = Math.max(maxUsagePercent, percent);
      }

      if (periodLimits.tokens) {
        const percent = periodUsage.tokens / periodLimits.tokens;
        maxUsagePercent = Math.max(maxUsagePercent, percent);
      }
    }

    // Determine degradation level
    let degradationLevel = 'normal';
    let actions = [];

    for (const level of this.config.degradationLevels) {
      if (maxUsagePercent >= level.threshold) {
        degradationLevel = level.level;
        actions = level.actions;
      }
    }

    this.currentLevel = degradationLevel;

    return {
      level: degradationLevel,
      actions,
      usage: quotaStatus.usage,
      usagePercent: maxUsagePercent * 100
    };
  }

  /**
   * Apply degradation actions
   * @param {Array} actions - Degradation actions to apply
   * @param {Object} requestContext - Current request context
   * @returns {Object} Modified request context
   */
  applyDegradation(actions, requestContext) {
    const modifiedContext = { ...requestContext };

    for (const action of actions) {
      switch (action) {
        case 'reduce_quality':
          // Use faster, cheaper model
          if (modifiedContext.model === 'gpt-4') {
            modifiedContext.model = 'gpt-3.5-turbo';
            modifiedContext.degraded = true;
            modifiedContext.degradationReason = 'Quota limit approaching - using optimized model';
          }
          break;

        case 'disable_features':
          // Disable non-essential features
          // Note: these flags live on the controller instance; use one
          // controller per user/session if degradation should not be shared
          this.disabledFeatures.add('streaming');
          this.disabledFeatures.add('function_calling');
          modifiedContext.stream = false;
          modifiedContext.functions = null;
          modifiedContext.degraded = true;
          break;

        case 'queue_requests':
          // Add to queue instead of immediate processing
          modifiedContext.queued = true;
          modifiedContext.estimatedWait = this.estimateQueueTime();
          break;

        case 'show_limits':
          // Return quota information to user
          modifiedContext.showQuotaWarning = true;
          modifiedContext.quotaMessage = this.getQuotaMessage();
          break;

        default:
          console.warn(`Unknown degradation action: ${action}`);
      }
    }

    return modifiedContext;
  }

  /**
   * Check if feature is available
   * @param {string} feature - Feature name
   * @returns {boolean} True if feature is enabled
   */
  isFeatureEnabled(feature) {
    return !this.disabledFeatures.has(feature);
  }

  /**
   * Estimate queue wait time
   */
  estimateQueueTime() {
    // Implement based on your queue metrics
    return 30000; // 30 seconds default
  }

  /**
   * Get user-friendly quota message
   */
  getQuotaMessage() {
    switch (this.currentLevel) {
      case 'warning':
        return 'You are approaching your quota limit. Consider upgrading your plan for uninterrupted service.';

      case 'critical':
        return 'You are very close to your quota limit. Some features have been temporarily disabled.';

      case 'blocked':
        return 'You have reached your quota limit. Please upgrade your plan or wait for the quota to reset.';

      default:
        return null;
    }
  }

  /**
   * Reset degradation (when quota available)
   */
  reset() {
    this.currentLevel = 'normal';
    this.disabledFeatures.clear();
  }
}

module.exports = DegradationController;

Learn more about error handling and resilience patterns for production applications.
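
A short sketch of how the controller slots into a request path, reusing the tracker and limits objects from the quota tracking sketch above:

const controller = new DegradationController(tracker);

async function prepareRequest(userId, requestContext) {
  const status = await controller.evaluateDegradation(userId, limits);

  if (status.level === 'normal') {
    controller.reset(); // re-enable any previously disabled features
    return requestContext;
  }

  // Apply the actions for the current level (model downgrade, feature toggles, ...)
  const context = controller.applyDegradation(status.actions, requestContext);

  if (context.showQuotaWarning) {
    console.warn(`[quota] ${userId}: ${context.quotaMessage} (${status.usagePercent.toFixed(1)}% used)`);
  }

  return context;
}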

User Tier Management

Implement tiered rate limiting based on subscription levels.

/**
 * User Tier Manager
 * Manages rate limits and quotas based on subscription tier
 *
 * @class TierManager
 */
class TierManager {
  constructor() {
    this.tiers = {
      free: {
        name: 'Free',
        limits: {
          minute: { requests: 3, tokens: 1000 },
          hour: { requests: 20, tokens: 10000 },
          day: { requests: 100, tokens: 50000 },
          month: { requests: 1000, tokens: 1000000, cost: 5 }
        },
        features: ['basic_chat'],
        rateLimiter: { bucketCapacity: 5, refillRate: 1 }
      },

      starter: {
        name: 'Starter',
        limits: {
          minute: { requests: 20, tokens: 5000 },
          hour: { requests: 200, tokens: 100000 },
          day: { requests: 2000, tokens: 500000 },
          month: { requests: 10000, tokens: 10000000, cost: 50 }
        },
        features: ['basic_chat', 'streaming', 'templates'],
        rateLimiter: { bucketCapacity: 30, refillRate: 5 }
      },

      professional: {
        name: 'Professional',
        limits: {
          minute: { requests: 60, tokens: 20000 },
          hour: { requests: 1000, tokens: 500000 },
          day: { requests: 10000, tokens: 2000000 },
          month: { requests: 50000, tokens: 50000000, cost: 200 }
        },
        features: ['basic_chat', 'streaming', 'templates', 'function_calling', 'custom_domain'],
        rateLimiter: { bucketCapacity: 100, refillRate: 20 }
      },

      business: {
        name: 'Business',
        limits: {
          minute: { requests: 200, tokens: 50000 },
          hour: { requests: 5000, tokens: 2000000 },
          day: { requests: 50000, tokens: 10000000 },
          month: { requests: 200000, tokens: 200000000, cost: 1000 }
        },
        features: ['basic_chat', 'streaming', 'templates', 'function_calling', 'custom_domain', 'api_access', 'priority_support'],
        rateLimiter: { bucketCapacity: 300, refillRate: 50 }
      }
    };
  }

  /**
   * Get tier configuration
   * @param {string} tierName - Tier name (free, starter, professional, business)
   * @returns {Object} Tier configuration
   */
  getTier(tierName) {
    const tier = this.tiers[tierName.toLowerCase()];
    if (!tier) {
      throw new Error(`Invalid tier: ${tierName}`);
    }
    return tier;
  }

  /**
   * Get user's tier from database
   * @param {string} userId - User identifier
   * @returns {Promise<Object>} User's tier configuration
   */
  async getUserTier(userId) {
    // Implement database lookup
    // For example:
    // const user = await db.users.findById(userId);
    // return this.getTier(user.subscriptionTier);

    return this.getTier('free'); // Default
  }

  /**
   * Check if user has feature access
   * @param {string} userId - User identifier
   * @param {string} feature - Feature name
   * @returns {Promise<boolean>} True if user has access
   */
  async hasFeatureAccess(userId, feature) {
    const tier = await this.getUserTier(userId);
    return tier.features.includes(feature);
  }

  /**
   * Get rate limiter config for user's tier
   * @param {string} userId - User identifier
   * @returns {Promise<Object>} Rate limiter configuration
   */
  async getRateLimiterConfig(userId) {
    const tier = await this.getUserTier(userId);
    return tier.rateLimiter;
  }
}

module.exports = TierManager;

Integrate tier management with Stripe subscription management for automated quota updates.
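
One way to wire the pieces together is a single pre-flight check that resolves the user's tier and applies both the tier's rate limiter and its quotas; this sketch reuses the redis client and tracker from earlier. In practice you would cache one limiter per tier rather than constructing one per request.

const tierManager = new TierManager();

async function checkRequest(userId, endpoint) {
  const tier = await tierManager.getUserTier(userId);

  // Tier-specific bucket, e.g. free = capacity 5, 1 token per interval
  const limiter = new TokenBucketRateLimiter(redis, tier.rateLimiter);
  const rate = await limiter.consume(userId, 1, endpoint);
  if (!rate.allowed) {
    return { allowed: false, reason: 'rate_limited', retryAfter: rate.retryAfter };
  }

  // Tier-specific quotas across all configured periods
  const quota = await tracker.checkQuota(userId, tier.limits);
  if (!quota.allowed) {
    return { allowed: false, reason: 'quota_exceeded', exceeded: quota.exceeded };
  }

  return { allowed: true, tier: tier.name };
}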

Production Best Practices

1. Monitor Rate Limit Headers

Always inspect OpenAI API response headers for rate limit information:

// The official Node SDK returns the parsed body by default; use
// .withResponse() to also get the raw HTTP response and its headers
const { data, response } = await openai.chat.completions
  .create({ model: 'gpt-3.5-turbo', messages })
  .withResponse();

const remaining = response.headers.get('x-ratelimit-remaining-requests');
const resetIn = response.headers.get('x-ratelimit-reset-requests'); // duration string, e.g. "6m0s"

console.log(`Remaining requests: ${remaining}, resets in: ${resetIn}`);

2. Implement Exponential Backoff

When rate limited, implement exponential backoff with jitter:

async function retryWithBackoff(fn, maxRetries = 5) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        const delay = Math.min(1000 * Math.pow(2, i), 32000);
        const jitter = Math.random() * 1000;
        await new Promise(resolve => setTimeout(resolve, delay + jitter));
        continue;
      }
      throw error;
    }
  }
}

3. Cache Responses

Reduce API calls by caching responses for identical requests. Learn more about caching strategies for ChatGPT apps.
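
A minimal sketch, assuming the redis and openai clients from earlier; the key hashes the model and messages, so only byte-identical requests hit the cache, and the one-hour TTL is an arbitrary starting point.

const crypto = require('crypto');

async function cachedCompletion(params) {
  // Deterministic key derived from the request payload
  const key = 'cache:' + crypto.createHash('sha256')
    .update(JSON.stringify({ model: params.model, messages: params.messages }))
    .digest('hex');

  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit);

  const response = await openai.chat.completions.create(params);
  await redis.set(key, JSON.stringify(response), 'EX', 3600); // 1 hour TTL
  return response;
}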

4. Use Streaming for Better UX

Streaming reduces perceived latency and provides a better user experience during rate limiting. See our guide on streaming responses in ChatGPT apps.
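
With the official Node SDK, a streamed completion is an async iterable of deltas:

const stream = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true
});

// Tokens arrive incrementally; forward them to the client as they come in
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}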

5. Implement Circuit Breakers

Prevent cascading failures with circuit breaker patterns. Read about circuit breaker implementation for ChatGPT apps.
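
A compact sketch of the pattern: after a configured number of consecutive failures the breaker opens and fails fast, then lets a trial request through once a cooldown has elapsed.

class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null;
  }

  async call(fn) {
    // Open state: fail fast until the cooldown expires
    if (this.openedAt && Date.now() - this.openedAt < this.cooldownMs) {
      throw new Error('Circuit open - failing fast');
    }

    try {
      const result = await fn(); // half-open trial after cooldown
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (error) {
      this.failures++;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = Date.now();
      }
      throw error;
    }
  }
}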

6. Monitor and Alert

Set up monitoring and alerting for quota usage:

  • Alert at 70% quota usage (warning)
  • Alert at 90% quota usage (critical)
  • Alert on rate limit errors (429 responses)
  • Track cost per user and per endpoint

Integrate with comprehensive monitoring systems for production readiness.
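
A hedged sketch of threshold-based alerting on top of the QuotaTracker above; sendAlert is a placeholder for your paging or Slack integration.

async function checkQuotaAlerts(userId, limits) {
  const usage = await tracker.getUsageReport(userId);

  for (const [period, periodLimits] of Object.entries(limits)) {
    for (const dimension of ['requests', 'tokens', 'cost']) {
      if (!periodLimits[dimension]) continue;
      const percent = (usage[period][dimension] / periodLimits[dimension]) * 100;

      if (percent >= 90) {
        sendAlert('critical', { userId, period, dimension, percent }); // placeholder
      } else if (percent >= 70) {
        sendAlert('warning', { userId, period, dimension, percent });  // placeholder
      }
    }
  }
}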

7. Test Under Load

Perform load testing to validate rate limiting behavior:

# Load test with Apache Bench
ab -n 1000 -c 10 https://your-api.com/chat

# Or use k6 for advanced scenarios
k6 run load-test.js

8. Document Limits for Users

Clearly communicate rate limits and quotas in your documentation. Users should understand:

  • Requests per minute/hour/day limits
  • Token quotas
  • What happens when limits are exceeded
  • How to upgrade for higher limits

See our pricing page for examples of clear quota communication.

Conclusion

Effective rate limiting and quota management are essential for production ChatGPT applications. By implementing token bucket algorithms, quota tracking, burst handling, and graceful degradation, you can build resilient applications that provide excellent user experiences even under quota constraints.

The code examples in this article provide production-ready implementations that you can adapt to your specific needs. Remember to monitor usage, test under load, and communicate limits clearly to your users.

Ready to build production-ready ChatGPT apps without worrying about rate limiting complexity? Try MakeAIHQ.com and deploy your ChatGPT app with built-in rate limiting, quota management, and tier-based controls in minutes.


About MakeAIHQ.com

MakeAIHQ.com is the easiest way to build and deploy ChatGPT apps without coding. Our platform handles rate limiting, quota management, and scaling automatically, so you can focus on creating great user experiences. Start your free trial today.