Serverless Architecture Patterns for ChatGPT Apps: AWS Lambda, Cloud Functions & Step Functions

Building ChatGPT applications with serverless architecture provides unmatched scalability, cost efficiency, and operational simplicity. Unlike traditional server-based deployments, serverless platforms like AWS Lambda, Google Cloud Functions, and Azure Functions automatically scale from zero to millions of requests without infrastructure management. This matters for ChatGPT apps because conversational AI workloads are inherently unpredictable—users might send one message or launch a thousand concurrent conversations.

Serverless architectures follow a pay-per-use model where you're charged only for actual execution time, making them ideal for ChatGPT applications with variable traffic patterns. A fitness studio chatbot might handle 10 conversations during weekdays but 500 on Monday mornings when members schedule classes. With serverless, your infrastructure scales automatically without paying for idle capacity.

The key advantages for ChatGPT applications include:

  • Auto-scaling: Handle sudden conversation spikes without manual intervention
  • Cost efficiency: Pay only when users interact with your chatbot (metered per millisecond of execution on AWS Lambda)
  • Zero infrastructure: Focus on conversation logic, not server maintenance
  • Global deployment: Replicate functions across regions for low latency
  • Event-driven architecture: Trigger conversations from webhooks, queues, or scheduled events

This guide provides 7+ production-ready serverless patterns specifically designed for ChatGPT applications, covering AWS Lambda, Google Cloud Functions, orchestration with Step Functions, and cold start optimization techniques used by companies serving millions of ChatGPT conversations.

For a comprehensive overview of ChatGPT application development, see our ChatGPT Applications Development Guide.


Serverless Architecture Patterns for ChatGPT Apps

Event-Driven Architecture

Serverless ChatGPT applications thrive on event-driven patterns where user messages trigger Lambda functions, which invoke OpenAI's API and return responses. This decouples conversation handling from your frontend, enabling asynchronous processing, queuing, and retry logic.

Core patterns:

  1. API Gateway + Lambda: Synchronous HTTP requests for real-time conversations
  2. Queue-based (SQS/Pub/Sub): Asynchronous processing for complex multi-turn conversations (example sketch after this list)
  3. Step Functions orchestration: Multi-step workflows (e.g., message → moderation → ChatGPT → store → respond)
  4. Event buses (EventBridge): Fan-out conversations to multiple services (analytics, CRM, notifications)
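
A minimal sketch of the queue-based pattern (2), assuming an SQS queue whose URL is exposed as CHAT_QUEUE_URL and a worker Lambda subscribed to it; the names and payload shape are illustrative, not part of any stack defined later in this guide:

// Hypothetical sketch of pattern 2: the API-facing Lambda only enqueues, a worker Lambda
// consumes the queue, calls OpenAI, and stores the result asynchronously.
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';
import type { SQSEvent } from 'aws-lambda';

const sqs = new SQSClient({ region: process.env.AWS_REGION });

// Producer: acknowledges immediately with 202 and pushes the message onto the queue
export const enqueueHandler = async (event: { body?: string }) => {
  const { userId, message, conversationId } = JSON.parse(event.body || '{}');
  await sqs.send(new SendMessageCommand({
    QueueUrl: process.env.CHAT_QUEUE_URL, // Assumed environment variable
    MessageBody: JSON.stringify({ userId, message, conversationId })
  }));
  return { statusCode: 202, body: JSON.stringify({ status: 'queued', conversationId }) };
};

// Consumer: Lambda with an SQS event source mapping processes messages off the request path
export const workerHandler = async (event: SQSEvent) => {
  for (const record of event.Records) {
    const task = JSON.parse(record.body);
    // Call OpenAI and persist the response here (see the Lambda handler later in this guide);
    // failed records are retried by SQS and can be routed to a dead letter queue.
    console.log('Processing queued conversation message:', task.conversationId);
  }
};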

API Gateway + Lambda Pattern

The most common pattern routes HTTPS requests through API Gateway to Lambda functions. This provides authentication, rate limiting, and CORS without custom middleware:

User → API Gateway → Lambda → OpenAI API → Response

For scalable API designs, see API Gateway Patterns for ChatGPT Apps.

Cold Start Mitigation

Cold starts (300ms-3s delay when Lambda initializes) can disrupt real-time conversations. Strategies include:

  • Provisioned concurrency: Keep Lambda instances warm (costs ~$0.015 per GB-hour provisioned)
  • Function warmers: Scheduled pings every 5 minutes
  • Lazy loading: Import heavy dependencies (OpenAI SDK, LangChain) only when needed (see the sketch below this list)
  • Smaller packages: Use esbuild/webpack to bundle dependencies under 10MB
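
A minimal lazy-loading sketch for the strategy above (assumptions: Node.js 20 with a TypeScript build and the OpenAI SDK v4); the client is constructed on the first real request, so warmer pings and module initialization skip its cost:

// Hypothetical lazy-loading handler: heavy dependencies are imported on first use only.
import type OpenAI from 'openai';

let openaiClient: OpenAI | null = null;

async function getOpenAI(): Promise<OpenAI> {
  if (!openaiClient) {
    // Dynamic import defers parsing/initializing the SDK until a real request arrives
    const { default: OpenAIClient } = await import('openai');
    openaiClient = new OpenAIClient({ apiKey: process.env.OPENAI_API_KEY });
  }
  return openaiClient;
}

export const handler = async (event: { warmer?: boolean; message?: string }) => {
  if (event.warmer) return { statusCode: 200, body: 'warm' }; // Warmer pings exit before loading anything heavy
  const openai = await getOpenAI();
  const completion = await openai.chat.completions.create({
    model: process.env.OPENAI_MODEL || 'gpt-4-turbo-preview',
    messages: [{ role: 'user', content: event.message ?? 'Hello' }]
  });
  return { statusCode: 200, body: completion.choices[0].message.content ?? '' };
};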

For event-driven ChatGPT architectures, explore Event-Driven Architecture for ChatGPT Apps.


AWS Lambda Patterns for ChatGPT Applications

AWS Lambda dominates serverless ChatGPT deployments due to its ecosystem (API Gateway, Step Functions, DynamoDB) and pricing ($0.20 per 1M requests plus per-millisecond compute charges). Here's a production-ready Lambda function handling ChatGPT conversations with error handling, conversation history retrieval, and DynamoDB conversation storage.

Production Lambda Function (Node.js)

// lambda/chatgpt-handler/index.js
// Production AWS Lambda function for ChatGPT conversations
// Features: OpenAI chat completions, DynamoDB conversation storage, error handling, CloudWatch logs

const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, PutCommand, QueryCommand } = require('@aws-sdk/lib-dynamodb');
const OpenAI = require('openai');

// Initialize clients outside handler for connection reuse (reduces cold starts)
const dynamoClient = new DynamoDBClient({ region: process.env.AWS_REGION });
const docClient = DynamoDBDocumentClient.from(dynamoClient);
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Environment variables (set in Lambda console or Terraform)
const CONVERSATIONS_TABLE = process.env.CONVERSATIONS_TABLE; // DynamoDB table
const MODEL = process.env.OPENAI_MODEL || 'gpt-4-turbo-preview';
const MAX_TOKENS = parseInt(process.env.MAX_TOKENS) || 500;
const TEMPERATURE = parseFloat(process.env.TEMPERATURE) || 0.7;

exports.handler = async (event) => {
  console.log('Received event:', JSON.stringify(event, null, 2));

  try {
    // Parse request body
    const body = JSON.parse(event.body || '{}');
    const { userId, message, conversationId, systemPrompt } = body;

    // Validation
    if (!userId || !message) {
      return {
        statusCode: 400,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ error: 'userId and message are required' })
      };
    }

    // Generate conversation ID if not provided
    const convId = conversationId || `conv_${Date.now()}_${userId}`;

    // Retrieve conversation history from DynamoDB
    const historyResponse = await docClient.send(new QueryCommand({
      TableName: CONVERSATIONS_TABLE,
      KeyConditionExpression: 'conversationId = :convId',
      ExpressionAttributeValues: { ':convId': convId },
      ScanIndexForward: false, // Newest first, so Limit returns the most recent messages
      Limit: 20 // Last 20 messages (10 turns), reversed to chronological order below
    }));

    // Build messages array for OpenAI
    const messages = [
      {
        role: 'system',
        content: systemPrompt || 'You are a helpful AI assistant for a ChatGPT app.'
      }
    ];

    // Add conversation history (reversed back into chronological order)
    if (historyResponse.Items && historyResponse.Items.length > 0) {
      historyResponse.Items.reverse().forEach(item => {
        messages.push({ role: item.role, content: item.content });
      });
    }

    // Add current user message
    messages.push({ role: 'user', content: message });

    // Call OpenAI API with streaming disabled for Lambda (use API Gateway WebSocket for streaming)
    const startTime = Date.now();
    const completion = await openai.chat.completions.create({
      model: MODEL,
      messages: messages,
      max_tokens: MAX_TOKENS,
      temperature: TEMPERATURE,
      stream: false // API Gateway REST proxy responses don't stream; use Lambda response streaming (Function URLs) or a WebSocket API for streaming
    });

    const assistantMessage = completion.choices[0].message.content;
    const responseTime = Date.now() - startTime;

    console.log(`OpenAI response received in ${responseTime}ms`);

    // Store user message in DynamoDB
    await docClient.send(new PutCommand({
      TableName: CONVERSATIONS_TABLE,
      Item: {
        conversationId: convId,
        timestamp: Date.now(),
        messageId: `msg_${Date.now()}_user`,
        userId: userId,
        role: 'user',
        content: message,
        ttl: Math.floor(Date.now() / 1000) + (30 * 24 * 60 * 60) // 30 days TTL
      }
    }));

    // Store assistant response in DynamoDB
    await docClient.send(new PutCommand({
      TableName: CONVERSATIONS_TABLE,
      Item: {
        conversationId: convId,
        timestamp: Date.now() + 1, // Ensure ordering after user message
        messageId: `msg_${Date.now()}_assistant`,
        userId: userId,
        role: 'assistant',
        content: assistantMessage,
        model: MODEL,
        tokensUsed: completion.usage.total_tokens,
        responseTime: responseTime,
        ttl: Math.floor(Date.now() / 1000) + (30 * 24 * 60 * 60)
      }
    }));

    // Return successful response
    return {
      statusCode: 200,
      headers: {
        'Content-Type': 'application/json',
        'Access-Control-Allow-Origin': '*', // Configure CORS
        'Access-Control-Allow-Headers': 'Content-Type,Authorization'
      },
      body: JSON.stringify({
        conversationId: convId,
        message: assistantMessage,
        tokensUsed: completion.usage.total_tokens,
        responseTime: responseTime,
        model: MODEL
      })
    };

  } catch (error) {
    console.error('Error processing ChatGPT request:', error);

    // Handle specific OpenAI errors
    if (error.status === 429) {
      return {
        statusCode: 429,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ error: 'Rate limit exceeded. Please try again later.' })
      };
    }

    if (error.status === 401) {
      return {
        statusCode: 500,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ error: 'OpenAI API key invalid. Contact support.' })
      };
    }

    // Generic error response
    return {
      statusCode: 500,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        error: 'Internal server error processing conversation',
        requestId: event.requestContext?.requestId
      })
    };
  }
};

API Gateway Integration (Terraform)

Infrastructure-as-code for API Gateway + Lambda setup:

# terraform/api-gateway.tf
# API Gateway + Lambda integration for ChatGPT serverless app
# Creates REST API with CORS, authentication, rate limiting

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# API Gateway REST API
resource "aws_api_gateway_rest_api" "chatgpt_api" {
  name        = "chatgpt-serverless-api"
  description = "Serverless ChatGPT conversation API"

  endpoint_configuration {
    types = ["REGIONAL"] # Use EDGE for global distribution
  }
}

# /chat resource
resource "aws_api_gateway_resource" "chat" {
  rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id
  parent_id   = aws_api_gateway_rest_api.chatgpt_api.root_resource_id
  path_part   = "chat"
}

# POST /chat method
resource "aws_api_gateway_method" "chat_post" {
  rest_api_id   = aws_api_gateway_rest_api.chatgpt_api.id
  resource_id   = aws_api_gateway_resource.chat.id
  http_method   = "POST"
  authorization = "NONE" # Use "AWS_IAM" or Cognito for production

  request_parameters = {
    "method.request.header.Content-Type" = true
  }
}

# Lambda integration
resource "aws_api_gateway_integration" "lambda_integration" {
  rest_api_id             = aws_api_gateway_rest_api.chatgpt_api.id
  resource_id             = aws_api_gateway_resource.chat.id
  http_method             = aws_api_gateway_method.chat_post.http_method
  integration_http_method = "POST"
  type                    = "AWS_PROXY" # Proxy mode passes full request to Lambda
  uri                     = aws_lambda_function.chatgpt_handler.invoke_arn
}

# Lambda function
resource "aws_lambda_function" "chatgpt_handler" {
  filename         = "lambda-deployment.zip"
  function_name    = "chatgpt-conversation-handler"
  role             = aws_iam_role.lambda_exec.arn
  handler          = "index.handler"
  runtime          = "nodejs20.x"
  timeout          = 30 # 30 seconds (adjust for long ChatGPT responses)
  memory_size      = 512 # MB (increase if using large dependencies)

  environment {
    variables = {
      OPENAI_API_KEY     = var.openai_api_key # Store in AWS Secrets Manager
      CONVERSATIONS_TABLE = aws_dynamodb_table.conversations.name
      OPENAI_MODEL       = "gpt-4-turbo-preview"
      MAX_TOKENS         = "500"
      TEMPERATURE        = "0.7"
    }
  }

  # VPC configuration (optional, for private DynamoDB access)
  # vpc_config {
  #   subnet_ids         = var.private_subnet_ids
  #   security_group_ids = [aws_security_group.lambda_sg.id]
  # }
}

# Lambda execution role
resource "aws_iam_role" "lambda_exec" {
  name = "chatgpt-lambda-exec-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

# Attach CloudWatch Logs policy
resource "aws_iam_role_policy_attachment" "lambda_logs" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# DynamoDB access policy
resource "aws_iam_role_policy" "dynamodb_access" {
  name = "chatgpt-lambda-dynamodb-policy"
  role = aws_iam_role.lambda_exec.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "dynamodb:Query",
        "dynamodb:PutItem",
        "dynamodb:GetItem"
      ]
      Resource = aws_dynamodb_table.conversations.arn
    }]
  })
}

# API Gateway permission to invoke Lambda
resource "aws_lambda_permission" "apigw_invoke" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.chatgpt_handler.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_rest_api.chatgpt_api.execution_arn}/*/*"
}

# API Gateway deployment
resource "aws_api_gateway_deployment" "deployment" {
  rest_api_id = aws_api_gateway_rest_api.chatgpt_api.id

  depends_on = [
    aws_api_gateway_integration.lambda_integration
  ]
}

# Production stage (stage_name on aws_api_gateway_deployment is deprecated in provider v5)
resource "aws_api_gateway_stage" "prod" {
  rest_api_id   = aws_api_gateway_rest_api.chatgpt_api.id
  deployment_id = aws_api_gateway_deployment.deployment.id
  stage_name    = "prod"
}

# DynamoDB table for conversations
resource "aws_dynamodb_table" "conversations" {
  name           = "chatgpt-conversations"
  billing_mode   = "PAY_PER_REQUEST" # On-demand pricing
  hash_key       = "conversationId"
  range_key      = "timestamp"

  attribute {
    name = "conversationId"
    type = "S"
  }

  attribute {
    name = "timestamp"
    type = "N"
  }

  ttl {
    attribute_name = "ttl"
    enabled        = true
  }

  tags = {
    Environment = "production"
    Application = "chatgpt-serverless"
  }
}

# Outputs
output "api_endpoint" {
  value = "${aws_api_gateway_deployment.deployment.invoke_url}/chat"
}

output "lambda_function_name" {
  value = aws_lambda_function.chatgpt_handler.function_name
}

Lambda Layers for Shared Dependencies

Lambda layers reduce deployment package sizes by sharing common dependencies (OpenAI SDK, AWS SDK) across functions:

// lambda-layers/openai-layer/nodejs/package.json
{
  "name": "openai-layer",
  "version": "1.0.0",
  "description": "Shared OpenAI SDK for Lambda functions",
  "dependencies": {
    "openai": "^4.20.0"
  }
}

// Build layer:
// cd lambda-layers/openai-layer/nodejs && npm install
// cd .. && zip -r openai-layer.zip nodejs/
// aws lambda publish-layer-version \
//   --layer-name openai-sdk-layer \
//   --zip-file fileb://openai-layer.zip \
//   --compatible-runtimes nodejs20.x

// Attach layer to Lambda function (Terraform)
resource "aws_lambda_function" "chatgpt_handler" {
  # ... (previous config)

  layers = [
    "arn:aws:lambda:us-east-1:123456789012:layer:openai-sdk-layer:1"
  ]
}

// In function code, import from layer:
// const OpenAI = require('openai'); // Loaded from layer, not deployment package

Google Cloud Functions for ChatGPT Apps

Google Cloud Functions offers similar serverless capabilities with tighter integration to Google Cloud ecosystem (Firestore, Pub/Sub, Cloud Run). Here's a production TypeScript implementation.

HTTP Cloud Function (TypeScript)

// functions/src/chatgpt-handler.ts
// Google Cloud Function (HTTP) for ChatGPT conversations
// Features: Firestore storage, OpenAI chat completions, error handling

import { https } from 'firebase-functions/v2';
import { Firestore, Timestamp, FieldValue } from '@google-cloud/firestore';
import OpenAI from 'openai';

// Initialize Firestore and OpenAI clients
const firestore = new Firestore();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const MODEL = process.env.OPENAI_MODEL || 'gpt-4-turbo-preview';
const MAX_TOKENS = parseInt(process.env.MAX_TOKENS || '500');
const TEMPERATURE = parseFloat(process.env.TEMPERATURE || '0.7');

interface ChatRequest {
  userId: string;
  message: string;
  conversationId?: string;
  systemPrompt?: string;
}

export const chatgptHandler = https.onRequest(
  {
    region: 'us-central1',
    memory: '512MiB',
    timeoutSeconds: 60,
    maxInstances: 100, // Auto-scale to 100 instances
    cors: ['https://yourdomain.com', 'http://localhost:3000'],
    secrets: ['OPENAI_API_KEY'] // Load from Secret Manager
  },
  async (req, res) => {
    console.log('Received ChatGPT request:', req.body);

    try {
      // Validate request method
      if (req.method !== 'POST') {
        res.status(405).json({ error: 'Method not allowed. Use POST.' });
        return;
      }

      // Parse request body
      const { userId, message, conversationId, systemPrompt }: ChatRequest = req.body;

      // Validation
      if (!userId || !message) {
        res.status(400).json({ error: 'userId and message are required' });
        return;
      }

      // Generate conversation ID if not provided
      const convId = conversationId || `conv_${Date.now()}_${userId}`;

      // Retrieve conversation history from Firestore
      // (where + orderBy on different fields requires a composite index on conversationId and timestamp)
      const conversationsRef = firestore.collection('conversations');
      const historySnapshot = await conversationsRef
        .where('conversationId', '==', convId)
        .orderBy('timestamp', 'desc') // Newest first, so limit() keeps the most recent messages
        .limit(20) // Last 20 messages, reversed to chronological order below
        .get();

      // Build messages array for OpenAI
      const messages: OpenAI.ChatCompletionMessageParam[] = [
        {
          role: 'system',
          content: systemPrompt || 'You are a helpful AI assistant for a ChatGPT app.'
        }
      ];

      // Add conversation history (reversed back into chronological order)
      historySnapshot.docs.reverse().forEach(doc => {
        const data = doc.data();
        messages.push({ role: data.role, content: data.content });
      });

      // Add current user message
      messages.push({ role: 'user', content: message });

      // Call OpenAI API
      const startTime = Date.now();
      const completion = await openai.chat.completions.create({
        model: MODEL,
        messages: messages,
        max_tokens: MAX_TOKENS,
        temperature: TEMPERATURE,
        stream: false
      });

      const assistantMessage = completion.choices[0].message.content;
      const responseTime = Date.now() - startTime;

      console.log(`OpenAI response received in ${responseTime}ms`);

      // Store user message in Firestore
      await conversationsRef.add({
        conversationId: convId,
        timestamp: Timestamp.now(),
        messageId: `msg_${Date.now()}_user`,
        userId: userId,
        role: 'user',
        content: message,
        createdAt: FieldValue.serverTimestamp()
      });

      // Store assistant response in Firestore
      await conversationsRef.add({
        conversationId: convId,
        timestamp: Timestamp.fromMillis(Date.now() + 1),
        messageId: `msg_${Date.now()}_assistant`,
        userId: userId,
        role: 'assistant',
        content: assistantMessage,
        model: MODEL,
        tokensUsed: completion.usage?.total_tokens || 0,
        responseTime: responseTime,
        createdAt: FieldValue.serverTimestamp()
      });

      // Return successful response
      res.status(200).json({
        conversationId: convId,
        message: assistantMessage,
        tokensUsed: completion.usage?.total_tokens || 0,
        responseTime: responseTime,
        model: MODEL
      });

    } catch (error: any) {
      console.error('Error processing ChatGPT request:', error);

      // Handle specific OpenAI errors
      if (error.status === 429) {
        res.status(429).json({ error: 'Rate limit exceeded. Please try again later.' });
        return;
      }

      if (error.status === 401) {
        res.status(500).json({ error: 'OpenAI API key invalid. Contact support.' });
        return;
      }

      // Generic error response
      res.status(500).json({
        error: 'Internal server error processing conversation',
        requestId: req.headers['x-cloud-trace-context']
      });
    }
  }
);

Pub/Sub Trigger for Async Processing

For long-running ChatGPT tasks (document analysis, batch processing), use Pub/Sub triggers:

// functions/src/async-chatgpt-processor.ts
// Cloud Function triggered by Pub/Sub for async ChatGPT processing
// Use case: Batch processing, document analysis, complex workflows

import { CloudEvent } from 'firebase-functions/v2';
import { MessagePublishedData, onMessagePublished } from 'firebase-functions/v2/pubsub';
import { Firestore, FieldValue } from '@google-cloud/firestore';
import OpenAI from 'openai';

const firestore = new Firestore();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

interface ChatTask {
  taskId: string;
  userId: string;
  prompt: string;
  documentUrl?: string;
  callbackUrl?: string;
}

export const asyncChatProcessor = onMessagePublished(
  {
    topic: 'chatgpt-tasks',
    region: 'us-central1',
    memory: '1GiB',
    timeoutSeconds: 300, // 5 minutes for long tasks
    secrets: ['OPENAI_API_KEY']
  },
  async (event: CloudEvent<MessagePublishedData>) => {
    console.log('Processing Pub/Sub message:', event.id);

    try {
      // Decode Pub/Sub message data (base64)
      const messageData = event.data.message.data;
      const decodedData = Buffer.from(messageData, 'base64').toString('utf-8');
      const task: ChatTask = JSON.parse(decodedData);

      console.log('Task details:', task);

      // Update task status to "processing"
      await firestore.collection('chatgpt-tasks').doc(task.taskId).update({
        status: 'processing',
        startedAt: FieldValue.serverTimestamp()
      });

      // Call OpenAI API (could be multi-step workflow)
      const completion = await openai.chat.completions.create({
        model: 'gpt-4-turbo-preview',
        messages: [
          { role: 'system', content: 'You are a document analysis assistant.' },
          { role: 'user', content: task.prompt }
        ],
        max_tokens: 2000,
        temperature: 0.5
      });

      const result = completion.choices[0].message.content;

      // Store result in Firestore
      await firestore.collection('chatgpt-tasks').doc(task.taskId).update({
        status: 'completed',
        result: result,
        tokensUsed: completion.usage?.total_tokens || 0,
        completedAt: FieldValue.serverTimestamp()
      });

      // Optional: Send callback webhook
      if (task.callbackUrl) {
        await fetch(task.callbackUrl, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ taskId: task.taskId, result: result })
        });
      }

      console.log(`Task ${task.taskId} completed successfully`);

    } catch (error: any) {
      console.error('Error processing async task:', error);

      // Update task status to "failed"
      const messageData = event.data.message.data;
      const decodedData = Buffer.from(messageData, 'base64').toString('utf-8');
      const task: ChatTask = JSON.parse(decodedData);

      await firestore.collection('chatgpt-tasks').doc(task.taskId).update({
        status: 'failed',
        error: error.message,
        failedAt: FieldValue.serverTimestamp()
      });
    }
  }
);

Cloud Run Deployment (YAML)

For advanced use cases requiring containers (custom dependencies, long-running connections):

# cloudrun-chatgpt.yaml
# Cloud Run service for ChatGPT apps (alternative to Cloud Functions)
# Use case: WebSocket connections, custom runtimes, > 9 minutes execution

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: chatgpt-api
  namespace: default
  labels:
    cloud.googleapis.com/location: us-central1
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: '1' # Keep 1 instance warm (avoid cold starts)
        autoscaling.knative.dev/maxScale: '100' # Scale to 100 instances
        run.googleapis.com/cpu-throttling: 'false' # Always-on CPU for WebSocket
        run.googleapis.com/execution-environment: gen2
    spec:
      containerConcurrency: 80 # Handle 80 concurrent requests per container
      timeoutSeconds: 300 # 5 minutes timeout
      serviceAccountName: chatgpt-service-account@project-id.iam.gserviceaccount.com

      containers:
      - name: chatgpt-container
        image: gcr.io/project-id/chatgpt-api:latest
        ports:
        - containerPort: 8080
          name: http1

        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: openai-api-key
              key: latest
        - name: OPENAI_MODEL
          value: gpt-4-turbo-preview
        - name: MAX_TOKENS
          value: '500'
        - name: TEMPERATURE
          value: '0.7'
        - name: FIRESTORE_PROJECT_ID
          value: project-id

        resources:
          limits:
            memory: 1Gi
            cpu: '2'
          requests:
            memory: 512Mi
            cpu: '1'

        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10

        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5

  traffic:
  - percent: 100
    latestRevision: true

---
# Deploy with:
# gcloud run services replace cloudrun-chatgpt.yaml --region=us-central1
#
# Benefits over Cloud Functions:
# - WebSocket support (real-time streaming)
# - Container flexibility (any language/runtime)
# - Longer execution (up to 60 minutes)
# - Custom health checks
# - Gradual rollouts (traffic splitting)
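
The probes above assume the container answers on /health and /ready. A minimal sketch using Node's built-in http module (the probe paths match the YAML; the /chat route and everything else are illustrative placeholders):

// server.ts: health and readiness endpoints for the Cloud Run probes defined above.
import { createServer } from 'node:http';

const PORT = parseInt(process.env.PORT || '8080'); // Cloud Run injects PORT

const server = createServer((req, res) => {
  if (req.url === '/health') {            // livenessProbe target
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: 'ok' }));
    return;
  }
  if (req.url === '/ready') {             // readinessProbe target
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: 'ready' }));
    return;
  }
  if (req.url === '/chat' && req.method === 'POST') {
    res.writeHead(501).end();             // Placeholder: wire in the ChatGPT handler logic here
    return;
  }
  res.writeHead(404).end();
});

server.listen(PORT, () => console.log(`ChatGPT API container listening on ${PORT}`));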

For scalable ChatGPT deployments, see Scalable ChatGPT App Architecture.


Orchestration with AWS Step Functions

Complex ChatGPT workflows (multi-step reasoning, moderation → generation → storage → notification) benefit from Step Functions orchestration. This visual state machine coordinates Lambda functions with built-in error handling and retries.

Step Functions State Machine (JSON)

{
  "Comment": "ChatGPT conversation workflow with moderation, generation, storage, notification",
  "StartAt": "ModerateUserMessage",
  "States": {
    "ModerateUserMessage": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:chatgpt-moderation",
      "Comment": "Check user message for policy violations (OpenAI Moderation API)",
      "TimeoutSeconds": 10,
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed", "Lambda.ServiceException"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["ModerationFailed"],
          "ResultPath": "$.error",
          "Next": "SendModerationAlert"
        }
      ],
      "Next": "CheckModerationResult"
    },

    "CheckModerationResult": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.moderation.flagged",
          "BooleanEquals": true,
          "Next": "SendModerationAlert"
        }
      ],
      "Default": "GenerateChatGPTResponse"
    },

    "GenerateChatGPTResponse": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:chatgpt-generator",
      "Comment": "Call OpenAI ChatGPT API with conversation history",
      "TimeoutSeconds": 30,
      "Retry": [
        {
          "ErrorEquals": ["OpenAIRateLimitError"],
          "IntervalSeconds": 5,
          "MaxAttempts": 5,
          "BackoffRate": 2.0
        },
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 2,
          "MaxAttempts": 2,
          "BackoffRate": 1.5
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "HandleGenerationError"
        }
      ],
      "Next": "ParallelProcessing"
    },

    "ParallelProcessing": {
      "Type": "Parallel",
      "Comment": "Store conversation and send notification in parallel",
      "Branches": [
        {
          "StartAt": "StoreConversation",
          "States": {
            "StoreConversation": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:chatgpt-storage",
              "Comment": "Save conversation to DynamoDB",
              "TimeoutSeconds": 5,
              "End": true
            }
          }
        },
        {
          "StartAt": "SendUserNotification",
          "States": {
            "SendUserNotification": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:send-notification",
              "Comment": "Send email/SMS notification (optional)",
              "TimeoutSeconds": 5,
              "End": true
            }
          }
        },
        {
          "StartAt": "UpdateAnalytics",
          "States": {
            "UpdateAnalytics": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:update-analytics",
              "Comment": "Log conversation metrics (CloudWatch, Datadog)",
              "TimeoutSeconds": 3,
              "End": true
            }
          }
        }
      ],
      "Next": "SuccessResponse"
    },

    "SuccessResponse": {
      "Type": "Succeed",
      "Comment": "Conversation processed successfully"
    },

    "SendModerationAlert": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:send-alert",
      "Comment": "Alert admin about policy violation",
      "TimeoutSeconds": 5,
      "Next": "ModerationFailure"
    },

    "ModerationFailure": {
      "Type": "Fail",
      "Cause": "User message violated content policy",
      "Error": "ModerationFailed"
    },

    "HandleGenerationError": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:error-handler",
      "Comment": "Log error, notify user",
      "TimeoutSeconds": 5,
      "Next": "GenerationFailure"
    },

    "GenerationFailure": {
      "Type": "Fail",
      "Cause": "Failed to generate ChatGPT response",
      "Error": "GenerationFailed"
    }
  }
}

Lambda Integration with Step Functions

// lambda/chatgpt-moderation/index.ts
// Step Functions Lambda: Moderation check using OpenAI Moderation API

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

interface ModerationInput {
  userId: string;
  message: string;
  conversationId: string;
}

export const handler = async (event: ModerationInput) => {
  console.log('Moderating message:', event);

  try {
    // Call OpenAI Moderation API
    const moderation = await openai.moderations.create({
      input: event.message
    });

    const result = moderation.results[0];

    console.log('Moderation result:', result);

    // Return moderation result to Step Functions
    return {
      ...event,
      moderation: {
        flagged: result.flagged,
        categories: result.categories,
        categoryScores: result.category_scores
      }
    };

  } catch (error: any) {
    console.error('Moderation error:', error);
    throw new Error('ModerationFailed');
  }
};

// Step Functions input/output example:
// Input:  { "userId": "user123", "message": "Hello", "conversationId": "conv456" }
// Output: { "userId": "user123", "message": "Hello", "conversationId": "conv456",
//           "moderation": { "flagged": false, "categories": {...}, "categoryScores": {...} } }

Error Handling with Retries

Step Functions provides built-in retry logic for transient failures:

{
  "Retry": [
    {
      "ErrorEquals": ["OpenAIRateLimitError"],
      "IntervalSeconds": 5,
      "MaxAttempts": 5,
      "BackoffRate": 2.0
    }
  ],
  "Catch": [
    {
      "ErrorEquals": ["States.ALL"],
      "ResultPath": "$.error",
      "Next": "HandleGenerationError"
    }
  ]
}

Retry strategy:

  • OpenAI rate limits (429): Exponential backoff (5s, 10s, 20s, 40s, 80s); see the error-naming sketch after this list
  • Network errors: 2 retries with 2x backoff
  • Catch-all errors: Route to error handler Lambda
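
The retriers match on the error name a Lambda function throws, so the generator function has to surface rate limits under the exact name used in the state machine. A sketch (the function payload shape is assumed for illustration):

// Hypothetical chatgpt-generator handler: a 429 from OpenAI is rethrown with
// error.name = 'OpenAIRateLimitError', the Error value Step Functions matches in the Retry block.
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const handler = async (event: { messages: OpenAI.ChatCompletionMessageParam[] }) => {
  try {
    const completion = await openai.chat.completions.create({
      model: process.env.OPENAI_MODEL || 'gpt-4-turbo-preview',
      messages: event.messages
    });
    return { ...event, response: completion.choices[0].message.content };
  } catch (error: any) {
    if (error.status === 429) {
      const rateLimitError = new Error('OpenAI rate limit hit');
      rateLimitError.name = 'OpenAIRateLimitError'; // Matched by the first Retry rule
      throw rateLimitError;
    }
    throw error; // Falls through to the States.TaskFailed retrier and the States.ALL catch
  }
};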

Cold Start Optimization Strategies

Cold starts (300ms-3s delay when Lambda initializes) disrupt real-time ChatGPT conversations. Production strategies to minimize impact:

1. Provisioned Concurrency (Terraform)

Keep Lambda instances warm at all times:

# terraform/provisioned-concurrency.tf
# Provisioned concurrency to eliminate cold starts
# Cost: ~$0.015 per GB-hour provisioned (roughly $11/month for one 1 GB instance)

resource "aws_lambda_provisioned_concurrency_config" "chatgpt_handler" {
  function_name                     = aws_lambda_function.chatgpt_handler.function_name
  provisioned_concurrent_executions = 5 # Keep 5 instances always warm
  qualifier                         = aws_lambda_alias.prod.name
}

resource "aws_lambda_alias" "prod" {
  name             = "prod"
  function_name    = aws_lambda_function.chatgpt_handler.function_name
  function_version = aws_lambda_function.chatgpt_handler.version
}

# Alternative: auto-scale provisioned concurrency (2-20 instances based on load) instead of the fixed config above
resource "aws_appautoscaling_target" "lambda_concurrency" {
  max_capacity       = 20
  min_capacity       = 2
  resource_id        = "function:${aws_lambda_function.chatgpt_handler.function_name}:${aws_lambda_alias.prod.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrentExecutions"
  service_namespace  = "lambda"
}

resource "aws_appautoscaling_policy" "lambda_concurrency_policy" {
  name               = "chatgpt-concurrency-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.lambda_concurrency.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda_concurrency.scalable_dimension
  service_namespace  = aws_appautoscaling_target.lambda_concurrency.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 0.70 # Scale when 70% utilization
    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
  }
}

2. Function Warmer (Scheduled EventBridge)

Ping Lambda every 5 minutes to keep container warm:

// lambda/function-warmer/index.ts
// EventBridge scheduled rule to keep Lambda warm
// Runs every 5 minutes to prevent cold starts

import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({ region: process.env.AWS_REGION });

const FUNCTIONS_TO_WARM = [
  'chatgpt-conversation-handler',
  'chatgpt-moderation',
  'chatgpt-generator'
];

export const handler = async () => {
  console.log('Warming Lambda functions...');

  const promises = FUNCTIONS_TO_WARM.map(async (functionName) => {
    try {
      const command = new InvokeCommand({
        FunctionName: functionName,
        InvocationType: 'Event', // Async invocation (don't wait for response)
        Payload: JSON.stringify({ warmer: true }) // Target handlers should return early when event.warmer is true
      });

      await lambda.send(command);
      console.log(`Warmed function: ${functionName}`);
    } catch (error) {
      console.error(`Failed to warm ${functionName}:`, error);
    }
  });

  await Promise.all(promises);

  return { statusCode: 200, body: 'Functions warmed successfully' };
};

// Terraform EventBridge rule:
// resource "aws_cloudwatch_event_rule" "lambda_warmer" {
//   name                = "chatgpt-lambda-warmer"
//   description         = "Keep ChatGPT Lambda functions warm"
//   schedule_expression = "rate(5 minutes)"
// }

Cold start optimization checklist:

  • ✅ Use provisioned concurrency for critical functions (2-5 instances)
  • ✅ Implement function warmer for scheduled pings (5-minute intervals)
  • ✅ Reduce deployment package size (<10MB using esbuild/webpack); see the bundling sketch after this checklist
  • ✅ Lazy load heavy dependencies (OpenAI SDK, LangChain)
  • ✅ Use Lambda layers for shared dependencies
  • ✅ Increase memory allocation (faster CPU, faster initialization)
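
A bundling sketch for the package-size item (assumes esbuild as a dev dependency; the entry and output paths are illustrative):

// build.ts: bundle the Lambda handler into a single minified file with esbuild.
// The AWS SDK v3 ships with the nodejs20.x runtime, so it is marked external to shrink the zip.
import { build } from 'esbuild';

build({
  entryPoints: ['lambda/chatgpt-handler/index.ts'], // Assumed entry point
  outfile: 'dist/index.js',
  bundle: true,
  minify: true,
  platform: 'node',
  target: 'node20',
  format: 'cjs',            // Matches the "index.handler" CommonJS handler setting
  external: ['@aws-sdk/*'], // Provided by the runtime; excluding it keeps the bundle small
  sourcemap: false
})
  .then(() => console.log('Bundled Lambda handler to dist/index.js'))
  .catch(() => process.exit(1));

// Zip and deploy, e.g.: cd dist && zip -r ../lambda-deployment.zip index.js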

Production Deployment Checklist

Before deploying serverless ChatGPT apps to production:

Infrastructure:

  • ✅ API Gateway with custom domain + SSL certificate
  • ✅ Lambda functions with IAM roles (least privilege)
  • ✅ DynamoDB/Firestore with TTL for automatic cleanup
  • ✅ CloudWatch/Cloud Logging for monitoring
  • ✅ Secrets Manager for API keys (NEVER environment variables)
  • ✅ VPC configuration for private resource access (optional)

Performance:

  • ✅ Provisioned concurrency (2-5 instances) or function warmer
  • ✅ Lambda timeout: 30 seconds minimum (ChatGPT can take 10-15s)
  • ✅ Memory: 512MB+ (faster CPU for OpenAI SDK)
  • ✅ Connection pooling for database clients

Error Handling:

  • ✅ Retry logic with exponential backoff (rate limits, network errors); a helper sketch follows this list
  • ✅ Dead letter queues (DLQ) for failed invocations
  • ✅ Circuit breakers for OpenAI API failures
  • ✅ User-friendly error messages (hide internal errors)
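
A small retry helper sketch for the first two items (a generic wrapper, not a specific library): it retries 429s and common network errors with exponential backoff and rethrows everything else immediately.

// Hypothetical retry wrapper with exponential backoff for OpenAI calls inside a Lambda.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      lastError = error;
      const retryable = error?.status === 429 || error?.code === 'ECONNRESET' || error?.code === 'ETIMEDOUT';
      if (!retryable || attempt === maxAttempts - 1) throw error;
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s between attempts
      console.warn(`Transient error (attempt ${attempt + 1}/${maxAttempts}), retrying in ${delay}ms`);
      await sleep(delay);
    }
  }
  throw lastError;
}

// Usage: const completion = await withRetries(() =>
//   openai.chat.completions.create({ model: MODEL, messages, max_tokens: MAX_TOKENS }));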

Security:

  • ✅ API Gateway authentication (Cognito, API keys, IAM)
  • ✅ Rate limiting (10 requests/minute per user)
  • ✅ Input validation (prevent prompt injection); a validation sketch follows this list
  • ✅ Moderation API for content filtering
  • ✅ CORS configuration (restrict domains)
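
A validation sketch for the input-validation item (limits are illustrative; this reduces abuse and obviously malformed input, it does not fully prevent prompt injection on its own):

// Hypothetical request validation run before the message reaches OpenAI.
const MAX_MESSAGE_LENGTH = 4000; // Illustrative cap; tune against your model's context budget

export function validateChatInput(raw: unknown): { ok: true; message: string } | { ok: false; error: string } {
  if (typeof raw !== 'string') return { ok: false, error: 'message must be a string' };
  // Strip control characters (keeps tabs and newlines) and trim whitespace
  const cleaned = raw.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, '').trim();
  if (cleaned.length === 0) return { ok: false, error: 'message is empty' };
  if (cleaned.length > MAX_MESSAGE_LENGTH) {
    return { ok: false, error: `message exceeds ${MAX_MESSAGE_LENGTH} characters` };
  }
  return { ok: true, message: cleaned };
}

// Usage inside a handler:
// const check = validateChatInput(message);
// if (!check.ok) return { statusCode: 400, body: JSON.stringify({ error: check.error }) };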

Observability:

  • ✅ Structured logging (JSON format with request IDs); a logger sketch follows this list
  • ✅ CloudWatch/Cloud Monitoring dashboards
  • ✅ Custom metrics (conversation count, token usage, latency)
  • ✅ X-Ray/Cloud Trace for distributed tracing
  • ✅ Cost alerts (budget $100/month for 10K conversations)
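
A logger sketch for the structured-logging item (field names are illustrative): one JSON object per line so CloudWatch Logs Insights or Cloud Logging can filter on request ID, conversation ID, and latency.

// Hypothetical structured logger emitting one JSON line per event.
export function logEvent(
  level: 'info' | 'warn' | 'error',
  message: string,
  fields: Record<string, unknown> = {}
): void {
  console.log(JSON.stringify({
    level,
    message,
    timestamp: new Date().toISOString(),
    ...fields
  }));
}

// Usage inside the Lambda handler:
// logEvent('info', 'openai_response', {
//   requestId: event.requestContext?.requestId,
//   conversationId: convId,
//   tokensUsed: completion.usage?.total_tokens,
//   latencyMs: responseTime
// });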

Conclusion: Build Scalable ChatGPT Apps with Serverless

Serverless architecture eliminates infrastructure management while providing unmatched scalability for ChatGPT applications. AWS Lambda, Google Cloud Functions, and Azure Functions enable you to:

  • Auto-scale from 0 to 1M conversations without manual intervention
  • Pay only for execution time ($0.20 per 1M requests + compute time)
  • Deploy globally with multi-region replication in minutes
  • Focus on conversation logic instead of server maintenance

The patterns in this guide—API Gateway + Lambda, Pub/Sub triggers, Step Functions orchestration, and cold start optimization—are battle-tested in production ChatGPT applications serving millions of users. Start with the basic Lambda + DynamoDB pattern, then layer in Step Functions for complex workflows and provisioned concurrency for low latency.

Ready to build serverless ChatGPT apps without managing infrastructure? Start your free trial and deploy your first serverless ChatGPT app to AWS Lambda in under 48 hours—no DevOps expertise required.

For comprehensive ChatGPT development guidance, explore our ChatGPT Applications Development Guide.


Related Resources

Pillar Content:

  • ChatGPT Applications Development Guide

Landing Pages:

  • Build Scalable ChatGPT Apps


About MakeAIHQ: We help businesses build production-ready ChatGPT applications with serverless architectures that scale automatically. From API Gateway + Lambda to Step Functions orchestration, our platform generates battle-tested serverless infrastructure in minutes—no DevOps expertise required.

Start Building Your Serverless ChatGPT App →