ELK Stack Log Aggregation for ChatGPT Apps
Managing logs across distributed ChatGPT applications gets dramatically harder as your deployment scales. When you're running multiple MCP servers, handling thousands of tool calls per minute, and debugging real-time conversation flows, traditional log files scattered across containers quickly become unmanageable. You need centralized log aggregation that provides real-time search, pattern recognition, and visual analytics.
The ELK Stack (Elasticsearch, Logstash, Kibana) has become the industry-standard solution for log aggregation and analysis at scale. This powerful combination enables you to collect logs from all your ChatGPT app components—MCP servers, widget runtime, authentication services, and backend APIs—into a centralized, searchable index with real-time dashboards.
In this comprehensive guide, you'll learn how to deploy a production-ready ELK Stack for ChatGPT application log aggregation. We'll cover the complete architecture, Docker Compose setup, Logstash pipeline configuration, Kibana dashboard creation, and production deployment strategies with security best practices.
For the complete ChatGPT development workflow, see our Complete Guide to Building ChatGPT Applications. If you want to skip the infrastructure complexity and focus on building your app, MakeAIHQ provides managed logging and monitoring out of the box.
ELK Stack Architecture for ChatGPT Apps
The ELK Stack, extended with Filebeat (Elastic's lightweight log shipper), consists of four components working together to create a complete log aggregation pipeline. Understanding each component's role is essential for designing a reliable logging infrastructure.
Elasticsearch: The Search Engine
Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It stores your log data in indices (similar to databases) and provides near-real-time search capabilities across billions of log entries.
For ChatGPT applications, Elasticsearch indexes contain structured log documents with fields like:
- timestamp: When the event occurred
- log_level: DEBUG, INFO, WARN, ERROR, CRITICAL
- service_name: Which MCP server or component generated the log
- tool_name: Which ChatGPT tool was invoked
- user_id: Which user triggered the event (when authenticated)
- message: The actual log message
- stack_trace: Error stack traces for debugging
- response_time: Performance metrics for tool calls
Elasticsearch automatically creates inverted indices for full-text search, allowing you to find logs like "all ERROR logs from the restaurant-booking MCP server in the last 24 hours where response_time > 5000ms" in milliseconds.
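That example query is expressible directly in the Elasticsearch query DSL. A minimal sketch, assuming the chatgpt-logs-* index pattern and the elastic credentials configured later in this guide:

# Find ERROR logs from the restaurant-booking server in the last 24 hours
# where response_time exceeds 5000 ms (field names match the list above)
curl -X GET "http://localhost:9200/chatgpt-logs-*/_search" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "Content-Type: application/json" \
  -d '{
  "query": {
    "bool": {
      "filter": [
        {"term": {"log_level": "ERROR"}},
        {"term": {"service_name": "restaurant-booking"}},
        {"range": {"@timestamp": {"gte": "now-24h"}}},
        {"range": {"response_time": {"gt": 5000}}}
      ]
    }
  },
  "sort": [{"@timestamp": "desc"}],
  "size": 50
}'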
Logstash: The Data Pipeline
Logstash is a server-side data processing pipeline that ingests logs from multiple sources, transforms and enriches them, and sends them to Elasticsearch. It operates in three stages:
- Input plugins: Collect logs from files, HTTP endpoints, message queues, databases
- Filter plugins: Parse, transform, enrich log data (grok patterns, JSON parsing, GeoIP lookup)
- Output plugins: Send processed logs to Elasticsearch, S3, monitoring systems
For ChatGPT apps, Logstash pipelines typically:
- Parse JSON-formatted logs from containerized MCP servers
- Extract structured fields from unstructured log messages using grok patterns
- Add metadata like environment (production, staging), region, and deployment version
- Calculate derived metrics (request duration, token usage, error rates)
- Route logs to different Elasticsearch indices based on log level or service
Kibana: The Visualization Layer
Kibana is the web-based UI for visualizing Elasticsearch data. It provides:
- Discover: Full-text search interface for exploring logs
- Visualizations: Charts, graphs, maps, tables for log analytics
- Dashboards: Pre-built collections of visualizations for monitoring
- Canvas: Pixel-perfect infographic-style reports
For ChatGPT applications, Kibana dashboards typically show:
- Real-time request volume by tool name
- Error rate trends over time
- P50/P95/P99 latency percentiles
- Top 10 slowest tools
- Geographic distribution of users (from IP addresses)
- Alert thresholds (e.g., error rate > 5%)
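These panels are built later in this guide. For ad-hoc debugging in Discover, equivalent KQL searches might look like the following (field names assume the Logstash pipeline configured below):

log_level: ERROR and service_name: "restaurant-booking" and response_time > 5000
tool_name: "create_reservation" and environment: "production"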
Filebeat: The Lightweight Shipper
Filebeat is a lightweight agent that ships log files from your application servers to Logstash or Elasticsearch. Unlike Logstash (which is resource-intensive), Filebeat is designed to run on every server with minimal overhead.
For Docker-based ChatGPT deployments, Filebeat:
- Mounts the Docker socket to collect container logs
- Tails log files in real-time
- Adds metadata like container name, labels, environment variables
- Handles backpressure when Logstash is overloaded
- Guarantees at-least-once delivery with persistent state
The typical data flow is: ChatGPT App → Filebeat → Logstash → Elasticsearch → Kibana.
Learn more about logging best practices in our guide: MCP Server Logging Best Practices for ChatGPT.
Production Docker Compose Setup
Deploying the ELK Stack in Docker provides consistency across development, staging, and production environments. This Docker Compose configuration creates a production-ready cluster with proper networking, volumes, and security.
# docker-compose.elk.yml
version: '3.8'
services:
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.3
container_name: chatgpt-elasticsearch
environment:
# Cluster configuration
- cluster.name=chatgpt-logs-cluster
- node.name=chatgpt-es-node-01
- discovery.type=single-node
# Memory configuration (CRITICAL for production)
- ES_JAVA_OPTS=-Xms4g -Xmx4g
- bootstrap.memory_lock=true
# Security configuration
- xpack.security.enabled=true
- xpack.security.http.ssl.enabled=false
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
# Performance tuning
- indices.memory.index_buffer_size=30%
- thread_pool.write.queue_size=1000
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
volumes:
- elasticsearch-data:/usr/share/elasticsearch/data
- ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro
ports:
- "9200:9200"
- "9300:9300"
networks:
- elk
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:9200/_cluster/health || exit 1"]
interval: 30s
timeout: 10s
retries: 5
restart: unless-stopped
logstash:
image: docker.elastic.co/logstash/logstash:8.11.3
container_name: chatgpt-logstash
environment:
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - XPACK_MONITORING_ENABLED=true
      - XPACK_MONITORING_ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - XPACK_MONITORING_ELASTICSEARCH_USERNAME=elastic
      - XPACK_MONITORING_ELASTICSEARCH_PASSWORD=${ELASTIC_PASSWORD}
- LS_JAVA_OPTS=-Xmx2g -Xms2g
volumes:
- ./logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml:ro
- ./logstash/pipeline:/usr/share/logstash/pipeline:ro
- ./logstash/patterns:/usr/share/logstash/patterns:ro
ports:
- "5044:5044" # Beats input
- "9600:9600" # Logstash monitoring API
networks:
- elk
depends_on:
elasticsearch:
condition: service_healthy
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:9600/_node/stats || exit 1"]
interval: 30s
timeout: 10s
retries: 5
restart: unless-stopped
kibana:
image: docker.elastic.co/kibana/kibana:8.11.3
container_name: chatgpt-kibana
    environment:
      - SERVER_NAME=chatgpt-kibana
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
      - XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=${KIBANA_ENCRYPTION_KEY}
volumes:
- ./kibana/config/kibana.yml:/usr/share/kibana/config/kibana.yml:ro
- kibana-data:/usr/share/kibana/data
ports:
- "5601:5601"
networks:
- elk
depends_on:
elasticsearch:
condition: service_healthy
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:5601/api/status || exit 1"]
interval: 30s
timeout: 10s
retries: 5
restart: unless-stopped
filebeat:
image: docker.elastic.co/beats/filebeat:8.11.3
container_name: chatgpt-filebeat
user: root
environment:
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
volumes:
- ./filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
- filebeat-data:/usr/share/filebeat/data
networks:
- elk
depends_on:
logstash:
condition: service_healthy
command: filebeat -e -strict.perms=false
restart: unless-stopped
volumes:
elasticsearch-data:
driver: local
kibana-data:
driver: local
filebeat-data:
driver: local
networks:
elk:
driver: bridge
Critical production considerations:
- Memory allocation: Elasticsearch requires -Xms and -Xmx to be equal (prevents heap resizing). Allocate 50% of available RAM (max 32GB due to compressed pointers).
- Volume persistence: Named volumes ensure data survives container restarts. For production, use block storage (AWS EBS, GCP Persistent Disk).
- Health checks: Ensure services start in the correct order (Elasticsearch → Logstash → Kibana → Filebeat).
- Security: Use environment variables for passwords. Generate strong keys with openssl rand -hex 32.
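A minimal sketch of the secrets setup the Compose file expects (variable names match docker-compose.elk.yml above; in Elasticsearch 8.x the kibana_system password must be set explicitly once the cluster is up):

# Generate the secrets referenced by docker-compose.elk.yml
cat > .env <<EOF
ELASTIC_PASSWORD=$(openssl rand -hex 32)
KIBANA_PASSWORD=$(openssl rand -hex 32)
KIBANA_ENCRYPTION_KEY=$(openssl rand -hex 32)
EOF

# Once Elasticsearch reports healthy, set the kibana_system password
# so Kibana can authenticate with ELASTICSEARCH_PASSWORD
source .env
curl -X POST "http://localhost:9200/_security/user/kibana_system/_password" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "Content-Type: application/json" \
  -d "{\"password\": \"${KIBANA_PASSWORD}\"}"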
Logstash Pipeline Configuration
Logstash pipelines define how logs flow from inputs through filters to outputs. This production-ready pipeline handles ChatGPT application logs with JSON parsing, field extraction, and enrichment.
# logstash/pipeline/chatgpt-app.conf
input {
  # Beats input (receives logs from Filebeat; JSON is parsed in the
  # filter stage below, so no codec is set here)
  beats {
    port => 5044
  }
# HTTP input (for direct log shipping from apps)
http {
port => 8080
codec => json
additional_codecs => {
"application/json" => "json"
}
}
}
filter {
# Parse JSON logs from MCP servers
if [message] =~ /^\{.*\}$/ {
json {
source => "message"
target => "parsed"
}
# Promote parsed fields to top level
if [parsed] {
mutate {
rename => {
"[parsed][level]" => "log_level"
"[parsed][timestamp]" => "log_timestamp"
"[parsed][service]" => "service_name"
"[parsed][tool]" => "tool_name"
"[parsed][user_id]" => "user_id"
"[parsed][duration_ms]" => "response_time"
"[parsed][error]" => "error_message"
"[parsed][stack]" => "stack_trace"
}
}
}
}
# Parse unstructured logs with grok patterns
if ![log_level] {
grok {
match => {
"message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{LOGLEVEL:log_level} \[%{DATA:service_name}\] %{GREEDYDATA:log_message}"
}
patterns_dir => ["/usr/share/logstash/patterns"]
}
}
# Convert timestamps to @timestamp field
if [log_timestamp] {
date {
match => ["log_timestamp", "ISO8601", "yyyy-MM-dd'T'HH:mm:ss.SSSZ"]
target => "@timestamp"
remove_field => ["log_timestamp"]
}
}
# Normalize log levels
mutate {
uppercase => ["log_level"]
}
  # Add environment metadata (also copied into @metadata so the output
  # conditional can test it; ${VAR} references are expanded in plugin
  # settings, not in conditionals)
  mutate {
    add_field => {
      "environment" => "${ENVIRONMENT:production}"
      "region" => "${AWS_REGION:us-east-1}"
      "deployment_version" => "${DEPLOYMENT_VERSION:unknown}"
      "[@metadata][environment]" => "${ENVIRONMENT:production}"
    }
  }
# Parse user agent strings
if [http_user_agent] {
useragent {
source => "http_user_agent"
target => "user_agent"
}
}
# GeoIP lookup for client IPs
if [client_ip] {
geoip {
source => "client_ip"
target => "geoip"
fields => ["city_name", "country_name", "location"]
}
}
# Calculate derived metrics
if [response_time] {
ruby {
code => "
response_time = event.get('response_time').to_f
event.set('response_time_category',
case response_time
when 0..100 then 'fast'
when 101..500 then 'normal'
when 501..2000 then 'slow'
else 'very_slow'
end
)
"
}
}
# Tag errors for alerting
if [log_level] == "ERROR" or [log_level] == "CRITICAL" {
mutate {
add_tag => ["error_log"]
}
}
  # Generate a stable fingerprint, used as the document_id in the output
  # to prevent duplicate documents on retries
  fingerprint {
    source => ["message", "@timestamp"]
    concatenate_sources => true
    method => "SHA256"
    target => "[@metadata][fingerprint]"
  }
  # Remove unnecessary fields
  mutate {
    remove_field => ["host", "agent", "ecs", "input", "parsed"]
  }
}
output {
# Primary output: Elasticsearch
elasticsearch {
hosts => ["http://elasticsearch:9200"]
user => "elastic"
password => "${ELASTIC_PASSWORD}"
# Dynamic index routing by date
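    # (with ILM enabled below, writes go through the rollover alias and
    # this index name is ignored)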
index => "chatgpt-logs-%{+YYYY.MM.dd}"
# Document ID (prevents duplicates)
document_id => "%{[@metadata][fingerprint]}"
# ILM policy (Index Lifecycle Management)
ilm_enabled => true
ilm_rollover_alias => "chatgpt-logs"
ilm_pattern => "{now/d}-000001"
ilm_policy => "chatgpt-logs-policy"
}
# Error output: Separate index for ERROR/CRITICAL logs
if "error_log" in [tags] {
elasticsearch {
hosts => ["http://elasticsearch:9200"]
user => "elastic"
password => "${ELASTIC_PASSWORD}"
index => "chatgpt-errors-%{+YYYY.MM.dd}"
}
}
  # Debugging output (only in non-production). Tests the @metadata copy
  # set in the filter stage, since env vars aren't expanded in conditionals.
  if [@metadata][environment] != "production" {
    stdout {
      codec => rubydebug
    }
  }
}
Pipeline highlights:
- Dual input: Accepts logs from Filebeat (port 5044) and direct HTTP (port 8080)
- JSON parsing: Extracts structured fields from JSON logs
- Grok patterns: Parses unstructured logs when JSON isn't available
- Enrichment: Adds GeoIP, user agent parsing, environment metadata
- Dynamic indexing: Creates daily indices (e.g., chatgpt-logs-2026.12.25, per the %{+YYYY.MM.dd} pattern)
- Error routing: Sends ERROR/CRITICAL logs to a separate index for faster alerting
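To verify the JSON branch of the filter end to end, you can post a sample event to the HTTP input. This sketch assumes you also publish port 8080 for the logstash service in the Compose file; wrapping the structured log line in a message field mimics how log shippers deliver it, so the json filter and rename map above are exercised:

# Hypothetical structured log line; inner field names match the rename map
curl -X POST "http://localhost:8080" \
  -H "Content-Type: application/json" \
  -d '{"message": "{\"timestamp\":\"2026-12-25T14:32:18.456Z\",\"level\":\"error\",\"service\":\"restaurant-booking\",\"tool\":\"create_reservation\",\"user_id\":\"user_123\",\"duration_ms\":5234}"}'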
For advanced log analysis techniques, see our guide: Log Analysis with Kibana for ChatGPT.
Custom Grok Patterns for ChatGPT Logs
Grok patterns enable you to parse unstructured log messages into structured fields. These custom patterns handle common ChatGPT application log formats.
# logstash/patterns/chatgpt-patterns.txt
# MCP Server log pattern
# Example: 2026-12-25T14:32:18.456Z INFO [restaurant-booking] Tool call: create_reservation user=user_123 duration=234ms
MCP_LOG %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:service}\] Tool call: %{DATA:tool} user=%{DATA:user_id} duration=%{NUMBER:duration_ms}ms
# Widget runtime pattern
# Example: [2026-12-25 14:32:18] WARN Widget timeout: MapWidget component=InteractiveMap timeout=5000ms
WIDGET_LOG \[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:level} Widget %{DATA:event_type}: %{DATA:widget_name} component=%{DATA:component_name} timeout=%{NUMBER:timeout_ms}ms
# Authentication log pattern
# Example: 2026-12-25T14:32:18Z INFO [auth-service] OAuth token verified: user_id=user_123 scope=read_profile,write_apps ip=203.0.113.42
AUTH_LOG %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[auth-service\] %{DATA:auth_event}: user_id=%{DATA:user_id} scope=%{DATA:scopes} ip=%{IP:client_ip}
# Error with stack trace pattern
# Example: 2026-12-25T14:32:18Z ERROR [mcp-server] UnhandledPromiseRejection: Connection timeout
ERROR_LOG %{TIMESTAMP_ISO8601:timestamp} ERROR \[%{DATA:service}\] %{DATA:error_type}: %{GREEDYDATA:error_message}
# Performance metric pattern
# Example: METRIC tool_call_duration_ms=234 service=restaurant-booking tool=create_reservation percentile=p95
METRIC_LOG METRIC %{DATA:metric_name}=%{NUMBER:metric_value} service=%{DATA:service} tool=%{DATA:tool} percentile=%{DATA:percentile}
Usage in pipeline:
filter {
grok {
match => {
"message" => [
"%{MCP_LOG}",
"%{WIDGET_LOG}",
"%{AUTH_LOG}",
"%{ERROR_LOG}",
"%{METRIC_LOG}"
]
}
patterns_dir => ["/usr/share/logstash/patterns"]
}
}
Grok debugger tool: Use Kibana's Dev Tools → Grok Debugger to test patterns against real log samples.
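You can also smoke-test a pattern from the command line via Elasticsearch's ingest simulate API, which shares grok syntax with Logstash. A sketch (the pattern is inlined because the simulate API doesn't load Logstash pattern files):

curl -X POST "http://localhost:9200/_ingest/pipeline/_simulate" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "Content-Type: application/json" \
  -d '{
  "pipeline": {
    "processors": [{
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \\[%{DATA:service}\\] Tool call: %{DATA:tool} user=%{DATA:user_id} duration=%{NUMBER:duration_ms}ms"]
      }
    }]
  },
  "docs": [
    {"_source": {"message": "2026-12-25T14:32:18.456Z INFO [restaurant-booking] Tool call: create_reservation user=user_123 duration=234ms"}}
  ]
}'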
Filebeat Configuration for Docker Containers
Filebeat ships logs from Docker containers to Logstash with minimal resource overhead. This configuration collects logs from all ChatGPT app containers with metadata enrichment.
# filebeat/filebeat.yml
filebeat.inputs:
# Docker container log input
- type: container
enabled: true
paths:
- '/var/lib/docker/containers/*/*.log'
# Decode JSON logs from containers
json.keys_under_root: true
json.overwrite_keys: true
json.add_error_key: true
# Add Docker metadata
processors:
- add_docker_metadata:
host: "unix:///var/run/docker.sock"
match_fields: ["container.id"]
labels.dedot: true
# Add container labels as fields
- decode_json_fields:
fields: ["message"]
process_array: false
max_depth: 3
target: ""
overwrite_keys: true
      # Add custom fields
      - add_fields:
          target: ''
          fields:
            environment: ${ENVIRONMENT:production}
            region: ${AWS_REGION:us-east-1}
      # Ship only logs from containers labeled logging=enabled
      - drop_event:
          when:
            not:
              equals:
                container.labels.logging: "enabled"
# File input (for non-containerized logs)
- type: log
enabled: true
paths:
- /var/log/chatgpt-apps/*.log
fields:
log_source: file_system
multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
multiline.negate: true
multiline.match: after
# Filebeat modules (optional)
filebeat.modules:
- module: system
syslog:
enabled: true
auth:
enabled: true
# Output to Logstash
output.logstash:
hosts: ["logstash:5044"]
# Load balancing across multiple Logstash instances
loadbalance: true
# Enable compression
compression_level: 3
# Bulk settings
bulk_max_size: 2048
worker: 2
# Backpressure handling
slow_start: true
# Logging configuration
logging.level: info
logging.to_files: true
logging.files:
path: /var/log/filebeat
name: filebeat.log
keepfiles: 7
permissions: 0644
# Performance tuning
queue.mem:
events: 4096
flush.min_events: 512
flush.timeout: 1s
# Monitoring
monitoring.enabled: true
monitoring.elasticsearch:
hosts: ["http://elasticsearch:9200"]
username: "elastic"
password: "${ELASTIC_PASSWORD}"
Key features:
- Container autodiscovery: Automatically detects and ships logs from all Docker containers
- Metadata enrichment: Adds container name, labels, image, environment variables
- JSON decoding: Parses JSON logs before sending to Logstash
- Label filtering: Only ships logs from containers with the logging=enabled label
- Multiline handling: Combines stack traces into single log events
- Backpressure: Slows down when Logstash is overloaded
Add logging label to ChatGPT app containers:
# In your ChatGPT app docker-compose.yml
services:
mcp-server:
labels:
- "logging=enabled"
Kibana Dashboard Configuration
Kibana dashboards visualize log data for real-time monitoring and debugging. This dashboard configuration tracks ChatGPT application health, performance, and error rates.
{
"title": "ChatGPT Application Monitoring Dashboard",
"description": "Real-time monitoring for ChatGPT apps: request volume, latency, errors, tool usage",
"panels": [
{
"id": "request_volume_timeline",
"type": "line",
"title": "Request Volume (Requests/min)",
"gridData": {"x": 0, "y": 0, "w": 12, "h": 4},
"visState": {
"type": "line",
"params": {
"type": "line",
"grid": {"categoryLines": false},
"categoryAxes": [{"id": "CategoryAxis-1", "type": "category", "position": "bottom", "show": true}],
"valueAxes": [{"id": "ValueAxis-1", "name": "Requests", "type": "value", "position": "left", "show": true}],
"seriesParams": [{"show": true, "type": "line", "mode": "normal", "data": {"label": "Requests", "id": "1"}}]
},
"aggs": [
{"id": "1", "enabled": true, "type": "count", "schema": "metric"},
{"id": "2", "enabled": true, "type": "date_histogram", "schema": "segment", "params": {"field": "@timestamp", "interval": "1m", "min_doc_count": 0}}
]
}
},
{
"id": "error_rate_gauge",
"type": "gauge",
"title": "Error Rate (%)",
"gridData": {"x": 12, "y": 0, "w": 6, "h": 4},
"visState": {
"type": "gauge",
"params": {
"gauge": {
"gaugeType": "Arc",
"percentageMode": true,
"colorSchema": "Green to Red",
"gaugeStyle": "Full",
"backStyle": "Full",
"orientation": "vertical",
"verticalSplit": false,
"labels": {"show": true, "color": "black"},
"scale": {"show": true, "labels": false, "color": "#333"},
"type": "meter",
"style": {"bgFill": "#000", "fontSize": 60}
}
},
"aggs": [
{"id": "1", "enabled": true, "type": "count", "schema": "metric", "params": {"customLabel": "Error Rate"}},
{"id": "2", "enabled": true, "type": "filters", "schema": "group", "params": {"filters": [{"input": {"query": "log_level:ERROR OR log_level:CRITICAL"}, "label": "Errors"}]}}
]
}
},
{
"id": "response_time_percentiles",
"type": "area",
"title": "Response Time Percentiles (ms)",
"gridData": {"x": 18, "y": 0, "w": 6, "h": 4},
"visState": {
"type": "area",
"aggs": [
{"id": "1", "enabled": true, "type": "percentiles", "schema": "metric", "params": {"field": "response_time", "percents": [50, 95, 99]}},
{"id": "2", "enabled": true, "type": "date_histogram", "schema": "segment", "params": {"field": "@timestamp", "interval": "1m"}}
]
}
},
{
"id": "top_tools_table",
"type": "table",
"title": "Top 10 Tools by Request Count",
"gridData": {"x": 0, "y": 4, "w": 12, "h": 4},
"visState": {
"type": "table",
"params": {
"perPage": 10,
"showPartialRows": false,
"showMetricsAtAllLevels": false,
"sort": {"columnIndex": null, "direction": null},
"showTotal": true,
"totalFunc": "sum"
},
"aggs": [
{"id": "1", "enabled": true, "type": "count", "schema": "metric"},
{"id": "2", "enabled": true, "type": "terms", "schema": "bucket", "params": {"field": "tool_name.keyword", "size": 10, "order": "desc", "orderBy": "1"}}
]
}
},
{
"id": "geographic_distribution_map",
"type": "map",
"title": "User Geographic Distribution",
"gridData": {"x": 12, "y": 4, "w": 12, "h": 4},
"visState": {
"type": "map",
"params": {
"mapType": "Coordinate Map",
"isDesaturated": false,
"mapZoom": 2,
"mapCenter": [0, 0]
},
"aggs": [
{"id": "1", "enabled": true, "type": "count", "schema": "metric"},
{"id": "2", "enabled": true, "type": "geohash_grid", "schema": "segment", "params": {"field": "geoip.location", "autoPrecision": true, "precision": 3}}
]
}
},
{
"id": "error_logs_table",
"type": "table",
"title": "Recent Error Logs",
"gridData": {"x": 0, "y": 8, "w": 24, "h": 4},
"visState": {
"type": "table",
"params": {
"perPage": 20,
"showPartialRows": false,
"showMetricsAtAllLevels": false
},
"aggs": [
{"id": "1", "enabled": true, "type": "top_hits", "schema": "metric", "params": {"field": "_source", "size": 20, "sortField": "@timestamp", "sortOrder": "desc"}},
{"id": "2", "enabled": true, "type": "filters", "schema": "bucket", "params": {"filters": [{"input": {"query": "log_level:ERROR OR log_level:CRITICAL"}, "label": ""}]}}
]
}
}
],
"timeRestore": true,
"timeFrom": "now-1h",
"timeTo": "now",
"refreshInterval": {
"pause": false,
"value": 30000
}
}
Dashboard features:
- Request volume timeline: Tracks requests per minute with 1-minute granularity
- Error rate gauge: Real-time error percentage with color-coded thresholds (green < 1%, yellow 1-5%, red > 5%)
- Response time percentiles: P50, P95, P99 latency visualization
- Top tools table: Shows which tools are most frequently called
- Geographic map: User distribution based on GeoIP lookup
- Error logs table: Live feed of ERROR/CRITICAL logs with full details
Import dashboard:
# Kibana 8.x imports dashboards through the Saved Objects API using an
# ndjson export; save the dashboard as chatgpt-dashboard.ndjson, then:
curl -X POST "http://localhost:5601/api/saved_objects/_import?overwrite=true" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "kbn-xsrf: true" \
  --form file=@chatgpt-dashboard.ndjson
Elasticsearch Index Template and ILM Policy
Index templates define field mappings and settings for new indices. ILM (Index Lifecycle Management) policies automate index lifecycle: rollover, retention, deletion.
{
"index_patterns": ["chatgpt-logs-*"],
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.codec": "best_compression",
"refresh_interval": "5s",
"index.lifecycle.name": "chatgpt-logs-policy",
"index.lifecycle.rollover_alias": "chatgpt-logs"
},
"mappings": {
"properties": {
"@timestamp": {"type": "date"},
"log_level": {"type": "keyword"},
"service_name": {"type": "keyword"},
"tool_name": {"type": "keyword"},
"user_id": {"type": "keyword"},
"response_time": {"type": "long"},
"response_time_category": {"type": "keyword"},
"error_message": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
"stack_trace": {"type": "text"},
"message": {"type": "text"},
"environment": {"type": "keyword"},
"region": {"type": "keyword"},
"deployment_version": {"type": "keyword"},
"client_ip": {"type": "ip"},
"geoip": {
"properties": {
"city_name": {"type": "keyword"},
"country_name": {"type": "keyword"},
"location": {"type": "geo_point"}
}
}
}
}
}
}
ILM Policy:
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_primary_shard_size": "50GB",
"max_age": "1d"
},
"set_priority": {
"priority": 100
}
}
},
"warm": {
"min_age": "3d",
"actions": {
"shrink": {
"number_of_shards": 1
},
"forcemerge": {
"max_num_segments": 1
},
"set_priority": {
"priority": 50
}
}
},
"delete": {
"min_age": "30d",
"actions": {
"delete": {}
}
}
}
}
}
Lifecycle phases:
- Hot phase: Active indices receiving writes. Rollover after 1 day or 50GB per shard.
- Warm phase: Older indices (3+ days). Shrink to 1 shard, force merge segments for compression.
- Delete phase: Indices older than 30 days are automatically deleted.
Apply template and policy:
# Create index template
curl -X PUT "http://localhost:9200/_index_template/chatgpt-logs-template" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "Content-Type: application/json" \
  -d @index-template.json
# Create ILM policy
curl -X PUT "http://localhost:9200/_ilm/policy/chatgpt-logs-policy" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "Content-Type: application/json" \
  -d @ilm-policy.json
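One detail the commands above don't cover: with rollover-based ILM, the first index behind the write alias must be created by hand. A sketch, assuming the alias and index pattern defined above:

# Bootstrap the initial write index for the rollover alias
# (URL-encodes the date-math name <chatgpt-logs-{now/d}-000001>)
curl -X PUT "http://localhost:9200/%3Cchatgpt-logs-%7Bnow%2Fd%7D-000001%3E" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "Content-Type: application/json" \
  -d '{"aliases": {"chatgpt-logs": {"is_write_index": true}}}'

# Verify which lifecycle step each index is on
curl -s "http://localhost:9200/chatgpt-logs-*/_ilm/explain?human" \
  -u elastic:${ELASTIC_PASSWORD}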
For index optimization strategies, see: Elasticsearch Optimization for ChatGPT.
Production Deployment and Scaling
Deploying the ELK Stack to production requires careful planning for high availability, security, backup, and monitoring.
High Availability Architecture
For production ChatGPT applications handling millions of logs per day:
- Elasticsearch cluster: Minimum 3 master-eligible nodes (quorum = 2). Separate data nodes for horizontal scaling.
- Logstash: Deploy 2+ instances behind a load balancer (Filebeat automatically load balances).
- Kibana: Run 2+ instances behind a load balancer with session affinity.
- Filebeat: Deploy as a DaemonSet (Kubernetes) or on every Docker host.
Example AWS deployment:
- Elasticsearch: 3× c5.2xlarge instances (8 vCPU, 16GB RAM) across 3 availability zones
- Logstash: 2× c5.xlarge instances (4 vCPU, 8GB RAM) behind Application Load Balancer
- Kibana: 2× t3.medium instances (2 vCPU, 4GB RAM) behind ALB with sticky sessions
Security Hardening
- Enable X-Pack Security: Authentication, role-based access control (RBAC), field-level security
- TLS/SSL encryption: Encrypt all communication (Elasticsearch cluster, Logstash → ES, Kibana → ES)
- API key authentication: Use API keys instead of passwords for application log shipping
- Network segmentation: Place Elasticsearch/Logstash in private subnets, expose only Kibana via ALB
- Audit logging: Enable audit logs for all authentication, authorization, and data access events
# elasticsearch.yml security configuration
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.audit.enabled: true
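For the API key recommendation above, a sketch of creating a write-scoped key (the key name and role name here are hypothetical; privileges are limited to writing chatgpt-logs-* indices):

curl -X POST "http://localhost:9200/_security/api_key" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "Content-Type: application/json" \
  -d '{
  "name": "chatgpt-log-shipper",
  "role_descriptors": {
    "log_writer": {
      "cluster": ["monitor"],
      "index": [
        {"names": ["chatgpt-logs-*"], "privileges": ["create_doc", "auto_configure"]}
      ]
    }
  }
}'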
Backup and Disaster Recovery
Snapshot repository (S3 example):
# Register S3 snapshot repository (AWS credentials and region are supplied
# via the Elasticsearch keystore and s3.client.* settings, not repository settings)
curl -X PUT "http://localhost:9200/_snapshot/chatgpt-logs-backup" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "Content-Type: application/json" -d '{
  "type": "s3",
  "settings": {
    "bucket": "chatgpt-elasticsearch-backups",
    "base_path": "snapshots",
    "compress": true
  }
}'
# Create snapshot (automated via cron or an SLM policy, shown below)
curl -X PUT "http://localhost:9200/_snapshot/chatgpt-logs-backup/snapshot-$(date +%Y%m%d-%H%M%S)?wait_for_completion=false" \
  -u elastic:${ELASTIC_PASSWORD}
Snapshot policy (automated daily backups, 30-day retention):
{
"policy": {
"schedule": "0 2 * * *",
"name": "<chatgpt-logs-{now/d}>",
"repository": "chatgpt-logs-backup",
"config": {
"indices": ["chatgpt-logs-*"],
"ignore_unavailable": false,
"include_global_state": false
},
"retention": {
"expire_after": "30d",
"min_count": 5,
"max_count": 50
}
}
}
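To activate the policy, register it under a policy ID and trigger a test run (the chatgpt-logs-daily ID and slm-policy.json filename are assumptions):

# Register the SLM policy, then execute it once to verify
curl -X PUT "http://localhost:9200/_slm/policy/chatgpt-logs-daily" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "Content-Type: application/json" \
  -d @slm-policy.json

curl -X POST "http://localhost:9200/_slm/policy/chatgpt-logs-daily/_execute" \
  -u elastic:${ELASTIC_PASSWORD}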
Monitoring and Alerting
Use Elasticsearch built-in monitoring (X-Pack):
# Enable monitoring in elasticsearch.yml
xpack.monitoring.collection.enabled: true
# In Kibana: Stack Monitoring shows cluster health, node stats, index stats
Watcher alerts for critical errors:
{
"trigger": {
"schedule": {"interval": "5m"}
},
"input": {
"search": {
"request": {
"indices": ["chatgpt-logs-*"],
"body": {
"query": {
"bool": {
"must": [
{"range": {"@timestamp": {"gte": "now-5m"}}},
{"terms": {"log_level": ["ERROR", "CRITICAL"]}}
]
}
}
}
}
}
},
"condition": {
"compare": {"ctx.payload.hits.total": {"gte": 100}}
},
"actions": {
"send_email": {
"email": {
"to": "ops@example.com",
"subject": "ChatGPT App: High Error Rate Alert",
"body": "Detected {{ctx.payload.hits.total}} errors in the last 5 minutes"
}
}
}
}
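To deploy this watch, register it under a watch ID (the chatgpt-high-error-rate ID and error-rate-watch.json filename are assumptions; note that Watcher requires a trial or paid license):

# Register the watch
curl -X PUT "http://localhost:9200/_watcher/watch/chatgpt-high-error-rate" \
  -u elastic:${ELASTIC_PASSWORD} \
  -H "Content-Type: application/json" \
  -d @error-rate-watch.json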
Conclusion
The ELK Stack provides a production-ready log aggregation platform for ChatGPT applications, enabling centralized search, real-time analytics, and proactive monitoring across distributed MCP servers and widgets. With the Docker Compose setup, Logstash pipelines, and Kibana dashboards provided in this guide, you now have a complete logging infrastructure that scales from prototype to production.
Key takeaways:
- Elasticsearch provides fast, scalable log storage with near-real-time search
- Logstash transforms and enriches logs with filters, grok patterns, and metadata
- Kibana visualizes log data with customizable dashboards and alerts
- Filebeat ships logs from Docker containers with minimal overhead
- Production deployment requires high availability, security hardening, backup strategies, and monitoring
For complete ChatGPT application development workflows including logging integration, see our Complete Guide to Building ChatGPT Applications.
Skip the Infrastructure Complexity
Building and maintaining the ELK Stack requires significant DevOps expertise and ongoing management. If you'd rather focus on building your ChatGPT application instead of managing logging infrastructure, MakeAIHQ provides:
- Managed log aggregation with centralized search and real-time dashboards
- Pre-built monitoring for MCP servers, tool calls, and error tracking
- Automatic alerting for performance degradation and error spikes
- No infrastructure to manage – we handle Elasticsearch, Logstash, Kibana scaling and upgrades
Start your free trial and deploy production ChatGPT apps with enterprise logging in minutes, not weeks.
Related Guides:
- Complete Guide to Building ChatGPT Applications – Pillar guide
- MCP Server Logging Best Practices for ChatGPT – Structured logging standards
- Log Analysis with Kibana for ChatGPT – Advanced Kibana query techniques
- Elasticsearch Optimization for ChatGPT – Performance tuning and shard management
- Prometheus Metrics for ChatGPT Apps – Complementary metrics monitoring
External Resources:
- Elastic Stack Official Documentation – Complete Elasticsearch, Logstash, Kibana reference
- Logstash Filter Plugins – Official filter plugin documentation
- Kibana Dashboard and Visualization Guide – Creating custom dashboards