Kubernetes Operators for ChatGPT Apps: Custom Resource Management

Managing ChatGPT applications at scale requires sophisticated orchestration beyond basic Kubernetes deployments. Kubernetes operators provide the automation framework needed to deploy, configure, and maintain ChatGPT apps with the same operational expertise that human operators bring to complex systems.

This comprehensive guide explores building production-ready Kubernetes operators specifically designed for ChatGPT applications. You'll learn how to create Custom Resource Definitions (CRDs), implement reconciliation controllers, handle lifecycle management, and deploy operators that can manage hundreds of ChatGPT apps across multiple clusters.

By mastering the operator pattern, you'll transform ChatGPT app deployment from manual kubectl commands into declarative, self-healing infrastructure that scales effortlessly. Whether you're running a single ChatGPT app or managing a fleet of AI-powered services, operators provide the foundation for reliable, automated operations.

Understanding the Operator Pattern for ChatGPT Apps

The Kubernetes operator pattern extends Kubernetes functionality to manage complex, stateful applications through custom controllers that encode operational knowledge as code. For ChatGPT apps, operators automate tasks like MCP server deployment, widget configuration, OAuth credential rotation, and traffic management.

Core Components of ChatGPT Operators

Operators consist of three fundamental components working together:

Custom Resource Definitions (CRDs) define new Kubernetes API types representing ChatGPT apps. A ChatGPTApp CRD might include specifications for MCP endpoints, widget templates, authentication providers, and deployment strategies. CRDs extend the Kubernetes API surface, allowing you to manage ChatGPT apps using familiar kubectl commands.

Controllers implement the reconciliation logic that drives operators. They watch ChatGPT app resources, compare desired state (defined in CRDs) with actual cluster state, and execute actions to achieve convergence. Controllers run continuously, ensuring ChatGPT apps remain configured correctly even when infrastructure changes.

Reconciliation loops form the operational heart of controllers. Rather than polling on a fixed schedule, controllers react to watch events from the Kubernetes API, detect configuration drift against the CRD specification, and apply corrective actions; many controllers also requeue on an interval (30 seconds in the controller built later in this guide) as a safety net. This control loop provides self-healing: if a ChatGPT app's MCP server crashes, the operator recreates it from the CRD specification.
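The compare-and-converge idea behind reconciliation can be illustrated with a toy loop over plain values. This sketch involves no Kubernetes client; `desired` and `observed` are stand-ins for the CRD spec and the actual cluster state:

```go
package main

import "fmt"

// reconcile nudges observed state one step toward desired state and
// reports whether anything had to change (i.e., drift was detected).
func reconcile(desired, observed int) (int, bool) {
	if observed < desired {
		return observed + 1, true // scale up one replica
	}
	if observed > desired {
		return observed - 1, true // scale down one replica
	}
	return observed, false // converged: no action needed
}

func main() {
	desired, observed := 3, 0
	for steps := 0; ; steps++ {
		next, changed := reconcile(desired, observed)
		if !changed {
			// prints: converged at 3 replicas after 3 steps
			fmt.Printf("converged at %d replicas after %d steps\n", observed, steps)
			break
		}
		observed = next
	}
}
```

A real controller differs in an important way: each invocation performs one full pass and exits, trusting the next watch event or requeue to drive further convergence, rather than looping internally.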

Why Operators Excel for ChatGPT App Management

Traditional Kubernetes resources like Deployments and Services lack the domain-specific logic needed for ChatGPT apps. Operators bridge this gap:

  • MCP Server Lifecycle: Operators manage MCP server deployments, health checks, version upgrades, and traffic routing based on ChatGPT-specific requirements
  • Configuration Management: Automatically generate widget templates, OAuth configurations, and API schemas from high-level ChatGPT app specifications
  • Credential Rotation: Implement secure OAuth token refresh, API key rotation, and certificate renewal without manual intervention
  • Multi-Tenancy: Isolate ChatGPT apps across namespaces with tenant-specific resource quotas, network policies, and RBAC rules
  • Observability: Create standardized monitoring dashboards, alerts, and log aggregation for all ChatGPT apps managed by the operator

The operator pattern transforms ChatGPT app infrastructure from imperative scripts into declarative, version-controlled manifests that capture organizational knowledge about operating AI applications at scale.

Operator Capabilities vs. Traditional Deployments

Capability           Traditional K8s     ChatGPT Operator
-------------------  ------------------  -------------------------
Deploy MCP Server    Manual YAML         Declarative CRD
Widget Updates       kubectl apply       Automated rollout
OAuth Setup          External scripts    Built-in controller
Health Monitoring    Basic probes        Domain-specific checks
Scaling Logic        Generic HPA         ChatGPT-aware autoscaling
Disaster Recovery    Manual restore      Automated reconciliation

Operators encapsulate the expertise of experienced ChatGPT platform engineers, making that knowledge executable and repeatable across your entire infrastructure.

Building Operators with the Operator SDK

The Operator SDK provides scaffolding tools and frameworks that dramatically simplify operator development. Instead of writing thousands of lines of boilerplate Kubernetes client code, the SDK generates project structure, RBAC configurations, and CRD manifests, letting you focus on ChatGPT-specific business logic.

Operator SDK Project Structure

Initialize a new ChatGPT operator project:

# Install Operator SDK (v1.33+)
curl -LO https://github.com/operator-framework/operator-sdk/releases/download/v1.33.0/operator-sdk_linux_amd64
chmod +x operator-sdk_linux_amd64
sudo mv operator-sdk_linux_amd64 /usr/local/bin/operator-sdk

# Create new Go-based operator
operator-sdk init --domain=makeaihq.com --repo=github.com/makeaihq/chatgpt-operator
operator-sdk create api --group apps --version v1alpha1 --kind ChatGPTApp --resource --controller

# Generate CRD manifests
make manifests

This creates a project structure:

chatgpt-operator/
├── api/v1alpha1/
│   ├── chatgptapp_types.go      # CRD schema
│   └── zz_generated.deepcopy.go # Auto-generated
├── config/
│   ├── crd/                      # CRD manifests
│   ├── rbac/                     # RBAC rules
│   └── manager/                  # Operator deployment
├── controllers/
│   └── chatgptapp_controller.go  # Reconciliation logic
└── main.go                        # Entry point

The SDK supports three operator types:

  • Go Operators: Full flexibility, custom reconciliation logic, best for complex ChatGPT app management
  • Ansible Operators: Declarative playbook-based automation, simpler for configuration-heavy workflows
  • Helm Operators: Wrap existing Helm charts with operator lifecycle management

For ChatGPT apps requiring sophisticated MCP server orchestration, OAuth flows, and widget rendering, Go operators provide the necessary control and performance.

Defining the ChatGPTApp CRD

Edit api/v1alpha1/chatgptapp_types.go to define your ChatGPT app schema:

package v1alpha1

import (
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ChatGPTAppSpec defines the desired state of ChatGPTApp
type ChatGPTAppSpec struct {
	// DisplayName is the human-readable name shown in ChatGPT Store
	// +kubebuilder:validation:MinLength=3
	// +kubebuilder:validation:MaxLength=50
	DisplayName string `json:"displayName"`

	// Description provides context for the ChatGPT app
	// +kubebuilder:validation:MaxLength=300
	Description string `json:"description"`

	// MCPServer defines the Model Context Protocol server configuration
	MCPServer MCPServerConfig `json:"mcpServer"`

	// Widgets defines UI components rendered in ChatGPT
	// +optional
	Widgets []WidgetConfig `json:"widgets,omitempty"`

	// OAuth configures authentication for the ChatGPT app
	// +optional
	OAuth *OAuthConfig `json:"oauth,omitempty"`

	// Replicas specifies the number of MCP server instances
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=10
	// +kubebuilder:default=2
	Replicas int32 `json:"replicas"`

	// Resources defines CPU/memory requests and limits
	// +optional
	Resources *ResourceRequirements `json:"resources,omitempty"`
}

// MCPServerConfig defines MCP server deployment parameters
type MCPServerConfig struct {
	// Image is the container image for the MCP server
	// +kubebuilder:validation:Pattern=`^[a-z0-9\-\.\/]+:[a-z0-9\-\.]+$`
	Image string `json:"image"`

	// Port is the HTTP port for MCP protocol communication
	// +kubebuilder:validation:Minimum=1024
	// +kubebuilder:validation:Maximum=65535
	// +kubebuilder:default=8080
	Port int32 `json:"port"`

	// Tools defines the MCP tools exposed to ChatGPT
	Tools []MCPTool `json:"tools"`

	// HealthCheckPath is the endpoint for liveness/readiness probes
	// +kubebuilder:default="/health"
	HealthCheckPath string `json:"healthCheckPath,omitempty"`
}

// MCPTool represents a single tool in the MCP server
type MCPTool struct {
	// Name is the tool identifier
	Name string `json:"name"`

	// Description explains the tool's purpose
	Description string `json:"description"`

	// InputSchema is the JSON schema for tool parameters.
	// *apiextensionsv1.JSON is used rather than map[string]interface{}
	// because controller-gen cannot generate deepcopy code for
	// interface{} values.
	// +kubebuilder:pruning:PreserveUnknownFields
	InputSchema *apiextensionsv1.JSON `json:"inputSchema"`
}

// WidgetConfig defines a UI widget rendered in ChatGPT
type WidgetConfig struct {
	// Name identifies the widget
	Name string `json:"name"`

	// Template is the HTML template with Skybridge API
	Template string `json:"template"`

	// MaxTokens limits the widget content size
	// +kubebuilder:validation:Maximum=4000
	// +kubebuilder:default=2000
	MaxTokens int32 `json:"maxTokens,omitempty"`
}

// OAuthConfig defines OAuth 2.1 authentication
type OAuthConfig struct {
	// ClientID is the OAuth client identifier
	ClientID string `json:"clientId"`

	// ClientSecretRef references a Kubernetes Secret
	ClientSecretRef SecretReference `json:"clientSecretRef"`

	// AuthorizationURL is the OAuth authorization endpoint
	AuthorizationURL string `json:"authorizationUrl"`

	// TokenURL is the OAuth token exchange endpoint
	TokenURL string `json:"tokenUrl"`

	// Scopes defines the requested OAuth scopes
	Scopes []string `json:"scopes"`
}

// SecretReference points to a Kubernetes Secret
type SecretReference struct {
	// Name is the Secret name
	Name string `json:"name"`

	// Key is the Secret data key
	Key string `json:"key"`
}

// ResourceRequirements defines compute resources
type ResourceRequirements struct {
	// Requests defines minimum resources
	Requests ResourceList `json:"requests"`

	// Limits defines maximum resources
	Limits ResourceList `json:"limits"`
}

// ResourceList specifies CPU and memory quantities
type ResourceList struct {
	// CPU in cores (e.g., "500m" = 0.5 cores)
	CPU string `json:"cpu"`

	// Memory quantity (e.g., "512Mi" = 512 mebibytes)
	Memory string `json:"memory"`
}

// ChatGPTAppStatus defines the observed state of ChatGPTApp
type ChatGPTAppStatus struct {
	// Phase represents the current deployment phase
	// +kubebuilder:validation:Enum=Pending;Deploying;Ready;Failed
	Phase string `json:"phase"`

	// Conditions represent the latest observations
	Conditions []metav1.Condition `json:"conditions,omitempty"`

	// ObservedGeneration is the last processed generation
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`

	// Endpoint is the public MCP server URL
	Endpoint string `json:"endpoint,omitempty"`

	// ReadyReplicas counts healthy MCP server instances
	ReadyReplicas int32 `json:"readyReplicas"`

	// LastUpdated timestamp of the last reconciliation
	LastUpdated metav1.Time `json:"lastUpdated,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase`
// +kubebuilder:printcolumn:name="Ready",type=integer,JSONPath=`.status.readyReplicas`
// +kubebuilder:printcolumn:name="Endpoint",type=string,JSONPath=`.status.endpoint`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`

// ChatGPTApp is the Schema for the chatgptapps API
type ChatGPTApp struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ChatGPTAppSpec   `json:"spec,omitempty"`
	Status ChatGPTAppStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// ChatGPTAppList contains a list of ChatGPTApp
type ChatGPTAppList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []ChatGPTApp `json:"items"`
}

func init() {
	SchemeBuilder.Register(&ChatGPTApp{}, &ChatGPTAppList{})
}

After defining the CRD schema, regenerate manifests:

make manifests
make generate

This CRD schema enables declarative ChatGPT app management with type-safe validation, default values, and automatic status tracking.
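With the schema registered, a ChatGPT app becomes an ordinary Kubernetes object. The manifest below is an illustrative instance; the image, endpoints, namespace, and secret names are placeholders, not values from a real deployment:

```yaml
apiVersion: apps.makeaihq.com/v1alpha1
kind: ChatGPTApp
metadata:
  name: demo-assistant
  namespace: chatgpt-apps
spec:
  displayName: Demo Assistant
  description: Example ChatGPT app managed by the operator
  replicas: 2
  mcpServer:
    image: registry.example.com/demo-mcp:1.0.0
    port: 8080
    tools:
      - name: search_docs
        description: Search product documentation
        inputSchema:
          type: object
          properties:
            query:
              type: string
          required: ["query"]
  oauth:
    clientId: demo-client
    clientSecretRef:
      name: demo-oauth-secret
      key: client-secret
    authorizationUrl: https://auth.example.com/authorize
    tokenUrl: https://auth.example.com/token
    scopes: ["openid", "profile"]
```

After kubectl apply -f, the printer columns declared on the CRD surface the phase, ready replica count, and endpoint directly in kubectl get chatgptapps output.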

Implementing the Controller Reconciliation Loop

The controller reconciliation loop is where operator logic executes. Controllers watch ChatGPT app resources, detect changes, and perform actions to align actual cluster state with desired CRD specifications.

Core Reconcile Function

Edit controllers/chatgptapp_controller.go:

package controllers

import (
	"context"
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/intstr"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"

	appsv1alpha1 "github.com/makeaihq/chatgpt-operator/api/v1alpha1"
)

const (
	chatGPTAppFinalizer = "apps.makeaihq.com/finalizer"
	reconcileInterval   = 30 * time.Second
)

// ChatGPTAppReconciler reconciles a ChatGPTApp object
type ChatGPTAppReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=apps.makeaihq.com,resources=chatgptapps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=apps.makeaihq.com,resources=chatgptapps/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps.makeaihq.com,resources=chatgptapps/finalizers,verbs=update
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=secrets,verbs=get;list;watch

// Reconcile processes ChatGPTApp resources
func (r *ChatGPTAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)
	logger.Info("Reconciling ChatGPTApp", "name", req.Name, "namespace", req.Namespace)

	// Fetch ChatGPTApp instance
	app := &appsv1alpha1.ChatGPTApp{}
	if err := r.Get(ctx, req.NamespacedName, app); err != nil {
		if errors.IsNotFound(err) {
			logger.Info("ChatGPTApp resource not found, ignoring")
			return ctrl.Result{}, nil
		}
		logger.Error(err, "Failed to get ChatGPTApp")
		return ctrl.Result{}, err
	}

	// Handle deletion with finalizers
	if app.ObjectMeta.DeletionTimestamp != nil {
		if controllerutil.ContainsFinalizer(app, chatGPTAppFinalizer) {
			if err := r.finalizeChatGPTApp(ctx, app); err != nil {
				return ctrl.Result{}, err
			}

			controllerutil.RemoveFinalizer(app, chatGPTAppFinalizer)
			if err := r.Update(ctx, app); err != nil {
				return ctrl.Result{}, err
			}
		}
		return ctrl.Result{}, nil
	}

	// Add finalizer if not present
	if !controllerutil.ContainsFinalizer(app, chatGPTAppFinalizer) {
		controllerutil.AddFinalizer(app, chatGPTAppFinalizer)
		if err := r.Update(ctx, app); err != nil {
			return ctrl.Result{}, err
		}
	}

	// Update status to Deploying
	if app.Status.Phase != "Deploying" && app.Status.Phase != "Ready" {
		app.Status.Phase = "Deploying"
		if err := r.Status().Update(ctx, app); err != nil {
			logger.Error(err, "Failed to update status to Deploying")
			return ctrl.Result{}, err
		}
	}

	// Reconcile Deployment
	deployment := r.buildDeployment(app)
	if err := controllerutil.SetControllerReference(app, deployment, r.Scheme); err != nil {
		logger.Error(err, "Failed to set controller reference on Deployment")
		return ctrl.Result{}, err
	}

	existingDeployment := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: deployment.Name, Namespace: deployment.Namespace}, existingDeployment)
	if err != nil && errors.IsNotFound(err) {
		logger.Info("Creating Deployment", "name", deployment.Name)
		if err := r.Create(ctx, deployment); err != nil {
			logger.Error(err, "Failed to create Deployment")
			return r.updateStatusFailed(ctx, app, "DeploymentCreateFailed", err.Error())
		}
	} else if err != nil {
		logger.Error(err, "Failed to get Deployment")
		return ctrl.Result{}, err
	} else {
		// Update existing Deployment if spec changed
		if !deploymentSpecEqual(existingDeployment, deployment) {
			logger.Info("Updating Deployment", "name", deployment.Name)
			existingDeployment.Spec = deployment.Spec
			if err := r.Update(ctx, existingDeployment); err != nil {
				logger.Error(err, "Failed to update Deployment")
				return r.updateStatusFailed(ctx, app, "DeploymentUpdateFailed", err.Error())
			}
		}
	}

	// Reconcile Service
	service := r.buildService(app)
	if err := controllerutil.SetControllerReference(app, service, r.Scheme); err != nil {
		logger.Error(err, "Failed to set controller reference on Service")
		return ctrl.Result{}, err
	}

	existingService := &corev1.Service{}
	err = r.Get(ctx, types.NamespacedName{Name: service.Name, Namespace: service.Namespace}, existingService)
	if err != nil && errors.IsNotFound(err) {
		logger.Info("Creating Service", "name", service.Name)
		if err := r.Create(ctx, service); err != nil {
			logger.Error(err, "Failed to create Service")
			return r.updateStatusFailed(ctx, app, "ServiceCreateFailed", err.Error())
		}
	} else if err != nil {
		logger.Error(err, "Failed to get Service")
		return ctrl.Result{}, err
	}

	// Update status based on Deployment readiness
	if err := r.updateStatus(ctx, app, existingDeployment); err != nil {
		logger.Error(err, "Failed to update status")
		return ctrl.Result{}, err
	}

	logger.Info("Reconciliation complete", "phase", app.Status.Phase)
	return ctrl.Result{RequeueAfter: reconcileInterval}, nil
}

// buildDeployment creates a Deployment for the MCP server
func (r *ChatGPTAppReconciler) buildDeployment(app *appsv1alpha1.ChatGPTApp) *appsv1.Deployment {
	labels := map[string]string{
		"app":                          app.Name,
		"app.kubernetes.io/name":       app.Name,
		"app.kubernetes.io/component":  "mcp-server",
		"app.kubernetes.io/managed-by": "chatgpt-operator",
	}

	replicas := app.Spec.Replicas

	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      app.Name + "-mcp",
			Namespace: app.Namespace,
			Labels:    labels,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: labels,
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: labels,
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{
						{
							Name:  "mcp-server",
							Image: app.Spec.MCPServer.Image,
							Ports: []corev1.ContainerPort{
								{
									Name:          "http",
									ContainerPort: app.Spec.MCPServer.Port,
									Protocol:      corev1.ProtocolTCP,
								},
							},
							Env: r.buildEnvVars(app),
							LivenessProbe: &corev1.Probe{
								ProbeHandler: corev1.ProbeHandler{
									HTTPGet: &corev1.HTTPGetAction{
										Path: app.Spec.MCPServer.HealthCheckPath,
										Port: intstr.FromInt(int(app.Spec.MCPServer.Port)),
									},
								},
								InitialDelaySeconds: 10,
								PeriodSeconds:       10,
								TimeoutSeconds:      5,
								FailureThreshold:    3,
							},
							ReadinessProbe: &corev1.Probe{
								ProbeHandler: corev1.ProbeHandler{
									HTTPGet: &corev1.HTTPGetAction{
										Path: app.Spec.MCPServer.HealthCheckPath,
										Port: intstr.FromInt(int(app.Spec.MCPServer.Port)),
									},
								},
								InitialDelaySeconds: 5,
								PeriodSeconds:       5,
								TimeoutSeconds:      3,
								FailureThreshold:    2,
							},
						},
					},
				},
			},
		},
	}

	// Add resource requests/limits if specified
	if app.Spec.Resources != nil {
		deployment.Spec.Template.Spec.Containers[0].Resources = corev1.ResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse(app.Spec.Resources.Requests.CPU),
				corev1.ResourceMemory: resource.MustParse(app.Spec.Resources.Requests.Memory),
			},
			Limits: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse(app.Spec.Resources.Limits.CPU),
				corev1.ResourceMemory: resource.MustParse(app.Spec.Resources.Limits.Memory),
			},
		}
	}

	return deployment
}

// buildService creates a Service for the MCP server
func (r *ChatGPTAppReconciler) buildService(app *appsv1alpha1.ChatGPTApp) *corev1.Service {
	labels := map[string]string{
		"app":                         app.Name,
		"app.kubernetes.io/name":      app.Name,
		"app.kubernetes.io/component": "mcp-server",
	}

	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      app.Name + "-mcp",
			Namespace: app.Namespace,
			Labels:    labels,
		},
		Spec: corev1.ServiceSpec{
			Selector: labels,
			Ports: []corev1.ServicePort{
				{
					Name:       "http",
					Protocol:   corev1.ProtocolTCP,
					Port:       80,
					TargetPort: intstr.FromInt(int(app.Spec.MCPServer.Port)),
				},
			},
			Type: corev1.ServiceTypeClusterIP,
		},
	}
}

// buildEnvVars constructs environment variables for the MCP server
func (r *ChatGPTAppReconciler) buildEnvVars(app *appsv1alpha1.ChatGPTApp) []corev1.EnvVar {
	envVars := []corev1.EnvVar{
		{Name: "CHATGPT_APP_NAME", Value: app.Spec.DisplayName},
		{Name: "CHATGPT_APP_DESCRIPTION", Value: app.Spec.Description},
		{Name: "MCP_PORT", Value: fmt.Sprintf("%d", app.Spec.MCPServer.Port)},
	}

	// Add OAuth configuration if present
	if app.Spec.OAuth != nil {
		envVars = append(envVars,
			corev1.EnvVar{Name: "OAUTH_CLIENT_ID", Value: app.Spec.OAuth.ClientID},
			corev1.EnvVar{Name: "OAUTH_AUTH_URL", Value: app.Spec.OAuth.AuthorizationURL},
			corev1.EnvVar{Name: "OAUTH_TOKEN_URL", Value: app.Spec.OAuth.TokenURL},
			corev1.EnvVar{
				Name: "OAUTH_CLIENT_SECRET",
				ValueFrom: &corev1.EnvVarSource{
					SecretKeyRef: &corev1.SecretKeySelector{
						LocalObjectReference: corev1.LocalObjectReference{
							Name: app.Spec.OAuth.ClientSecretRef.Name,
						},
						Key: app.Spec.OAuth.ClientSecretRef.Key,
					},
				},
			},
		)
	}

	return envVars
}

// finalizeChatGPTApp performs cleanup before deletion
func (r *ChatGPTAppReconciler) finalizeChatGPTApp(ctx context.Context, app *appsv1alpha1.ChatGPTApp) error {
	logger := log.FromContext(ctx)
	logger.Info("Finalizing ChatGPTApp", "name", app.Name)

	// TODO: Deregister from ChatGPT Store API
	// TODO: Revoke OAuth credentials
	// TODO: Clean up external resources

	return nil
}

// updateStatus updates the ChatGPTApp status based on Deployment state
func (r *ChatGPTAppReconciler) updateStatus(ctx context.Context, app *appsv1alpha1.ChatGPTApp, deployment *appsv1.Deployment) error {
	readyReplicas := deployment.Status.ReadyReplicas
	desiredReplicas := app.Spec.Replicas

	app.Status.ReadyReplicas = readyReplicas
	app.Status.ObservedGeneration = app.Generation
	app.Status.LastUpdated = metav1.Now()
	app.Status.Endpoint = fmt.Sprintf("http://%s-mcp.%s.svc.cluster.local", app.Name, app.Namespace)

	if readyReplicas == desiredReplicas {
		app.Status.Phase = "Ready"
		r.setCondition(app, "Ready", metav1.ConditionTrue, "AllReplicasReady", "All MCP server replicas are healthy")
	} else {
		app.Status.Phase = "Deploying"
		r.setCondition(app, "Ready", metav1.ConditionFalse, "ReplicasNotReady", fmt.Sprintf("%d/%d replicas ready", readyReplicas, desiredReplicas))
	}

	return r.Status().Update(ctx, app)
}

// updateStatusFailed updates status to Failed
func (r *ChatGPTAppReconciler) updateStatusFailed(ctx context.Context, app *appsv1alpha1.ChatGPTApp, reason, message string) (ctrl.Result, error) {
	app.Status.Phase = "Failed"
	app.Status.LastUpdated = metav1.Now()
	r.setCondition(app, "Ready", metav1.ConditionFalse, reason, message)

	if err := r.Status().Update(ctx, app); err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{RequeueAfter: reconcileInterval}, nil
}

// setCondition updates or adds a status condition, preserving the
// LastTransitionTime when the condition status has not changed so that
// reason and message updates do not reset the transition timestamp
func (r *ChatGPTAppReconciler) setCondition(app *appsv1alpha1.ChatGPTApp, conditionType string, status metav1.ConditionStatus, reason, message string) {
	condition := metav1.Condition{
		Type:               conditionType,
		Status:             status,
		Reason:             reason,
		Message:            message,
		LastTransitionTime: metav1.Now(),
		ObservedGeneration: app.Generation,
	}

	// Update an existing condition in place
	for i, existing := range app.Status.Conditions {
		if existing.Type == conditionType {
			if existing.Status == status {
				// No transition: keep the original transition timestamp
				condition.LastTransitionTime = existing.LastTransitionTime
			}
			app.Status.Conditions[i] = condition
			return
		}
	}

	// Add new condition
	app.Status.Conditions = append(app.Status.Conditions, condition)
}

// deploymentSpecEqual compares only the fields this operator actively
// manages (replica count and container image). A production operator
// should use a semantic comparison or server-side apply to also detect
// drift in probes, environment variables, and resource requirements.
func deploymentSpecEqual(a, b *appsv1.Deployment) bool {
	return a.Spec.Replicas != nil && b.Spec.Replicas != nil &&
		*a.Spec.Replicas == *b.Spec.Replicas &&
		a.Spec.Template.Spec.Containers[0].Image == b.Spec.Template.Spec.Containers[0].Image
}

// SetupWithManager sets up the controller with the Manager
func (r *ChatGPTAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&appsv1alpha1.ChatGPTApp{}).
		Owns(&appsv1.Deployment{}).
		Owns(&corev1.Service{}).
		Complete(r)
}

This reconciliation loop creates and maintains Deployments and Services for ChatGPT apps, with automatic health checks, resource management, and status updates.
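The phase decision inside updateStatus reduces to a pure function of two counters, which makes it easy to sanity-check outside the cluster. This standalone sketch mirrors that logic with plain types (no client-go required):

```go
package main

import "fmt"

// derivePhase mirrors the updateStatus decision: an app is Ready only
// when every desired MCP server replica reports healthy.
func derivePhase(readyReplicas, desiredReplicas int32) (phase, message string) {
	if readyReplicas == desiredReplicas {
		return "Ready", "All MCP server replicas are healthy"
	}
	return "Deploying", fmt.Sprintf("%d/%d replicas ready", readyReplicas, desiredReplicas)
}

func main() {
	// Walk a rollout from zero to two ready replicas.
	for _, ready := range []int32{0, 1, 2} {
		phase, msg := derivePhase(ready, 2)
		fmt.Printf("%s: %s\n", phase, msg)
	}
}
```

Keeping status derivation pure like this is a useful habit: the reconciler stays thin, and the decision logic can be unit-tested without a fake Kubernetes API server.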

Advanced Operator Patterns

Production ChatGPT operators require sophisticated patterns beyond basic reconciliation: admission webhooks for validation, leader election for high availability, and RBAC for secure multi-tenancy.

Admission Webhook Implementation

Webhooks intercept API requests before resources are persisted, enabling validation and mutation logic:

// api/v1alpha1/chatgptapp_webhook.go
package v1alpha1

import (
	"fmt"
	"regexp"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/webhook"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

var (
	imageRegex = regexp.MustCompile(`^[a-z0-9\-\.\/]+:[a-z0-9\-\.]+$`)
	urlRegex   = regexp.MustCompile(`^https?://[^\s]+$`)
)

// SetupWebhookWithManager registers the webhook with the manager
func (r *ChatGPTApp) SetupWebhookWithManager(mgr ctrl.Manager) error {
	return ctrl.NewWebhookManagedBy(mgr).
		For(r).
		Complete()
}

// +kubebuilder:webhook:path=/mutate-apps-makeaihq-com-v1alpha1-chatgptapp,mutating=true,failurePolicy=fail,sideEffects=None,groups=apps.makeaihq.com,resources=chatgptapps,verbs=create;update,versions=v1alpha1,name=mchatgptapp.kb.io,admissionReviewVersions=v1

var _ webhook.Defaulter = &ChatGPTApp{}

// Default implements webhook.Defaulter for setting default values
func (r *ChatGPTApp) Default() {
	// Set default replicas if not specified
	if r.Spec.Replicas == 0 {
		r.Spec.Replicas = 2
	}

	// Set default health check path
	if r.Spec.MCPServer.HealthCheckPath == "" {
		r.Spec.MCPServer.HealthCheckPath = "/health"
	}

	// Set default port
	if r.Spec.MCPServer.Port == 0 {
		r.Spec.MCPServer.Port = 8080
	}

	// Set default resources if not specified
	if r.Spec.Resources == nil {
		r.Spec.Resources = &ResourceRequirements{
			Requests: ResourceList{
				CPU:    "100m",
				Memory: "128Mi",
			},
			Limits: ResourceList{
				CPU:    "500m",
				Memory: "512Mi",
			},
		}
	}
}

// +kubebuilder:webhook:path=/validate-apps-makeaihq-com-v1alpha1-chatgptapp,mutating=false,failurePolicy=fail,sideEffects=None,groups=apps.makeaihq.com,resources=chatgptapps,verbs=create;update,versions=v1alpha1,name=vchatgptapp.kb.io,admissionReviewVersions=v1

var _ webhook.Validator = &ChatGPTApp{}

// ValidateCreate implements webhook.Validator for creation
func (r *ChatGPTApp) ValidateCreate() (admission.Warnings, error) {
	return r.validateChatGPTApp()
}

// ValidateUpdate implements webhook.Validator for updates
func (r *ChatGPTApp) ValidateUpdate(old runtime.Object) (admission.Warnings, error) {
	return r.validateChatGPTApp()
}

// ValidateDelete implements webhook.Validator for deletion
func (r *ChatGPTApp) ValidateDelete() (admission.Warnings, error) {
	return nil, nil
}

// validateChatGPTApp performs comprehensive validation
func (r *ChatGPTApp) validateChatGPTApp() (admission.Warnings, error) {
	var warnings admission.Warnings

	// Validate display name
	if len(r.Spec.DisplayName) < 3 {
		return nil, fmt.Errorf("displayName must be at least 3 characters")
	}
	if len(r.Spec.DisplayName) > 50 {
		return nil, fmt.Errorf("displayName must be at most 50 characters")
	}

	// Validate MCP server image format
	if !imageRegex.MatchString(r.Spec.MCPServer.Image) {
		return nil, fmt.Errorf("mcpServer.image must match format: registry/image:tag")
	}

	// Validate port range
	if r.Spec.MCPServer.Port < 1024 || r.Spec.MCPServer.Port > 65535 {
		return nil, fmt.Errorf("mcpServer.port must be between 1024 and 65535")
	}

	// Validate MCP tools
	if len(r.Spec.MCPServer.Tools) == 0 {
		return nil, fmt.Errorf("mcpServer.tools must contain at least one tool")
	}

	for _, tool := range r.Spec.MCPServer.Tools {
		if tool.Name == "" {
			return nil, fmt.Errorf("all tools must have a name")
		}
		if tool.Description == "" {
			warnings = append(warnings, fmt.Sprintf("Tool '%s' has no description", tool.Name))
		}
		if tool.InputSchema == nil {
			return nil, fmt.Errorf("tool '%s' must have an inputSchema", tool.Name)
		}
	}

	// Validate OAuth configuration if present
	if r.Spec.OAuth != nil {
		if r.Spec.OAuth.ClientID == "" {
			return nil, fmt.Errorf("oauth.clientId is required when OAuth is configured")
		}
		if !urlRegex.MatchString(r.Spec.OAuth.AuthorizationURL) {
			return nil, fmt.Errorf("oauth.authorizationUrl must be a valid HTTP(S) URL")
		}
		if !urlRegex.MatchString(r.Spec.OAuth.TokenURL) {
			return nil, fmt.Errorf("oauth.tokenUrl must be a valid HTTP(S) URL")
		}
		if r.Spec.OAuth.ClientSecretRef.Name == "" || r.Spec.OAuth.ClientSecretRef.Key == "" {
			return nil, fmt.Errorf("oauth.clientSecretRef must specify both name and key")
		}
	}

	// Validate widgets
	for _, widget := range r.Spec.Widgets {
		if widget.Name == "" {
			return nil, fmt.Errorf("all widgets must have a name")
		}
		if widget.Template == "" {
			return nil, fmt.Errorf("widget '%s' must have a template", widget.Name)
		}
		if widget.MaxTokens > 4000 {
			return nil, fmt.Errorf("widget '%s' maxTokens exceeds OpenAI limit of 4000", widget.Name)
		}
	}

	// Validate resource requirements
	if r.Spec.Resources != nil {
		if err := validateResourceQuantity(r.Spec.Resources.Requests.CPU, "requests.cpu"); err != nil {
			return nil, err
		}
		if err := validateResourceQuantity(r.Spec.Resources.Requests.Memory, "requests.memory"); err != nil {
			return nil, err
		}
		if err := validateResourceQuantity(r.Spec.Resources.Limits.CPU, "limits.cpu"); err != nil {
			return nil, err
		}
		if err := validateResourceQuantity(r.Spec.Resources.Limits.Memory, "limits.memory"); err != nil {
			return nil, err
		}
	}

	return warnings, nil
}

// validateResourceQuantity validates Kubernetes resource quantities.
// This regex accepts the common decimal and binary suffixes; for full
// coverage of the quantity grammar, prefer resource.ParseQuantity from
// k8s.io/apimachinery/pkg/api/resource.
func validateResourceQuantity(quantity, field string) error {
	validPattern := regexp.MustCompile(`^[0-9]+(\.[0-9]+)?(m|k|M|G|T|Ki|Mi|Gi|Ti)?$`)
	if !validPattern.MatchString(quantity) {
		return fmt.Errorf("%s must be a valid Kubernetes quantity (e.g., '500m', '1Gi')", field)
	}
	return nil
}

Webhooks ensure only valid ChatGPT apps enter the cluster, preventing configuration errors before they cause runtime failures.
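Because the webhook's gates are plain regexp checks, they can be exercised in isolation. This standalone snippet reuses the same two patterns against hypothetical inputs (the image and URL values are made up for illustration):

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Same patterns as the ChatGPTApp validation webhook.
	imageRegex = regexp.MustCompile(`^[a-z0-9\-\.\/]+:[a-z0-9\-\.]+$`)
	urlRegex   = regexp.MustCompile(`^https?://[^\s]+$`)
)

func main() {
	images := []string{
		"registry.example.com/demo-mcp:1.0.0", // valid: registry/image:tag
		"demo-mcp",                            // invalid: missing tag
	}
	for _, img := range images {
		fmt.Printf("image %q -> %v\n", img, imageRegex.MatchString(img))
	}

	urls := []string{
		"https://auth.example.com/token", // valid
		"ftp://auth.example.com/token",   // invalid: not http(s)
	}
	for _, u := range urls {
		fmt.Printf("url %q -> %v\n", u, urlRegex.MatchString(u))
	}
}
```

Testing validation rules this way, before wiring them into an admission webhook, catches pattern mistakes without a round-trip through cert-manager and the API server.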

RBAC Configuration for Multi-Tenancy

Secure multi-tenant ChatGPT operators require granular RBAC:

# config/rbac/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: chatgpt-operator-manager-role
rules:
# ChatGPTApp CRD permissions
- apiGroups:
  - apps.makeaihq.com
  resources:
  - chatgptapps
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps.makeaihq.com
  resources:
  - chatgptapps/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - apps.makeaihq.com
  resources:
  - chatgptapps/finalizers
  verbs:
  - update

# Deployment management
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch

# Service management
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch

# Secret read access (for OAuth credentials)
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
  - watch

# ConfigMap management (for widget templates)
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch

# Event creation for audit trail
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch

# Ingress management (for public endpoints)
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: chatgpt-operator-metrics-reader
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: chatgpt-operator-proxy-role
rules:
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: chatgpt-operator-manager-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: chatgpt-operator-manager-role
subjects:
- kind: ServiceAccount
  name: chatgpt-operator-controller-manager
  namespace: chatgpt-operator-system

---
# Namespace-scoped role for tenant isolation
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: chatgpt-app-viewer
  namespace: tenant-namespace
rules:
- apiGroups:
  - apps.makeaihq.com
  resources:
  - chatgptapps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps.makeaihq.com
  resources:
  - chatgptapps/status
  verbs:
  - get

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: chatgpt-app-viewer-binding
  namespace: tenant-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: chatgpt-app-viewer
subjects:
- kind: ServiceAccount
  name: tenant-user
  namespace: tenant-namespace

This RBAC configuration isolates tenants to their own namespaces while granting the operator cluster-wide access to only the resource types it manages (with read-only access to Secrets). Learn more about Kubernetes security patterns.

Production Operator Deployment

Deploying operators to production calls for Operator Lifecycle Manager (OLM) packaging for distribution, a versioning strategy, and comprehensive monitoring.

OLM Bundle Configuration

Create an OLM bundle for operator distribution:

# bundle/manifests/chatgpt-operator.clusterserviceversion.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: chatgpt-operator.v1.0.0
  namespace: placeholder
  annotations:
    alm-examples: |-
      [
        {
          "apiVersion": "apps.makeaihq.com/v1alpha1",
          "kind": "ChatGPTApp",
          "metadata": {
            "name": "fitness-assistant"
          },
          "spec": {
            "displayName": "Fitness Studio Assistant",
            "description": "AI-powered class scheduling and nutrition advice",
            "replicas": 2,
            "mcpServer": {
              "image": "makeaihq/mcp-fitness:1.0.0",
              "port": 8080,
              "tools": [
                {
                  "name": "schedule_class",
                  "description": "Schedule a fitness class",
                  "inputSchema": {
                    "type": "object",
                    "properties": {
                      "className": {"type": "string"},
                      "date": {"type": "string", "format": "date-time"}
                    }
                  }
                }
              ]
            }
          }
        }
      ]
    capabilities: Deep Insights
    categories: AI/ML,Developer Tools
    containerImage: makeaihq/chatgpt-operator:1.0.0
    description: Kubernetes operator for managing ChatGPT applications
    repository: https://github.com/makeaihq/chatgpt-operator
spec:
  displayName: ChatGPT Operator
  description: |
    The ChatGPT Operator automates deployment and lifecycle management of ChatGPT applications on Kubernetes.

    Features:
    - Declarative ChatGPT app configuration via CRDs
    - Automated MCP server deployment and scaling
    - OAuth credential management
    - Widget template rendering
    - Multi-tenant isolation
    - Health monitoring and auto-healing

  version: 1.0.0
  maturity: stable
  maintainers:
  - name: MakeAIHQ Engineering
    email: engineering@makeaihq.com
  provider:
    name: MakeAIHQ
    url: https://makeaihq.com
  keywords:
  - chatgpt
  - openai
  - ai
  - mcp
  - operator
  links:
  - name: Documentation
    url: https://docs.makeaihq.com/operator
  - name: GitHub
    url: https://github.com/makeaihq/chatgpt-operator
  icon:
  - base64data: iVBORw0KGgoAAAANSUhEUg... # Base64 icon
    mediatype: image/png
  minKubeVersion: 1.24.0

  installModes:
  - type: OwnNamespace
    supported: true
  - type: SingleNamespace
    supported: true
  - type: MultiNamespace
    supported: true
  - type: AllNamespaces
    supported: true

  customresourcedefinitions:
    owned:
    - name: chatgptapps.apps.makeaihq.com
      version: v1alpha1
      kind: ChatGPTApp
      displayName: ChatGPT App
      description: Represents a ChatGPT application with MCP server and widgets
      statusDescriptors:
      - path: phase
        displayName: Phase
        description: Current deployment phase
        x-descriptors:
        - urn:alm:descriptor:io.kubernetes.phase
      - path: readyReplicas
        displayName: Ready Replicas
        description: Number of healthy MCP server instances
        x-descriptors:
        - urn:alm:descriptor:com.tectonic.ui:podCount
      - path: endpoint
        displayName: Endpoint
        description: MCP server endpoint URL
        x-descriptors:
        - urn:alm:descriptor:org.w3:link

  install:
    strategy: deployment
    spec:
      clusterPermissions:
      - serviceAccountName: chatgpt-operator-controller-manager
        rules:
        - apiGroups: ["apps.makeaihq.com"]
          resources: ["chatgptapps", "chatgptapps/status", "chatgptapps/finalizers"]
          verbs: ["*"]
        - apiGroups: ["apps"]
          resources: ["deployments"]
          verbs: ["*"]
        - apiGroups: [""]
          resources: ["services", "secrets", "configmaps", "events"]
          verbs: ["*"]

      deployments:
      - name: chatgpt-operator-controller-manager
        spec:
          replicas: 1
          selector:
            matchLabels:
              control-plane: controller-manager
          template:
            metadata:
              labels:
                control-plane: controller-manager
            spec:
              serviceAccountName: chatgpt-operator-controller-manager
              containers:
              - name: manager
                image: makeaihq/chatgpt-operator:1.0.0
                command:
                - /manager
                args:
                - --leader-elect
                - --health-probe-bind-address=:8081
                - --metrics-bind-address=:8080
                env:
                - name: ENABLE_WEBHOOKS
                  value: "true"
                resources:
                  limits:
                    cpu: 500m
                    memory: 512Mi
                  requests:
                    cpu: 100m
                    memory: 128Mi
                livenessProbe:
                  httpGet:
                    path: /healthz
                    port: 8081
                  initialDelaySeconds: 15
                  periodSeconds: 20
                readinessProbe:
                  httpGet:
                    path: /readyz
                    port: 8081
                  initialDelaySeconds: 5
                  periodSeconds: 10

Deploy with OLM:

# Build and push bundle
make bundle IMG=makeaihq/chatgpt-operator:1.0.0
make bundle-build bundle-push BUNDLE_IMG=makeaihq/chatgpt-operator-bundle:1.0.0

# Install with OLM
operator-sdk run bundle makeaihq/chatgpt-operator-bundle:1.0.0

Monitoring and Observability

Integrate Prometheus metrics:

// controllers/metrics.go
package controllers

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	chatGPTAppsTotal = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "chatgpt_apps_total",
			Help: "Total number of ChatGPT apps by phase",
		},
		[]string{"phase", "namespace"},
	)

	reconcileCount = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "chatgpt_app_reconcile_total",
			Help: "Total number of reconciliations",
		},
		[]string{"status"},
	)

	reconcileDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "chatgpt_app_reconcile_duration_seconds",
			Help:    "Duration of reconciliation in seconds",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"status"},
	)
)

func init() {
	metrics.Registry.MustRegister(chatGPTAppsTotal, reconcileCount, reconcileDuration)
}
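
The vectors above are typically recorded from inside Reconcile: capture a start time once, then record the outcome label and elapsed duration on every return path via defer. The sketch below is runnable anywhere because it substitutes a dependency-free observe helper for the Prometheus calls; in the operator itself those two lines would be reconcileCount.WithLabelValues(status).Inc() and reconcileDuration.WithLabelValues(status).Observe(seconds):

```go
package main

import (
	"fmt"
	"time"
)

// counts is a stand-in for the Prometheus counter vector, so the
// pattern can run without the client_golang dependency.
var counts = map[string]int{}

func observe(status string, seconds float64) {
	counts[status]++
	fmt.Printf("reconcile status=%s duration=%.3fs\n", status, seconds)
}

// reconcile shows the timing pattern: the deferred closure reads the
// named return value, so every exit path is labeled and timed.
func reconcile(fail bool) (err error) {
	start := time.Now()
	defer func() {
		status := "success"
		if err != nil {
			status = "error"
		}
		observe(status, time.Since(start).Seconds())
	}()
	if fail {
		return fmt.Errorf("simulated reconcile failure")
	}
	return nil
}

func main() {
	_ = reconcile(false)
	_ = reconcile(true)
	fmt.Println("success:", counts["success"], "error:", counts["error"])
}
```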

Create Grafana dashboard:

{
  "dashboard": {
    "title": "ChatGPT Operator Metrics",
    "panels": [
      {
        "title": "ChatGPT Apps by Phase",
        "targets": [
          {
            "expr": "chatgpt_apps_total",
            "legendFormat": "{{phase}} ({{namespace}})"
          }
        ],
        "type": "graph"
      },
      {
        "title": "Reconciliation Rate",
        "targets": [
          {
            "expr": "rate(chatgpt_app_reconcile_total[5m])",
            "legendFormat": "{{status}}"
          }
        ],
        "type": "graph"
      }
    ]
  }
}

Production operators require comprehensive monitoring to detect reconciliation failures, resource exhaustion, and performance degradation. Explore Kubernetes monitoring strategies and operator best practices.
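
Building on the metrics registered above, alerting rules can surface those failures automatically. The PrometheusRule below is an illustrative sketch: the status="error" label value, thresholds, and time windows are assumptions, not fixed conventions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: chatgpt-operator-alerts
spec:
  groups:
  - name: chatgpt-operator
    rules:
    - alert: ChatGPTAppReconcileErrors
      expr: rate(chatgpt_app_reconcile_total{status="error"}[5m]) > 0.1
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: ChatGPT app reconciliations are failing
    - alert: ChatGPTAppReconcileSlow
      expr: |
        histogram_quantile(0.95,
          rate(chatgpt_app_reconcile_duration_seconds_bucket[5m])) > 5
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: p95 reconcile latency above 5 seconds
```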

Conclusion: Self-Healing ChatGPT Infrastructure

Kubernetes operators transform ChatGPT app deployment from manual, error-prone processes into declarative, self-healing infrastructure that scales automatically. By encoding operational expertise as code, operators eliminate toil, reduce outages, and enable platform teams to manage hundreds of ChatGPT applications with minimal manual intervention.

The operator pattern provides:

  • Declarative Configuration: Define ChatGPT apps as CRDs, version control them with Git, and apply changes with kubectl
  • Automated Lifecycle Management: Controllers handle deployment, scaling, upgrades, and cleanup without human operators
  • Self-Healing Capabilities: Reconciliation loops detect and correct configuration drift, automatically recovering from failures
  • Multi-Tenancy and Security: RBAC and namespace isolation protect ChatGPT apps across organizational boundaries
  • Production-Ready Operations: OLM integration, monitoring, and webhooks ensure operators meet enterprise reliability standards

Ready to automate your ChatGPT app infrastructure? Build your first ChatGPT app with MakeAIHQ—our no-code platform includes operator-managed deployments, auto-scaling MCP servers, and built-in OAuth configuration. From zero to production ChatGPT app in 48 hours, with Kubernetes operators handling the complexity.

Start automating your AI infrastructure today with operator-driven ChatGPT app management.


Related Resources:

  • Kubernetes Deployment Patterns for ChatGPT Apps
  • MCP Server Deployment Strategies
  • ChatGPT App Architecture Patterns
  • Container Orchestration for AI Applications
  • Kubernetes Security Best Practices
  • Kubernetes Monitoring Strategies
  • ChatGPT App Configuration Management
  • Complete MCP Protocol Guide

External Resources: