Helm Deployment & Infrastructure
Overview
Ominis Cluster Manager uses Helm as its primary infrastructure deployment tool, following a two-tier architecture that separates cluster-wide services from tenant-specific resources. This page documents the complete Helm chart infrastructure, tenant model, deployment process, and the rationale behind choosing Helm over Terraform.
What is Helm?
Helm is the package manager for Kubernetes, providing templating, versioning, and lifecycle management for Kubernetes applications. Think of it as apt/yum for Kubernetes - it packages related Kubernetes resources into versioned charts that can be installed, upgraded, and rolled back as a single unit.
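As an illustration, the basic chart lifecycle looks like this (release and chart names here are purely illustrative):

```bash
# Install a chart as a named release
helm install my-release ./my-chart

# Upgrade the release with new values
helm upgrade my-release ./my-chart -f values.yaml

# Inspect revision history and roll back if needed
helm history my-release
helm rollback my-release 1
```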
Why Helm for Ominis?
Decision: Use Helm for infrastructure scaffolding (namespace, RBAC, secrets, ingress) and Python Kubernetes client for runtime resources (queues, IVRs).
Rationale:
- Kubernetes-Native: Standard tooling recognized across the industry
- Simplified Operations: One command deploys the entire tenant infrastructure (see the example after this list)
- Version Control: Track chart versions alongside application versions
- Templating: Dynamic configuration via values.yaml
- Rollback Support: Native rollback to previous releases
- GitOps Ready: Compatible with ArgoCD, Flux, and other GitOps tools
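In this project, that one command is the Makefile wrapper used throughout this page, which wraps helm upgrade --install:

```bash
# Deploy or upgrade a complete tenant environment in a single step
make helm-apply TENANT=demo-client VALUES=values.yaml
```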
Cluster vs Tenant Infrastructure Model
Two-Tier Infrastructure Approach
Ominis uses a two-tier infrastructure model that separates cluster-wide and tenant-scoped resources. This provides operational efficiency (shared services) while maintaining tenant isolation (dedicated resources).
Cluster Infrastructure (cluster-infra repository)
Deployed: Once per Kubernetes cluster
Shared: By all tenants
Purpose: Foundation services that all tenants depend on
Services:
- cert-manager: TLS certificate automation (Let's Encrypt)
- Traefik: Ingress controller and reverse proxy
- Authentik: SSO and OAuth2 provider
- Vaultwarden: Password manager
- Homer: SIP traffic monitoring
- Excalidraw: Diagramming tool
Namespaces:
- cert-manager - Certificate automation
- authentik - Identity and access management
- vaultwarden - Secrets management
- flow-proxy - Traefik ingress
- homer - SIP monitoring
- excalidraw - Documentation diagrams
Tenant Infrastructure (this document)
Deployed: Once per tenant/customer
Isolated: In dedicated namespaces
Purpose: Customer-specific application services
Services:
- API Service: FastAPI REST API for call control
- PostgreSQL: Database for configuration and state
- FreeSWITCH Registrar: SIP registration server and B2BUA
- Queue Pods: Dynamic FreeSWITCH instances per queue (runtime)
- IVR Pods: Dynamic FreeSWITCH instances per IVR (runtime)
Namespace Pattern: client-{tenant-name} (e.g., client-demo-client)
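For example, existing tenant namespaces can be listed with a simple filter:

```bash
# Each tenant gets its own client-<name> namespace
kubectl get namespaces | grep '^client-'
```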
Comparison Table
| Aspect | Cluster Infrastructure | Tenant Infrastructure |
|---|---|---|
| Deployment Frequency | Once per cluster | Once per tenant |
| Sharing | Shared by all tenants | Isolated per tenant |
| Examples | Cert-manager, Traefik | API, PostgreSQL, Queues |
| Helm Chart Location | /cluster-infra/helm-charts/ | /cluster-manager/charts/tenant-infra/ |
| Update Impact | Affects all tenants | Affects single tenant |
| Resource Efficiency | High (shared) | Lower (isolated) |
| Fault Isolation | Lower | High |
| Customization | Minimal | Per-tenant |
Dependency Flow
Why This Model?
Benefits:
✅ Cost Efficiency: One cert-manager instead of N cert-managers (where N = tenant count)
✅ Operational Simplicity: Upgrade cluster services once, not per-tenant
✅ Tenant Isolation: Tenant failures don't cascade to other tenants
✅ Security: Network policies enforce namespace boundaries
✅ Scalability: Independent tenant scaling without affecting others
Trade-offs:
❌ Shared Failure Domain: Cert-manager downtime affects all tenants
❌ Version Lock: Cluster services are uniform across all tenants (no per-tenant versions)
⚠️ Operational Complexity: Two-tier management requires coordination
Integration Points
The tenant infrastructure integrates with cluster infrastructure at several key points:
- TLS Certificates: Tenant ingress annotations reference the letsencrypt-prod ClusterIssuer from cert-manager
- Ingress Routing: Tenant Ingress resources use ingressClassName: traefik
- SIP Monitoring: Homer DaemonSet captures traffic from tenant queue pods
- Authentication (future): Tenant APIs can delegate authentication to Authentik
Helm Chart Structure
The tenant-infra chart contains all resources needed to deploy a complete tenant environment.
Chart Directory Layout
charts/tenant-infra/
├── Chart.yaml # Chart metadata (version, name, description)
├── values.yaml # Default configuration values
├── values.schema.json # Value validation schema
├── init-schema.sql # PostgreSQL database initialization
├── templates/ # Kubernetes resource templates
│ ├── _helpers.tpl # Template helper functions
│ ├── NOTES.txt # Post-install instructions
│ ├── namespace.yaml # Tenant namespace (not used, created via --create-namespace)
│ ├── rbac.yaml # RBAC resources (roles, bindings)
│ ├── serviceaccount.yaml # ServiceAccount for API pod
│ ├── configmaps/
│ │ ├── configmap-bootstrap.yaml # Bootstrap configuration
│ │ ├── configmap-freeswitch-registrar.yaml # Registrar FreeSWITCH config
│ │ ├── configmap-freeswitch-xmlcurl.yaml # XML-CURL config
│ │ ├── configmap-postgres-init.yaml # PostgreSQL init SQL
│ │ └── configmap-xmlrpc.yaml # XML-RPC credentials
│ ├── secrets/
│ │ ├── secret-bootstrap.yaml # API key and DB password
│ │ ├── secret-freeswitch-xmlcurl.yaml # XML-CURL token
│ │ ├── secret-postgres.yaml # PostgreSQL credentials
│ │ ├── secret-xmlrpc.yaml # XML-RPC credentials
│ │ └── secret-n8n.yaml # n8n configuration (optional)
│ ├── deployments/
│ │ ├── deployment-api.yaml # API FastAPI deployment
│ │ ├── deployment-postgres.yaml # PostgreSQL deployment
│ │ └── deployment-freeswitch-registrar.yaml # Registrar deployment
│ ├── services/
│ │ ├── service-api.yaml # API service
│ │ ├── service-postgres.yaml # PostgreSQL service
│ │ ├── service-freeswitch-registrar.yaml # Registrar service
│ │ └── service-n8n.yaml # n8n service (optional)
│ ├── pvcs/
│ │ ├── pvc-postgres.yaml # PostgreSQL persistent storage
│ │ ├── pvc-ivr-audio-cache.yaml # IVR TTS audio cache (optional)
│ │ └── pvc-n8n.yaml # n8n data storage (optional)
│ ├── ingress-api.yaml # API ingress (HTTPS)
│ └── imagepullsecret.yaml # GitHub Container Registry credentials
└── README.md # Chart documentation
Key Files
Chart.yaml
apiVersion: v2
name: tenant-infra
description: Tenant infrastructure scaffolding for Ominis Cluster Manager
type: application
version: 1.0.0
appVersion: "1.0.0"
maintainers:
- name: Ominis AI
email: admin@ominis.ai
values.yaml
The values file defines all configurable parameters. Key sections:
- tenant: Tenant name, labels, annotations
- api: API deployment configuration (replicas, image, resources)
- postgres: Database configuration (persistence, resources)
- freeswitch.registrar: Registrar deployment (replicas, resources, networking)
- xmlrpc: XML-RPC URLs and credentials
- openai: OpenAI API key for IVR TTS
- rbac: RBAC configuration (ServiceAccount, Role, RoleBinding)
Tenant Infrastructure Components
Resource Hierarchy
1. API Service (FastAPI Deployment)
Purpose: REST API gateway for all call control operations
Deployment Configuration:
- Replicas: Configurable (default: 1)
- Image: ghcr.io/ominis-ai/cm-api:latest
- Strategy: Recreate (to avoid dual-write conflicts)
- Resources:
- Requests: 256Mi memory, 250m CPU
- Limits: 512Mi memory, 500m CPU
Key Environment Variables:
- DEPLOYMENT_MODE: kubernetes
- KUBERNETES_NAMESPACE: client-demo-client
- API_KEY: (from secret)
- DB_DSN: postgresql+asyncpg://user:pass@postgres:5432/callcenter
- FS_XMLRPC_URL: http://freeswitch-registrar:8080/RPC2
- OPENAI_API_KEY: (from values)
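As a sketch of how these variables might be wired in deployment-api.yaml (the secret name and keys are assumptions based on the chart layout above):

```yaml
env:
  - name: DEPLOYMENT_MODE
    value: "kubernetes"
  - name: KUBERNETES_NAMESPACE
    value: "client-demo-client"
  - name: API_KEY
    valueFrom:
      secretKeyRef:
        name: bootstrap      # assumed secret name (see Secrets below)
        key: api_key
  - name: DB_DSN
    value: "postgresql+asyncpg://user:pass@postgres:5432/callcenter"
```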
Health Checks:
- Liveness Probe: GET /health every 10s
- Readiness Probe: GET /health every 5s
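In the deployment template these probes would look roughly like this (a sketch; ports follow the service mapping below and exact thresholds may differ):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
```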
Service:
- Type: ClusterIP
- Port: 8080
- Target Port: 8000
2. PostgreSQL (Database Deployment)
Purpose: Centralized configuration and state storage
Deployment Configuration:
- Replicas: 1 (not HA by default)
- Image: postgres:15
- Strategy: Recreate (ensures single writer)
- Resources:
- Requests: 256Mi memory, 250m CPU
- Limits: 512Mi memory, 500m CPU
Persistent Storage:
- PVC Size: 10Gi (configurable)
- Storage Class: Default (or specify custom)
- Mount Path: /var/lib/postgresql/data
Initialization:
- SQL script mounted via ConfigMap (init-schema.sql)
- Creates tables: queues, cc_agents, cc_tiers, cc_members, extensions, ivrs, ivr_menus, ivr_menu_options, ivr_tts_cache
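A sketch of how the init SQL is typically mounted for the official postgres image (the volume name is illustrative; the ConfigMap name follows the list further below):

```yaml
volumes:
  - name: init-sql
    configMap:
      name: postgres-init-schema            # ConfigMap carrying init-schema.sql
containers:
  - name: postgres
    image: postgres:15
    volumeMounts:
      - name: init-sql
        mountPath: /docker-entrypoint-initdb.d   # scripts here run on first startup
```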
ODBC Configuration:
- DSN: callcenter
- Used by FreeSWITCH pods for direct database access
Health Checks:
- Liveness Probe: pg_isready every 10s
- Readiness Probe: pg_isready every 5s
Service:
- Type: ClusterIP
- Port: 5432
- DNS: postgres.client-demo-client.svc.cluster.local
3. FreeSWITCH Registrar (SIP Registration & B2BUA)
Purpose: SIP registration server and media anchor
Deployment Configuration:
- Replicas: 1
- Image: ghcr.io/ominis-ai/freeswitch-registrar:latest
- Strategy: Recreate (to prevent dual SIP binding)
- Resources:
- Requests: 512Mi memory, 250m CPU
- Limits: 1Gi memory, 500m CPU
Networking:
- Host Network: Enabled (temporary workaround for IPv6 issues)
- Node Selector: kubernetes.io/hostname: c2-30-bhs5 (pinned to a specific node)
- DNS Policy: ClusterFirstWithHostNet
Ports:
- 5060/UDP - SIP signaling
- 5061/TCP - SIP TLS
- 8080/TCP - XML-RPC
- 20000-30000/UDP - RTP media
Configuration:
- Mounted via ConfigMap (freeswitch-registrar-config)
- Includes: freeswitch.xml, vars.xml, sofia.conf.xml, acl.conf.xml
B2BUA Pattern:
- Queue→Registrar: Internal IP (10.x.x.x)
- Registrar→Agent: Public IP (51.79.31.20)
- Anchors media between internal pods and external SIP clients
Service:
- Type: ClusterIP
- Ports: 5060/UDP, 8080/TCP
- DNS: freeswitch-registrar.client-demo-client.svc.cluster.local
4. ConfigMaps & Secrets
ConfigMaps:
1. bootstrap - Basic tenant configuration
   - registry_url: ghcr.io/ominis-ai
   - deployment_mode: kubernetes
2. freeswitch-registrar-config - Complete FreeSWITCH configuration files
   - freeswitch.xml - Main configuration
   - vars.xml - Variables
   - sofia.conf.xml - SIP profiles
   - acl.conf.xml - ACL rules
   - modules.conf.xml - Module loading
3. postgres-init-schema - Database initialization SQL
   - Loaded via the init-schema.sql file
4. xmlrpc - XML-RPC connection details
   - URLs for registrar and campaign pods
   - Used by the API for XML-RPC requests
Secrets:
1. bootstrap - Core credentials
   - api_key: (API key for authentication)
   - db_password: (PostgreSQL password)
2. postgres-credentials - Database credentials
   - DB_PASS: (PostgreSQL password)
3. freeswitch-xmlrpc - XML-RPC credentials
   - username: fsadmin
   - password: (XML-RPC password)
4. freeswitch-xmlcurl-token - XML-CURL shared secret
   - token: (Shared secret for mod_xml_curl)
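Once deployed, an individual secret value can be checked in place, for example the API key in the bootstrap secret (key names as listed above):

```bash
# Decode the API key stored in the bootstrap secret
kubectl get secret bootstrap -n client-demo-client \
  -o jsonpath='{.data.api_key}' | base64 -d
```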
5. Ingress (HTTPS Routing)
Purpose: External HTTPS access to API
Configuration:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: api
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
ingressClassName: traefik
rules:
- host: api.demo.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api
port:
number: 8080
tls:
- secretName: api-tls-cert
hosts:
- api.demo.example.com
Features:
- TLS Automation: cert-manager issues Let's Encrypt certificates
- Ingress Controller: Traefik routes traffic to API service
- HTTPS Redirect: Automatic HTTP→HTTPS redirect
- Custom Domain: Configure via api.host in values.yaml
6. RBAC (ServiceAccount, Role, RoleBinding)
Purpose: Grant API pod permission to manage runtime resources (queues, IVRs)
ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
name: client-demo-client
namespace: client-demo-client
Role:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: runtime-manager
namespace: client-demo-client
rules:
- apiGroups: ["", "apps"]
resources: ["deployments", "services", "configmaps", "pods", "pods/log"]
verbs: ["get", "list", "create", "update", "patch", "delete", "watch"]
RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: runtime-manager-binding
namespace: client-demo-client
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: runtime-manager
subjects:
- kind: ServiceAccount
name: client-demo-client
namespace: client-demo-client
Why This Matters:
Without proper RBAC, the API pod cannot create queue/IVR pods dynamically. The runtime-manager role grants the API pod permission to orchestrate Kubernetes resources within its own namespace.
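The grant can be verified with kubectl auth can-i, impersonating the tenant ServiceAccount defined above:

```bash
# Confirm the API ServiceAccount may create Deployments in its own namespace
kubectl auth can-i create deployments \
  --as=system:serviceaccount:client-demo-client:client-demo-client \
  -n client-demo-client
```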
ADR-0003: Helm Over Terraform
Context
Ominis Cluster Manager originally used Terraform (OpenTofu) for infrastructure deployment. The workflow was:
- API generates .tf files with queue configuration
- Git-sync sidecar detects changes and pulls the .tf files
- Terraform init → plan → apply creates Kubernetes resources
- A .tfstate file tracks infrastructure state
Problem: This workflow was complex, slow, and introduced state synchronization issues.
Decision
Migrate to Helm for infrastructure scaffolding (namespace, RBAC, secrets) and Python Kubernetes client for runtime resources (queues, IVRs).
Alternatives Considered
1. Terraform (Previous Approach)
❌ Rejected for:
- Complexity: Generating .tf files dynamically added indirection
- State Drift: Terraform state could drift from actual cluster state
- Git-sync Dependency: Required sidecar container and SSH keys for git access
- Slow Feedback: Push to git → wait for sync → wait for apply (15-30 seconds)
- Port Allocation: Cluster-level tracking was unnecessary with registrar pattern
✅ Advantages:
- Multi-cloud abstraction (can deploy to AWS, GCP, Azure)
- Strong state management with .tfstate files
- Mature ecosystem with many providers
2. Helm + Python Kubernetes Client (Chosen)
✅ Benefits:
- Simplicity: Standard Kubernetes tooling, no custom state management
- Speed: Direct API → Kubernetes, no git intermediary (1-3 seconds)
- Idempotency: Server-side apply natively handles conflicts
- Observability: Kubernetes events and status directly available
- Standard: Industry-standard Helm for infra scaffolding
- Native: Kubernetes-native workflow (kubectl, helm)
⚠️ Trade-offs:
- Loss of multi-cloud abstraction (Kubernetes-only)
- No centralized .tfstate equivalent (Kubernetes is the source of truth)
3. Kubernetes Operators
❌ Rejected for:
- Overkill: Writing a custom operator is a large undertaking
- Operational Burden: Another service to maintain
- Complexity: Reconciliation loops and CRDs add complexity
✅ When to Consider:
- If you need continuous reconciliation (drift correction)
- If you're building a product around the operator (e.g., Strimzi for Kafka)
4. GitOps (ArgoCD / Flux)
⚠️ Future Consideration:
- Works well with Helm charts
- Provides continuous sync from git → cluster
- Adds operational complexity (another service)
- Good for multi-tenant SaaS with strict audit requirements
✅ Compatible with Current Approach:
- Can deploy Helm chart via ArgoCD
- Not needed for initial MVP
Consequences
Positive:
✅ Simplicity: Reduced from 5 components (API, Terraform, git-sync, git repo, .tfstate) to 2 (Helm, API)
✅ Speed: Queue creation: 15-30s (Terraform) → 3-5s (Kubernetes client)
✅ Reliability: No state drift (Kubernetes is source of truth)
✅ Developer Experience: Standard helm and kubectl commands
✅ Observability: Native Kubernetes events and status
Negative:
❌ Kubernetes-Only: Cannot deploy to other platforms (acceptable for Ominis use case)
❌ No Centralized State: Relies on Kubernetes API as source of truth (acceptable for cloud-native apps)
Migration Impact:
- ✅ Existing queues continue to work (Kubernetes resources unchanged)
- ✅ API code updated to use Python Kubernetes client instead of Terraform
- ✅ Git-sync sidecar removed
- ✅ .tfstate files archived (no longer needed)
Status: ✅ Accepted - Migrated in October 2025
Date: 2025-10-04
Deployment Process
Prerequisites
Before deploying the Helm chart, ensure you have:
- Kubernetes Cluster: 1.19+ (tested on 1.27+)
- Helm: 3.8+ installed locally
- kubectl: Configured with cluster access
- Docker Registry Access: GitHub Container Registry (GHCR) credentials
- DNS: Domain name for API ingress (e.g., api.demo.example.com)
Step 1: Prepare Values File
Create a values.yaml file with tenant-specific configuration:
tenant:
name: demo-client
labels:
environment: production
owner: platform-team
api:
enabled: true
replicas: 1
image: ghcr.io/ominis-ai/cm-api
tag: latest
api_key: "your-secret-api-key-here"
host: api.demo.example.com
ingressClass: traefik
tlsEnabled: true
tlsSecretName: "" # Cert-manager will create this
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
postgres:
enabled: true
persistence:
enabled: true
size: 10Gi
storageClass: "" # Use default storage class
freeswitch:
registrar:
enabled: true
replicas: 1
imagePullSecrets:
enabled: true
name: ghcr-secret
username: your-github-username
password: ghp_your_github_token
email: github@example.com
server: ghcr.io
openai:
apiKey: "sk-..." # OpenAI API key for IVR TTS
rbac:
enabled: true
serviceAccount:
create: true
Step 2: Preview Changes
Before applying, preview what will be deployed:
cd /home/matt/projects/fml/cluster-manager
# Lint the chart
helm lint charts/tenant-infra -f values.yaml
# Dry-run to see rendered templates
helm upgrade --install demo-client charts/tenant-infra \
-f values.yaml \
--namespace client-demo-client \
--create-namespace \
--dry-run --debug
What to Look For:
- ✅ Namespace is client-demo-client
- ✅ All secrets have values (no empty strings)
- ✅ Ingress host is correct
- ✅ Image pull secret is configured
Step 3: Deploy Chart
Deploy the chart using the Makefile:
# Deploy using Makefile (recommended)
make helm-apply TENANT=demo-client VALUES=values.yaml
# Or deploy directly with helm
helm upgrade --install demo-client charts/tenant-infra \
-f values.yaml \
--namespace client-demo-client \
--create-namespace \
--atomic \
--wait \
--timeout 10m
Flags Explained:
- --install: Install if the release doesn't exist
- --atomic: Roll back on failure
- --wait: Wait for pods to be ready
- --timeout 10m: Maximum time to wait
- --create-namespace: Create the namespace if it doesn't exist
Step 4: Verify Deployment
Check that all resources are created and healthy:
# Check Helm release
helm list -n client-demo-client
# Check all resources
kubectl get all,ing,cm,secret,sa,role,rolebinding -n client-demo-client
# Check pod status
kubectl get pods -n client-demo-client -w
# Check pod logs
kubectl logs -l app=api -n client-demo-client --tail=50
kubectl logs -l app=postgres -n client-demo-client --tail=50
kubectl logs -l app=freeswitch-registrar -n client-demo-client --tail=50
Expected Output:
NAME READY STATUS RESTARTS AGE
pod/api-78c5d4b7ff-xq9wz 1/1 Running 0 2m
pod/postgres-6c8b5d4f7-h8k2p 1/1 Running 0 2m
pod/freeswitch-registrar-5d7c8f-9x4tz 1/1 Running 0 2m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/api ClusterIP 10.43.123.45 <none> 8080/TCP 2m
service/postgres ClusterIP 10.43.123.46 <none> 5432/TCP 2m
service/freeswitch-registrar ClusterIP 10.43.123.47 <none> 5060/UDP,8080/TCP 2m
Step 5: Test API Access
Test that the API is accessible:
# Via ingress (if configured)
curl https://api.demo.example.com/health
# Via port-forward (for testing)
kubectl port-forward svc/api 8080:8080 -n client-demo-client
curl http://localhost:8080/health
Expected Response:
{
"status": "healthy",
"service": "callcenter-api"
}
Deployment Flow Diagram
The end-to-end flow mirrors the five steps above: prepare a values file, preview the rendered templates, deploy the release, verify the resources, and test API access.
Values Configuration
The values.yaml file controls all aspects of the tenant deployment. Here are the key sections:
Tenant Configuration
tenant:
name: demo-client # Tenant identifier (used in namespace)
labels: # Custom labels for all resources
environment: production
owner: platform-team
cost-center: engineering
annotations: # Custom annotations
contact: admin@demo.example.com
API Configuration
api:
enabled: true # Enable/disable API deployment
replicas: 1 # Number of API pods
image: ghcr.io/ominis-ai/cm-api
tag: latest # Image tag (use specific versions in production)
imagePullPolicy: Always # Pull policy (Always, IfNotPresent, Never)
debug: "true" # Enable debug logging
api_key: "Winnipeg2025" # API key for authentication
strategy:
type: Recreate # Deployment strategy (Recreate or RollingUpdate)
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
# Ingress configuration
host: api.demo.example.com # Domain name for API
ingressClass: traefik # Ingress controller
servicePort: 8080 # Service port
tlsEnabled: false # Enable TLS (requires cert-manager)
tlsSecretName: "" # TLS secret name (auto-generated if empty)
annotations: # Ingress annotations
cert-manager.io/cluster-issuer: letsencrypt-prod
PostgreSQL Configuration
postgres:
enabled: true # Enable/disable PostgreSQL deployment
image: postgres
tag: "15" # PostgreSQL version
imagePullPolicy: IfNotPresent
user: callcenter_user # Database user
database: callcenter # Database name
password: "callcenter_pass_demo-client" # Database password
persistence:
enabled: true # Enable persistent storage
size: 10Gi # PVC size
storageClass: "" # Storage class (empty = default)
existingClaim: "" # Use existing PVC (optional)
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
odbc:
dsn: "callcenter" # ODBC DSN name for FreeSWITCH
FreeSWITCH Configuration
freeswitch:
xmlcurl:
enabled: true # Enable mod_xml_curl
token: "Winnipeg2025-xmlcurl-changeme" # Shared secret token
registrar:
enabled: true # Enable registrar deployment
replicas: 1
image: ghcr.io/ominis-ai/freeswitch-registrar
tag: latest
imagePullPolicy: Always
hostNetwork: true # Use host networking (IPv6 workaround)
nodeSelector:
kubernetes.io/hostname: c2-30-bhs5 # Pin to specific node
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "500m"
XML-RPC Configuration
xmlrpc:
enabled: true
port: 8080
url: "http://freeswitch-registrar.demo-client.svc.cluster.local:8080/RPC2"
username: "fsadmin" # XML-RPC username
password: "Winnipeg2025" # XML-RPC password
acl:
enabled: true # Enable ACL
allowedCidrs: # Allowed IP ranges
- "10.0.0.0/8"
- "172.16.0.0/12"
- "192.168.0.0/16"
campaign:
url: "http://freeswitch-campaign.demo-client.svc.cluster.local:8080/RPC2"
username: "fsadmin"
password: "Winnipeg2025"
OpenAI Configuration (for IVR TTS)
openai:
apiKey: "sk-..." # OpenAI API key
ttsModel: "tts-1" # TTS model (tts-1, tts-1-hd)
defaultVoice: "alloy" # Default voice (alloy, echo, fable, onyx, nova, shimmer)
RBAC Configuration
rbac:
enabled: true # Enable RBAC resources
serviceAccount:
create: true # Create ServiceAccount
name: "" # ServiceAccount name (auto-generated if empty)
annotations: {} # ServiceAccount annotations
Image Pull Secrets
imagePullSecrets:
enabled: true # Enable image pull secret
name: ghcr-secret # Secret name
username: mattjoubert # GitHub username
password: ghp_... # GitHub token (with packages:read scope)
email: github@mattjoubert.com # Email address
server: ghcr.io # Registry server
Multi-Tenant Isolation
Ominis Cluster Manager achieves strong multi-tenant isolation through several Kubernetes mechanisms:
1. Kubernetes Namespaces
Pattern: One namespace per tenant (client-{tenant-name})
Benefits:
- ✅ Logical separation of resources
- ✅ RBAC boundaries (ServiceAccount permissions scoped to namespace)
- ✅ Resource quotas per tenant
- ✅ Network policies per tenant
- ✅ Clear ownership (all resources in namespace belong to tenant)
Example:
# Tenant A
kubectl get all -n client-demo-client
# Tenant B
kubectl get all -n client-acme-corp
2. Network Policies
Purpose: Restrict network traffic between tenants
Example Policy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: tenant-isolation
namespace: client-demo-client
spec:
podSelector: {} # Apply to all pods in namespace
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
tenant: demo-client # Only from same tenant
egress:
- to:
- namespaceSelector:
matchLabels:
tenant: demo-client # Only to same tenant
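For the namespaceSelector above to match, the tenant namespace itself must carry the tenant label, for example:

```bash
# Label the tenant namespace so NetworkPolicy selectors can match it
kubectl label namespace client-demo-client tenant=demo-client
```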
3. Resource Quotas
Purpose: Limit resource consumption per tenant
Example Quota:
apiVersion: v1
kind: ResourceQuota
metadata:
name: tenant-quota
namespace: client-demo-client
spec:
hard:
requests.cpu: "10" # Max 10 CPU cores requested
requests.memory: 20Gi # Max 20Gi memory requested
limits.cpu: "20" # Max 20 CPU cores limit
limits.memory: 40Gi # Max 40Gi memory limit
persistentvolumeclaims: "10" # Max 10 PVCs
services.loadbalancers: "2" # Max 2 load balancers
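Current usage against these limits can be inspected at any time:

```bash
# Compare requested/limited resources with the quota's hard caps
kubectl describe resourcequota tenant-quota -n client-demo-client
```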
4. Separate Databases
Approach: Each tenant has a dedicated PostgreSQL instance
Benefits:
- ✅ Complete data isolation (no shared tables)
- ✅ Independent backups and restores
- ✅ Per-tenant performance tuning
- ✅ No noisy neighbor issues
- ⚠️ Trade-off: Higher resource usage vs. security
5. Isolated Secrets
Approach: Secrets are namespace-scoped
Benefits:
- ✅ API keys, passwords, tokens isolated per tenant
- ✅ No cross-tenant secret access
- ✅ ServiceAccount can only read secrets in its namespace
Example:
# Tenant A secrets
kubectl get secrets -n client-demo-client
# Tenant B secrets (different secrets)
kubectl get secrets -n client-acme-corp
Tenant Isolation Diagram
Deployment Examples
Example 1: Preview Deployment Changes
See what will change without applying:
cd /home/matt/projects/fml/cluster-manager
# Preview changes using Makefile
make helm-diff TENANT=demo-client VALUES=values.yaml
# Or use helm directly
helm template demo-client charts/tenant-infra \
-f values.yaml \
--namespace client-demo-client > /tmp/demo-client.yaml
kubectl diff -n client-demo-client -f /tmp/demo-client.yaml || true
Output:
+ apiVersion: v1
+ kind: Namespace
+ metadata:
+ name: client-demo-client
...
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+ name: api
+ namespace: client-demo-client
...
Use Cases:
- ✅ Verify configuration before applying
- ✅ Review resource changes during upgrades
- ✅ CI/CD pull request previews
Example 2: Deploy New Tenant
Deploy a new tenant from scratch:
# Step 1: Create values file
cat > tenant-acme.yaml <<EOF
tenant:
name: acme-corp
api:
host: api-acme.example.com
api_key: "acme-secret-key"
postgres:
password: "acme-db-password"
imagePullSecrets:
enabled: true
username: your-github-username
password: ghp_your_github_token
EOF
# Step 2: Deploy using Makefile
make helm-apply TENANT=acme-corp VALUES=tenant-acme.yaml
# Step 3: Watch deployment progress
kubectl get pods -n client-acme-corp -w
# Step 4: Verify deployment
helm status acme-corp -n client-acme-corp
kubectl get all,ing -n client-acme-corp
Timeline:
- Namespace creation: ~1s
- ConfigMaps/Secrets: ~2s
- PostgreSQL pod ready: ~10-15s
- API pod ready: ~5-10s
- Registrar pod ready: ~10-15s
- Total: ~30-45 seconds
Example 3: Check Deployment Status
Check the status of a deployed tenant:
# List all Helm releases
helm list -A
# Get release details
helm status demo-client -n client-demo-client
# View deployed values
helm get values demo-client -n client-demo-client
# View rendered manifests
helm get manifest demo-client -n client-demo-client
# Check pod status
kubectl get pods -n client-demo-client
# Check ingress
kubectl get ingress -n client-demo-client
kubectl describe ingress api -n client-demo-client
# Check logs
kubectl logs -l app=api -n client-demo-client --tail=100 -f
Expected Output:
NAME: demo-client
LAST DEPLOYED: Mon Oct 14 10:00:00 2025
NAMESPACE: client-demo-client
STATUS: deployed
REVISION: 1
Example 4: Customize Values Per Environment
Use different values for dev, staging, and production:
# Development environment
cat > values-dev.yaml <<EOF
tenant:
name: demo-client
labels:
environment: dev
api:
replicas: 1
host: api-dev.demo.example.com
tlsEnabled: false # No TLS in dev
debug: "true"
postgres:
persistence:
size: 5Gi # Smaller storage in dev
freeswitch:
registrar:
resources:
limits:
memory: "512Mi" # Lower limits in dev
EOF
# Production environment
cat > values-prod.yaml <<EOF
tenant:
name: demo-client
labels:
environment: prod
api:
replicas: 3 # HA in production
host: api.demo.example.com
tlsEnabled: true
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
debug: "false"
postgres:
persistence:
size: 50Gi # Larger storage in prod
storageClass: "fast-ssd"
freeswitch:
registrar:
replicas: 2 # HA registrar
resources:
limits:
memory: "2Gi"
EOF
# Deploy to dev
make helm-apply TENANT=demo-client-dev VALUES=values-dev.yaml
# Deploy to prod
make helm-apply TENANT=demo-client-prod VALUES=values-prod.yaml
Example 5: Rollback Failed Deployment
Rollback to a previous working version:
# List release history
helm history demo-client -n client-demo-client
# Example output:
# REVISION UPDATED STATUS CHART DESCRIPTION
# 1 Mon Oct 14 10:00:00 2025 superseded tenant-infra-1.0.0 Install complete
# 2 Mon Oct 14 11:00:00 2025 superseded tenant-infra-1.1.0 Upgrade complete
# 3 Mon Oct 14 12:00:00 2025 failed tenant-infra-1.2.0 Upgrade failed
# Rollback to previous version (revision 2)
helm rollback demo-client -n client-demo-client
# Or rollback to specific revision
helm rollback demo-client 2 -n client-demo-client
# Verify rollback
kubectl get pods -n client-demo-client -w
helm history demo-client -n client-demo-client
Rollback Process:
- Helm reverts to previous manifest
- Kubernetes applies changes
- Pods restart with old configuration
- Timeline: ~30-60 seconds
Upgrade Strategy
Helm Chart Upgrades
When upgrading the tenant infrastructure chart:
# Step 1: Pull latest changes
git pull origin main
# Step 2: Review chart changes
git log --oneline charts/tenant-infra/
# Step 3: Preview upgrade
make helm-diff TENANT=demo-client VALUES=values.yaml
# Step 4: Apply upgrade
make helm-apply TENANT=demo-client VALUES=values.yaml
# Step 5: Monitor rollout
kubectl get pods -n client-demo-client -w
Application Upgrades (API Image)
When upgrading the API application:
# Step 1: Build and push new image
make build TAG=v1.2.0
docker push ghcr.io/ominis-ai/cm-api:v1.2.0
# Step 2: Update values.yaml
sed -i 's/tag: latest/tag: v1.2.0/' values.yaml
# Step 3: Apply upgrade
make helm-apply TENANT=demo-client VALUES=values.yaml
# Step 4: Monitor rollout
kubectl rollout status deployment/api -n client-demo-client
Zero-Downtime Upgrades
For zero-downtime upgrades, use RollingUpdate strategy:
api:
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Max pods above desired count during update
maxUnavailable: 0 # Min pods available during update
Process:
- New pod starts
- New pod becomes ready
- Old pod terminates
- Result: Always at least 1 pod serving traffic
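If a rolling update misbehaves, the Deployment can also be watched and reverted directly at the Kubernetes level, independent of Helm release history:

```bash
# Watch the rollout, then revert the Deployment if the new pods fail
kubectl rollout status deployment/api -n client-demo-client
kubectl rollout undo deployment/api -n client-demo-client
```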
Context & Rationale
Why One Chart Per Tenant?
Decision: Deploy one Helm release per tenant
Benefits:
✅ Complete Isolation: Each tenant has dedicated resources
✅ Independent Upgrades: Upgrade one tenant without affecting others
✅ Per-Tenant Customization: Different resource limits, features, configurations
✅ Clear Resource Ownership: All resources in namespace belong to tenant
✅ Simplified Multi-Tenancy: Easy to add/remove tenants
Trade-offs:
⚠️ Resource Overhead: Each tenant has dedicated PostgreSQL, API, Registrar
⚠️ Operational Complexity: More releases to manage
Mitigation:
- Use resource limits to cap per-tenant usage
- Automate tenant deployment via CI/CD
- Monitor all tenants centrally with Prometheus
Deployment vs StatefulSet for PostgreSQL
Decision: Use Deployment (not StatefulSet) for PostgreSQL
Note: The chart currently uses Deployment for PostgreSQL, not StatefulSet. This is acceptable for single-replica databases, but StatefulSet is recommended for HA deployments.
StatefulSet Benefits:
- ✅ Stable pod identity (postgres-0, postgres-1)
- ✅ Persistent storage guarantees
- ✅ Graceful scaling and termination
- ✅ Reliable restarts (same PVC every time)
When to Use StatefulSet:
- Multi-replica databases (primary + replicas)
- Databases requiring stable network identity
- Databases with ordered scaling requirements
Current Approach (Deployment):
- ✅ Simpler for single-replica databases
- ✅ Sufficient for non-HA deployments
- ⚠️ Not suitable for HA PostgreSQL
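For reference, a minimal StatefulSet sketch for PostgreSQL might look like the following; names, sizes, and the headless Service are illustrative and not part of the current chart:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres              # headless Service providing stable DNS (postgres-0.postgres)
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:              # one PVC per replica, reattached across restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```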
Why Separate Registrar?
Decision: Deploy one registrar per tenant (not shared)
Benefits:
✅ Tenant Isolation: Registrar failures don't cascade across tenants
✅ Per-Tenant Configuration: Different SIP profiles per tenant
✅ Simplified Debugging: Registrar logs map 1:1 to tenant
✅ Security: Network policies isolate registrar per tenant
Trade-offs:
⚠️ Resource Overhead: Each tenant has dedicated registrar pod
⚠️ Port Management: Each registrar needs dedicated ports (if not using host networking)
Alternative (Shared Registrar):
- One registrar for all tenants
- ❌ Single point of failure
- ❌ Configuration complexity (multi-tenant SIP profiles)
- ❌ Security concerns (tenant boundary violations)
Configuration Management Best Practices
Decision: Store configuration in version control
Approach:
- values.yaml: Committed to git (without secrets)
- Secrets: Stored in external secret manager (e.g., Vault, AWS Secrets Manager)
- Environment-Specific Values: Separate files (values-dev.yaml, values-prod.yaml)
Example:
# values.yaml (committed to git)
tenant:
name: demo-client
api:
host: api.demo.example.com
api_key: "" # Injected via CI/CD
# secrets.env (NOT committed to git)
API_KEY=secret-key-here
DB_PASSWORD=db-password-here
OPENAI_API_KEY=sk-...
CI/CD Integration:
# Inject secrets during deployment
helm upgrade --install demo-client charts/tenant-infra \
-f values.yaml \
--set api.api_key=$API_KEY \
--set postgres.password=$DB_PASSWORD \
--set openai.apiKey=$OPENAI_API_KEY
Links to Related Sections
- System Overview - High-level architecture and components
- Cluster Infrastructure - Cluster-wide services (cert-manager, traefik, etc.)
- Kubernetes Operations - Operational runbooks for Kubernetes
- Database Schema - PostgreSQL tables and relationships
- Queue Management - Dynamic queue pod deployment
- API Authentication - API key management and security
- Ports & Adapters - Hexagonal architecture pattern
- Testing Strategy - Helm chart testing approach
Summary
Ominis Cluster Manager uses Helm for infrastructure scaffolding (namespace, RBAC, secrets, ingress) and Python Kubernetes client for runtime resources (queues, IVRs). This provides the best of both worlds:
✅ Helm: Standard, versioned, rollback-friendly infrastructure deployment
✅ Kubernetes Client: Fast, simple, idempotent runtime orchestration
✅ Multi-Tenant: Strong isolation via namespaces, network policies, and resource quotas
✅ Developer Experience: A single make helm-apply command deploys a complete tenant
The migration from Terraform to Helm reduced deployment complexity, improved performance (15-30s → 3-5s for queues), and aligned with Kubernetes-native best practices.
Powered by Ominis.ai - Modern call center infrastructure for cloud-native platforms.