
Helm Deployment & Infrastructure

Overview

Ominis Cluster Manager uses Helm as its primary infrastructure deployment tool, following a two-tier architecture that separates cluster-wide services from tenant-specific resources. This page documents the complete Helm chart infrastructure, tenant model, deployment process, and the rationale behind choosing Helm over Terraform.

What is Helm?

Helm is the package manager for Kubernetes, providing templating, versioning, and lifecycle management for Kubernetes applications. Think of it as apt/yum for Kubernetes - it packages related Kubernetes resources into versioned charts that can be installed, upgraded, and rolled back as a single unit.
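
A typical Helm lifecycle looks like the sketch below (generic illustration only; my-release and ./my-chart are placeholder names, not Ominis artifacts):

# Install a chart as a named release, then upgrade, inspect history, and roll back
helm install my-release ./my-chart -f values.yaml
helm upgrade my-release ./my-chart -f values.yaml
helm history my-release
helm rollback my-release 1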

Why Helm for Ominis?

Decision: Use Helm for infrastructure scaffolding (namespace, RBAC, secrets, ingress) and Python Kubernetes client for runtime resources (queues, IVRs).

Rationale:

  1. Kubernetes-Native: Standard tooling recognized across the industry
  2. Simplified Operations: One command to deploy entire tenant infrastructure
  3. Version Control: Track chart versions alongside application versions
  4. Templating: Dynamic configuration via values.yaml
  5. Rollback Support: Native rollback to previous releases
  6. GitOps Ready: Compatible with ArgoCD, Flux, and other GitOps tools

Cluster vs Tenant Infrastructure Model

Two-Tier Infrastructure Approach

Ominis uses a two-tier infrastructure model that separates cluster-wide and tenant-scoped resources. This provides operational efficiency (shared services) while maintaining tenant isolation (dedicated resources).

Cluster Infrastructure (cluster-infra repository)

Deployed: Once per Kubernetes cluster
Shared: By all tenants
Purpose: Foundation services that all tenants depend on

Services:

  • cert-manager: TLS certificate automation (Let's Encrypt)
  • Traefik: Ingress controller and reverse proxy
  • Authentik: SSO and OAuth2 provider
  • Vaultwarden: Password manager
  • Homer: SIP traffic monitoring
  • Excalidraw: Diagramming tool

Namespaces:

  • cert-manager - Certificate automation
  • authentik - Identity and access management
  • vaultwarden - Secrets management
  • flow-proxy - Traefik ingress
  • homer - SIP monitoring
  • excalidraw - Documentation diagrams

Tenant Infrastructure (this document)

Deployed: Once per tenant/customer
Isolated: In dedicated namespaces
Purpose: Customer-specific application services

Services:

  • API Service: FastAPI REST API for call control
  • PostgreSQL: Database for configuration and state
  • FreeSWITCH Registrar: SIP registration server and B2BUA
  • Queue Pods: Dynamic FreeSWITCH instances per queue (runtime)
  • IVR Pods: Dynamic FreeSWITCH instances per IVR (runtime)

Namespace Pattern: client-{tenant-name} (e.g., client-demo-client)

Comparison Table

| Aspect | Cluster Infrastructure | Tenant Infrastructure |
| --- | --- | --- |
| Deployment Frequency | Once per cluster | Once per tenant |
| Sharing | Shared by all tenants | Isolated per tenant |
| Examples | cert-manager, Traefik | API, PostgreSQL, Queues |
| Helm Chart Location | /cluster-infra/helm-charts/ | /cluster-manager/charts/tenant-infra/ |
| Update Impact | Affects all tenants | Affects single tenant |
| Resource Efficiency | High (shared) | Lower (isolated) |
| Fault Isolation | Lower | High |
| Customization | Minimal | Per-tenant |

Dependency Flow

Why This Model?

Benefits:

Cost Efficiency: One cert-manager instead of N cert-managers (where N = tenant count)
Operational Simplicity: Upgrade cluster services once, not per-tenant
Tenant Isolation: Tenant failures don't cascade to other tenants
Security: Network policies enforce namespace boundaries
Scalability: Independent tenant scaling without affecting others

Trade-offs:

⚠️ Shared Failure Domain: Cert-manager downtime affects all tenants
⚠️ Version Lock: Cluster services are uniform across tenants (no per-tenant versions)
⚠️ Operational Complexity: Two-tier management requires coordination

Integration Points

The tenant infrastructure integrates with cluster infrastructure at several key points:

  1. TLS Certificates: Tenant ingress annotations reference letsencrypt-prod ClusterIssuer from cert-manager
  2. Ingress Routing: Tenant Ingress resources use ingressClassName: traefik
  3. SIP Monitoring: Homer DaemonSet captures traffic from tenant queue pods
  4. Authentication (future): Tenant APIs can delegate authentication to Authentik
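
Before deploying a tenant, you can confirm these cluster-level dependencies are in place (assumes kubectl access; resource and namespace names follow the cluster-infra defaults described above):

# Shared services the tenant chart depends on
kubectl get clusterissuer letsencrypt-prod
kubectl get ingressclass traefik
kubectl get pods -n cert-manager
kubectl get pods -n flow-proxy   # Traefik ingress
kubectl get pods -n homer        # SIP capture (Homer)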

Helm Chart Structure

The tenant-infra chart contains all resources needed to deploy a complete tenant environment.

Chart Directory Layout

charts/tenant-infra/
├── Chart.yaml # Chart metadata (version, name, description)
├── values.yaml # Default configuration values
├── values.schema.json # Value validation schema
├── init-schema.sql # PostgreSQL database initialization
├── templates/ # Kubernetes resource templates
│ ├── _helpers.tpl # Template helper functions
│ ├── NOTES.txt # Post-install instructions
│ ├── namespace.yaml # Tenant namespace (not used, created via --create-namespace)
│ ├── rbac.yaml # RBAC resources (roles, bindings)
│ ├── serviceaccount.yaml # ServiceAccount for API pod
│ ├── configmaps/
│ │ ├── configmap-bootstrap.yaml # Bootstrap configuration
│ │ ├── configmap-freeswitch-registrar.yaml # Registrar FreeSWITCH config
│ │ ├── configmap-freeswitch-xmlcurl.yaml # XML-CURL config
│ │ ├── configmap-postgres-init.yaml # PostgreSQL init SQL
│ │ └── configmap-xmlrpc.yaml # XML-RPC credentials
│ ├── secrets/
│ │ ├── secret-bootstrap.yaml # API key and DB password
│ │ ├── secret-freeswitch-xmlcurl.yaml # XML-CURL token
│ │ ├── secret-postgres.yaml # PostgreSQL credentials
│ │ ├── secret-xmlrpc.yaml # XML-RPC credentials
│ │ └── secret-n8n.yaml # n8n configuration (optional)
│ ├── deployments/
│ │ ├── deployment-api.yaml # API FastAPI deployment
│ │ ├── deployment-postgres.yaml # PostgreSQL deployment
│ │ └── deployment-freeswitch-registrar.yaml # Registrar deployment
│ ├── services/
│ │ ├── service-api.yaml # API service
│ │ ├── service-postgres.yaml # PostgreSQL service
│ │ ├── service-freeswitch-registrar.yaml # Registrar service
│ │ └── service-n8n.yaml # n8n service (optional)
│ ├── pvcs/
│ │ ├── pvc-postgres.yaml # PostgreSQL persistent storage
│ │ ├── pvc-ivr-audio-cache.yaml # IVR TTS audio cache (optional)
│ │ └── pvc-n8n.yaml # n8n data storage (optional)
│ ├── ingress-api.yaml # API ingress (HTTPS)
│ └── imagepullsecret.yaml # GitHub Container Registry credentials
└── README.md # Chart documentation

Key Files

Chart.yaml

apiVersion: v2
name: tenant-infra
description: Tenant infrastructure scaffolding for Ominis Cluster Manager
type: application
version: 1.0.0
appVersion: "1.0.0"
maintainers:
  - name: Ominis AI
    email: admin@ominis.ai

values.yaml

The values file defines all configurable parameters. Key sections:

  • tenant: Tenant name, labels, annotations
  • api: API deployment configuration (replicas, image, resources)
  • postgres: Database configuration (persistence, resources)
  • freeswitch.registrar: Registrar deployment (replicas, resources, networking)
  • xmlrpc: XML-RPC URLs and credentials
  • openai: OpenAI API key for IVR TTS
  • rbac: RBAC configuration (ServiceAccount, Role, RoleBinding)

Tenant Infrastructure Components

Resource Hierarchy

1. API Service (FastAPI Deployment)

Purpose: REST API gateway for all call control operations

Deployment Configuration:

  • Replicas: Configurable (default: 1)
  • Image: ghcr.io/ominis-ai/cm-api:latest
  • Strategy: Recreate (to avoid dual-write conflicts)
  • Resources:
    • Requests: 256Mi memory, 250m CPU
    • Limits: 512Mi memory, 500m CPU

Key Environment Variables:

- DEPLOYMENT_MODE: kubernetes
- KUBERNETES_NAMESPACE: client-demo-client
- API_KEY: (from secret)
- DB_DSN: postgresql+asyncpg://user:pass@postgres:5432/callcenter
- FS_XMLRPC_URL: http://freeswitch-registrar:8080/RPC2
- OPENAI_API_KEY: (from values)

Health Checks:

  • Liveness Probe: GET /health every 10s
  • Readiness Probe: GET /health every 5s
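
To see the probes Helm actually rendered into the live object, you can query the Deployment directly (assuming it is named api, as in the examples later on this page):

# Print the liveness and readiness probes of the API container
kubectl get deployment api -n client-demo-client \
  -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}{"\n"}{.spec.template.spec.containers[0].readinessProbe}{"\n"}'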

Service:

  • Type: ClusterIP
  • Port: 8080
  • Target Port: 8000

2. PostgreSQL (Database Deployment)

Purpose: Centralized configuration and state storage

Deployment Configuration:

  • Replicas: 1 (not HA by default)
  • Image: postgres:15
  • Strategy: Recreate (ensures single writer)
  • Resources:
    • Requests: 256Mi memory, 250m CPU
    • Limits: 512Mi memory, 500m CPU

Persistent Storage:

  • PVC Size: 10Gi (configurable)
  • Storage Class: Default (or specify custom)
  • Mount Path: /var/lib/postgresql/data

Initialization:

  • SQL script mounted via ConfigMap (init-schema.sql)
  • Creates tables: queues, cc_agents, cc_tiers, cc_members, extensions, ivrs, ivr_menus, ivr_menu_options, ivr_tts_cache
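
A quick way to confirm the init script ran is to list the tables from inside the PostgreSQL pod (a sketch; assumes the default user and database names from values.yaml and a Deployment named postgres):

# List tables created by init-schema.sql
kubectl exec -n client-demo-client deploy/postgres -- \
  psql -U callcenter_user -d callcenter -c '\dt'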

ODBC Configuration:

  • DSN: callcenter
  • Used by FreeSWITCH pods for direct database access

Health Checks:

  • Liveness Probe: pg_isready every 10s
  • Readiness Probe: pg_isready every 5s

Service:

  • Type: ClusterIP
  • Port: 5432
  • DNS: postgres.client-demo-client.svc.cluster.local

3. FreeSWITCH Registrar (SIP Registration & B2BUA)

Purpose: SIP registration server and media anchor

Deployment Configuration:

  • Replicas: 1
  • Image: ghcr.io/ominis-ai/freeswitch-registrar:latest
  • Strategy: Recreate (to prevent dual SIP binding)
  • Resources:
    • Requests: 512Mi memory, 250m CPU
    • Limits: 1Gi memory, 500m CPU

Networking:

  • Host Network: Enabled (temporary workaround for IPv6 issues)
  • Node Selector: kubernetes.io/hostname: c2-30-bhs5 (pinned to specific node)
  • DNS Policy: ClusterFirstWithHostNet

Ports:

  • 5060/UDP - SIP signaling
  • 5061/TCP - SIP TLS
  • 8080/TCP - XML-RPC
  • 20000-30000/UDP - RTP media

Configuration:

  • Mounted via ConfigMap (freeswitch-registrar-config)
  • Includes: freeswitch.xml, vars.xml, sofia.conf.xml, acl.conf.xml

B2BUA Pattern:

  • Queue→Registrar: Internal IP (10.x.x.x)
  • Registrar→Agent: Public IP (51.79.31.20)
  • Anchors media between internal pods and external SIP clients

Service:

  • Type: ClusterIP
  • Ports: 5060/UDP, 8080/TCP
  • DNS: freeswitch-registrar.client-demo-client.svc.cluster.local
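
To check the registrar's SIP profiles and current registrations from inside the pod (a sketch; assumes fs_cli is available in the registrar image, which is standard for FreeSWITCH builds):

# Show SIP profile status and registered endpoints
kubectl exec -n client-demo-client deploy/freeswitch-registrar -- \
  fs_cli -x "sofia status"
kubectl exec -n client-demo-client deploy/freeswitch-registrar -- \
  fs_cli -x "show registrations"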

4. ConfigMaps & Secrets

ConfigMaps:

  1. bootstrap - Basic tenant configuration

    registry_url: ghcr.io/ominis-ai
    deployment_mode: kubernetes
  2. freeswitch-registrar-config - Complete FreeSWITCH configuration files

    • freeswitch.xml - Main configuration
    • vars.xml - Variables
    • sofia.conf.xml - SIP profiles
    • acl.conf.xml - ACL rules
    • modules.conf.xml - Module loading
  3. postgres-init-schema - Database initialization SQL

    • Loaded via init-schema.sql file
  4. xmlrpc - XML-RPC connection details

    • URLs for registrar and campaign pods
    • Used by API for XML-RPC requests

Secrets:

  1. bootstrap - Core credentials

    api_key: (API key for authentication)
    db_password: (PostgreSQL password)
  2. postgres-credentials - Database credentials

    DB_PASS: (PostgreSQL password)
  3. freeswitch-xmlrpc - XML-RPC credentials

    username: fsadmin
    password: (XML-RPC password)
  4. freeswitch-xmlcurl-token - XML-CURL shared secret

    token: (Shared secret for mod_xml_curl)
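
To confirm a rendered secret actually contains a value (and catch empty strings early), decode it with kubectl. The exact Secret names depend on the chart's template helpers, so adjust the name to whatever `kubectl get secrets` shows:

# List tenant secrets, then decode a single key
kubectl get secrets -n client-demo-client
kubectl get secret bootstrap -n client-demo-client \
  -o jsonpath='{.data.api_key}' | base64 -d; echo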

5. Ingress (HTTPS Routing)

Purpose: External HTTPS access to API

Configuration:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  ingressClassName: traefik
  rules:
    - host: api.demo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 8080
  tls:
    - secretName: api-tls-cert
      hosts:
        - api.demo.example.com

Features:

  • TLS Automation: cert-manager issues Let's Encrypt certificates
  • Ingress Controller: Traefik routes traffic to API service
  • HTTPS Redirect: Automatic HTTP→HTTPS redirect
  • Custom Domain: Configure via api.host in values.yaml
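
After the first deploy, you can verify that cert-manager issued the certificate and that Traefik is routing the host (assumes the Ingress is named api and DNS already points at the cluster):

# Certificate should reach Ready=True once the ACME challenge completes
kubectl get certificate -n client-demo-client
kubectl describe ingress api -n client-demo-client

# End-to-end check over HTTPS
curl -I https://api.demo.example.com/health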

6. RBAC (ServiceAccount, Role, RoleBinding)

Purpose: Grant API pod permission to manage runtime resources (queues, IVRs)

ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: client-demo-client
  namespace: client-demo-client

Role:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: runtime-manager
  namespace: client-demo-client
rules:
  - apiGroups: ["", "apps"]
    resources: ["deployments", "services", "configmaps", "pods", "pods/log"]
    verbs: ["get", "list", "create", "update", "patch", "delete", "watch"]

RoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: runtime-manager-binding
namespace: client-demo-client
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: runtime-manager
subjects:
- kind: ServiceAccount
name: client-demo-client
namespace: client-demo-client

Why This Matters:

Without proper RBAC, the API pod cannot create queue/IVR pods dynamically. The runtime-manager role grants the API pod permission to orchestrate Kubernetes resources within its own namespace.
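
You can verify the grant without deploying anything by asking the API server what the ServiceAccount is allowed to do (names follow the examples above):

# Both commands should print "yes" if the Role and RoleBinding are in place
kubectl auth can-i create deployments \
  --as=system:serviceaccount:client-demo-client:client-demo-client \
  -n client-demo-client
kubectl auth can-i delete pods \
  --as=system:serviceaccount:client-demo-client:client-demo-client \
  -n client-demo-client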


ADR-0003: Helm Over Terraform

Context

Ominis Cluster Manager originally used Terraform (OpenTofu) for infrastructure deployment. The workflow was:

  1. API generates .tf files with queue configuration
  2. Git-sync sidecar detects changes and pulls .tf files
  3. Terraform init → plan → apply creates Kubernetes resources
  4. .tfstate file tracks infrastructure state

Problem: This workflow was complex, slow, and introduced state synchronization issues.

Decision

Migrate to Helm for infrastructure scaffolding (namespace, RBAC, secrets) and Python Kubernetes client for runtime resources (queues, IVRs).

Alternatives Considered

1. Terraform (Previous Approach)

Rejected for:

  • Complexity: Generating .tf files dynamically added indirection
  • State Drift: Terraform state could drift from actual cluster state
  • Git-sync Dependency: Required sidecar container and SSH keys for git access
  • Slow Feedback: Push to git → wait for sync → wait for apply (15-30 seconds)
  • Port Allocation: Cluster-level tracking was unnecessary with registrar pattern

Advantages:

  • Multi-cloud abstraction (can deploy to AWS, GCP, Azure)
  • Strong state management with .tfstate files
  • Mature ecosystem with many providers

2. Helm + Python Kubernetes Client (Chosen)

Benefits:

  • Simplicity: Standard Kubernetes tooling, no custom state management
  • Speed: Direct API → Kubernetes, no git intermediary (1-3 seconds)
  • Idempotency: Server-side apply natively handles conflicts
  • Observability: Kubernetes events and status directly available
  • Standard: Industry-standard Helm for infra scaffolding
  • Native: Kubernetes-native workflow (kubectl, helm)
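
The idempotency point maps to Kubernetes server-side apply. The API performs the equivalent call through the Python Kubernetes client; the kubectl form below is only an illustration, and queue-deployment.yaml is a hypothetical manifest name:

# Repeated applies converge on the same state; conflicts are resolved
# per-field by the API server rather than by client-side diffing
kubectl apply --server-side --field-manager=cm-api -f queue-deployment.yaml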

⚠️ Trade-offs:

  • Loss of multi-cloud abstraction (Kubernetes-only)
  • No centralized .tfstate equivalent (Kubernetes is the source of truth)

3. Kubernetes Operators

Rejected for:

  • Overkill: Writing a custom operator is a large undertaking
  • Operational Burden: Another service to maintain
  • Complexity: Reconciliation loops and CRDs add complexity

When to Consider:

  • If you need continuous reconciliation (drift correction)
  • If you're building a product around the operator (e.g., Strimzi for Kafka)

4. GitOps (ArgoCD / Flux)

⚠️ Future Consideration:

  • Works well with Helm charts
  • Provides continuous sync from git → cluster
  • Adds operational complexity (another service)
  • Good for multi-tenant SaaS with strict audit requirements

Compatible with Current Approach:

  • Can deploy Helm chart via ArgoCD
  • Not needed for initial MVP

Consequences

Positive:

Simplicity: Reduced from 5 components (API, Terraform, git-sync, git repo, .tfstate) to 2 (Helm, API)
Speed: Queue creation: 15-30s (Terraform) → 3-5s (Kubernetes client)
Reliability: No state drift (Kubernetes is source of truth)
Developer Experience: Standard helm and kubectl commands
Observability: Native Kubernetes events and status

Negative:

Kubernetes-Only: Cannot deploy to other platforms (acceptable for Ominis use case)
No Centralized State: Relies on Kubernetes API as source of truth (acceptable for cloud-native apps)

Migration Impact:

  • ✅ Existing queues continue to work (Kubernetes resources unchanged)
  • ✅ API code updated to use Python Kubernetes client instead of Terraform
  • ✅ Git-sync sidecar removed
  • .tfstate files archived (no longer needed)

Status: Accepted - Migrated in October 2025

Date: 2025-10-04


Deployment Process

Prerequisites

Before deploying the Helm chart, ensure you have:

  • Kubernetes Cluster: 1.19+ (tested on 1.27+)
  • Helm: 3.8+ installed locally
  • kubectl: Configured with cluster access
  • Docker Registry Access: GitHub Container Registry (GHCR) credentials
  • DNS: Domain name for API ingress (e.g., api.demo.example.com)

Step 1: Prepare Values File

Create a values.yaml file with tenant-specific configuration:

tenant:
  name: demo-client
  labels:
    environment: production
    owner: platform-team

api:
  enabled: true
  replicas: 1
  image: ghcr.io/ominis-ai/cm-api
  tag: latest
  api_key: "your-secret-api-key-here"
  host: api.demo.example.com
  ingressClass: traefik
  tlsEnabled: true
  tlsSecretName: "" # Cert-manager will create this
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod

postgres:
  enabled: true
  persistence:
    enabled: true
    size: 10Gi
    storageClass: "" # Use default storage class

freeswitch:
  registrar:
    enabled: true
    replicas: 1

imagePullSecrets:
  enabled: true
  name: ghcr-secret
  username: your-github-username
  password: ghp_your_github_token
  email: github@example.com
  server: ghcr.io

openai:
  apiKey: "sk-..." # OpenAI API key for IVR TTS

rbac:
  enabled: true
  serviceAccount:
    create: true

Step 2: Preview Changes

Before applying, preview what will be deployed:

cd /home/matt/projects/fml/cluster-manager

# Lint the chart
helm lint charts/tenant-infra -f values.yaml

# Dry-run to see rendered templates
helm upgrade --install demo-client charts/tenant-infra \
-f values.yaml \
--namespace client-demo-client \
--create-namespace \
--dry-run --debug

What to Look For:

  • ✅ Namespace is client-demo-client
  • ✅ All secrets have values (no empty strings)
  • ✅ Ingress host is correct
  • ✅ Image pull secret is configured

Step 3: Deploy Chart

Deploy the chart using the Makefile:

# Deploy using Makefile (recommended)
make helm-apply TENANT=demo-client VALUES=values.yaml

# Or deploy directly with helm
helm upgrade --install demo-client charts/tenant-infra \
-f values.yaml \
--namespace client-demo-client \
--create-namespace \
--atomic \
--wait \
--timeout 10m

Flags Explained:

  • --install: Install if release doesn't exist
  • --atomic: Rollback on failure
  • --wait: Wait for pods to be ready
  • --timeout 10m: Max time to wait
  • --create-namespace: Create namespace if it doesn't exist

Step 4: Verify Deployment

Check that all resources are created and healthy:

# Check Helm release
helm list -n client-demo-client

# Check all resources
kubectl get all,ing,cm,secret,sa,role,rolebinding -n client-demo-client

# Check pod status
kubectl get pods -n client-demo-client -w

# Check pod logs
kubectl logs -l app=api -n client-demo-client --tail=50
kubectl logs -l app=postgres -n client-demo-client --tail=50
kubectl logs -l app=freeswitch-registrar -n client-demo-client --tail=50

Expected Output:

NAME                                     READY   STATUS    RESTARTS   AGE
pod/api-78c5d4b7ff-xq9wz 1/1 Running 0 2m
pod/postgres-6c8b5d4f7-h8k2p 1/1 Running 0 2m
pod/freeswitch-registrar-5d7c8f-9x4tz 1/1 Running 0 2m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/api ClusterIP 10.43.123.45 <none> 8080/TCP 2m
service/postgres ClusterIP 10.43.123.46 <none> 5432/TCP 2m
service/freeswitch-registrar ClusterIP 10.43.123.47 <none> 5060/UDP,8080/TCP 2m

Step 5: Test API Access

Test that the API is accessible:

# Via ingress (if configured)
curl https://api.demo.example.com/health

# Via port-forward (for testing)
kubectl port-forward svc/api 8080:8080 -n client-demo-client
curl http://localhost:8080/health

Expected Response:

{
  "status": "healthy",
  "service": "callcenter-api"
}

Deployment Flow Diagram

This sequence diagram shows the complete deployment flow:


Values Configuration

The values.yaml file controls all aspects of the tenant deployment. Here are the key sections:

Tenant Configuration

tenant:
  name: demo-client # Tenant identifier (used in namespace)
  labels: # Custom labels for all resources
    environment: production
    owner: platform-team
    cost-center: engineering
  annotations: # Custom annotations
    contact: admin@demo.example.com

API Configuration

api:
  enabled: true # Enable/disable API deployment
  replicas: 1 # Number of API pods
  image: ghcr.io/ominis-ai/cm-api
  tag: latest # Image tag (use specific versions in production)
  imagePullPolicy: Always # Pull policy (Always, IfNotPresent, Never)
  debug: "true" # Enable debug logging
  api_key: "Winnipeg2025" # API key for authentication
  strategy:
    type: Recreate # Deployment strategy (Recreate or RollingUpdate)
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
    limits:
      memory: "512Mi"
      cpu: "500m"
  # Ingress configuration
  host: api.demo.example.com # Domain name for API
  ingressClass: traefik # Ingress controller
  servicePort: 8080 # Service port
  tlsEnabled: false # Enable TLS (requires cert-manager)
  tlsSecretName: "" # TLS secret name (auto-generated if empty)
  annotations: # Ingress annotations
    cert-manager.io/cluster-issuer: letsencrypt-prod

PostgreSQL Configuration

postgres:
  enabled: true # Enable/disable PostgreSQL deployment
  image: postgres
  tag: "15" # PostgreSQL version
  imagePullPolicy: IfNotPresent
  user: callcenter_user # Database user
  database: callcenter # Database name
  password: "callcenter_pass_demo-client" # Database password
  persistence:
    enabled: true # Enable persistent storage
    size: 10Gi # PVC size
    storageClass: "" # Storage class (empty = default)
    existingClaim: "" # Use existing PVC (optional)
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
    limits:
      memory: "512Mi"
      cpu: "500m"
  odbc:
    dsn: "callcenter" # ODBC DSN name for FreeSWITCH

FreeSWITCH Configuration

freeswitch:
  xmlcurl:
    enabled: true # Enable mod_xml_curl
    token: "Winnipeg2025-xmlcurl-changeme" # Shared secret token

  registrar:
    enabled: true # Enable registrar deployment
    replicas: 1
    image: ghcr.io/ominis-ai/freeswitch-registrar
    tag: latest
    imagePullPolicy: Always
    hostNetwork: true # Use host networking (IPv6 workaround)
    nodeSelector:
      kubernetes.io/hostname: c2-30-bhs5 # Pin to specific node
    resources:
      requests:
        memory: "512Mi"
        cpu: "250m"
      limits:
        memory: "1Gi"
        cpu: "500m"

XML-RPC Configuration

xmlrpc:
  enabled: true
  port: 8080
  url: "http://freeswitch-registrar.demo-client.svc.cluster.local:8080/RPC2"
  username: "fsadmin" # XML-RPC username
  password: "Winnipeg2025" # XML-RPC password
  acl:
    enabled: true # Enable ACL
    allowedCidrs: # Allowed IP ranges
      - "10.0.0.0/8"
      - "172.16.0.0/12"
      - "192.168.0.0/16"
  campaign:
    url: "http://freeswitch-campaign.demo-client.svc.cluster.local:8080/RPC2"
    username: "fsadmin"
    password: "Winnipeg2025"

OpenAI Configuration (for IVR TTS)

openai:
  apiKey: "sk-..." # OpenAI API key
  ttsModel: "tts-1" # TTS model (tts-1, tts-1-hd)
  defaultVoice: "alloy" # Default voice (alloy, echo, fable, onyx, nova, shimmer)

RBAC Configuration

rbac:
  enabled: true # Enable RBAC resources
  serviceAccount:
    create: true # Create ServiceAccount
    name: "" # ServiceAccount name (auto-generated if empty)
    annotations: {} # ServiceAccount annotations

Image Pull Secrets

imagePullSecrets:
  enabled: true # Enable image pull secret
  name: ghcr-secret # Secret name
  username: mattjoubert # GitHub username
  password: ghp_... # GitHub token (with packages:read scope)
  email: github@mattjoubert.com # Email address
  server: ghcr.io # Registry server

Multi-Tenant Isolation

Ominis Cluster Manager achieves strong multi-tenant isolation through several Kubernetes mechanisms:

1. Kubernetes Namespaces

Pattern: One namespace per tenant (client-{tenant-name})

Benefits:

  • ✅ Logical separation of resources
  • ✅ RBAC boundaries (ServiceAccount permissions scoped to namespace)
  • ✅ Resource quotas per tenant
  • ✅ Network policies per tenant
  • ✅ Clear ownership (all resources in namespace belong to tenant)

Example:

# Tenant A
kubectl get all -n client-demo-client

# Tenant B
kubectl get all -n client-acme-corp

2. Network Policies

Purpose: Restrict network traffic between tenants

Example Policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: client-demo-client
spec:
  podSelector: {} # Apply to all pods in namespace
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              tenant: demo-client # Only from same tenant
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              tenant: demo-client # Only to same tenant
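
A rough way to exercise the policy is to attempt a cross-tenant request and confirm it fails (a sketch; assumes the policy above is applied, the namespaces carry the tenant labels it selects on, and curl exists in the API image):

# From tenant B, a call into tenant A's API service should time out
kubectl exec -n client-acme-corp deploy/api -- \
  curl -m 3 http://api.client-demo-client.svc.cluster.local:8080/health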

3. Resource Quotas

Purpose: Limit resource consumption per tenant

Example Quota:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: client-demo-client
spec:
  hard:
    requests.cpu: "10" # Max 10 CPU cores requested
    requests.memory: 20Gi # Max 20Gi memory requested
    limits.cpu: "20" # Max 20 CPU cores limit
    limits.memory: 40Gi # Max 40Gi memory limit
    persistentvolumeclaims: "10" # Max 10 PVCs
    services.loadbalancers: "2" # Max 2 load balancers
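
Kubernetes tracks consumption against each quota, so you can see how close a tenant is to its limits at any time:

# Shows Used vs Hard for every resource covered by the quota
kubectl describe resourcequota tenant-quota -n client-demo-client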

4. Separate Databases

Approach: Each tenant has a dedicated PostgreSQL instance

Benefits:

  • ✅ Complete data isolation (no shared tables)
  • ✅ Independent backups and restores
  • ✅ Per-tenant performance tuning
  • ✅ No noisy neighbor issues
  • ⚠️ Trade-off: Higher resource usage vs. security

5. Isolated Secrets

Approach: Secrets are namespace-scoped

Benefits:

  • ✅ API keys, passwords, tokens isolated per tenant
  • ✅ No cross-tenant secret access
  • ✅ ServiceAccount can only read secrets in its namespace

Example:

# Tenant A secrets
kubectl get secrets -n client-demo-client

# Tenant B secrets (different secrets)
kubectl get secrets -n client-acme-corp

Tenant Isolation Diagram


Deployment Examples

Example 1: Preview Deployment Changes

See what will change without applying:

cd /home/matt/projects/fml/cluster-manager

# Preview changes using Makefile
make helm-diff TENANT=demo-client VALUES=values.yaml

# Or use helm directly
helm template demo-client charts/tenant-infra \
-f values.yaml \
--namespace client-demo-client > /tmp/demo-client.yaml

kubectl diff -n client-demo-client -f /tmp/demo-client.yaml || true

Output:

+ apiVersion: v1
+ kind: Namespace
+ metadata:
+   name: client-demo-client
...
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: api
+   namespace: client-demo-client
...

Use Cases:

  • ✅ Verify configuration before applying
  • ✅ Review resource changes during upgrades
  • ✅ CI/CD pull request previews

Example 2: Deploy New Tenant

Deploy a new tenant from scratch:

# Step 1: Create values file
cat > tenant-acme.yaml <<EOF
tenant:
  name: acme-corp

api:
  host: api-acme.example.com
  api_key: "acme-secret-key"

postgres:
  password: "acme-db-password"

imagePullSecrets:
  enabled: true
  username: your-github-username
  password: ghp_your_github_token
EOF

# Step 2: Deploy using Makefile
make helm-apply TENANT=acme-corp VALUES=tenant-acme.yaml

# Step 3: Watch deployment progress
kubectl get pods -n client-acme-corp -w

# Step 4: Verify deployment
helm status acme-corp -n client-acme-corp
kubectl get all,ing -n client-acme-corp

Timeline:

  1. Namespace creation: ~1s
  2. ConfigMaps/Secrets: ~2s
  3. PostgreSQL pod ready: ~10-15s
  4. API pod ready: ~5-10s
  5. Registrar pod ready: ~10-15s
  6. Total: ~30-45 seconds
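
In scripts or CI, waiting on readiness conditions is more reliable than watching pods by hand (labels follow the app=... convention used in the log commands earlier on this page):

# Block until the tenant's core pods report Ready (or fail after 3 minutes)
kubectl wait --for=condition=Ready pod -l app=api -n client-acme-corp --timeout=180s
kubectl wait --for=condition=Ready pod -l app=postgres -n client-acme-corp --timeout=180s
kubectl wait --for=condition=Ready pod -l app=freeswitch-registrar -n client-acme-corp --timeout=180s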

Example 3: Check Deployment Status

Check the status of a deployed tenant:

# List all Helm releases
helm list -A

# Get release details
helm status demo-client -n client-demo-client

# View deployed values
helm get values demo-client -n client-demo-client

# View rendered manifests
helm get manifest demo-client -n client-demo-client

# Check pod status
kubectl get pods -n client-demo-client

# Check ingress
kubectl get ingress -n client-demo-client
kubectl describe ingress api -n client-demo-client

# Check logs
kubectl logs -l app=api -n client-demo-client --tail=100 -f

Expected Output:

NAME: demo-client
LAST DEPLOYED: Mon Oct 14 10:00:00 2025
NAMESPACE: client-demo-client
STATUS: deployed
REVISION: 1

Example 4: Customize Values Per Environment

Use different values for dev, staging, and production:

# Development environment
cat > values-dev.yaml <<EOF
tenant:
  name: demo-client
  labels:
    environment: dev

api:
  replicas: 1
  host: api-dev.demo.example.com
  tlsEnabled: false # No TLS in dev
  debug: "true"

postgres:
  persistence:
    size: 5Gi # Smaller storage in dev

freeswitch:
  registrar:
    resources:
      limits:
        memory: "512Mi" # Lower limits in dev
EOF

# Production environment
cat > values-prod.yaml <<EOF
tenant:
  name: demo-client
  labels:
    environment: prod

api:
  replicas: 3 # HA in production
  host: api.demo.example.com
  tlsEnabled: true
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  debug: "false"

postgres:
  persistence:
    size: 50Gi # Larger storage in prod
    storageClass: "fast-ssd"

freeswitch:
  registrar:
    replicas: 2 # HA registrar
    resources:
      limits:
        memory: "2Gi"
EOF

# Deploy to dev
make helm-apply TENANT=demo-client-dev VALUES=values-dev.yaml

# Deploy to prod
make helm-apply TENANT=demo-client-prod VALUES=values-prod.yaml

Example 5: Rollback Failed Deployment

Rollback to a previous working version:

# List release history
helm history demo-client -n client-demo-client

# Example output:
# REVISION UPDATED STATUS CHART DESCRIPTION
# 1 Mon Oct 14 10:00:00 2025 superseded tenant-infra-1.0.0 Install complete
# 2 Mon Oct 14 11:00:00 2025 superseded tenant-infra-1.1.0 Upgrade complete
# 3 Mon Oct 14 12:00:00 2025 failed tenant-infra-1.2.0 Upgrade failed

# Rollback to previous version (revision 2)
helm rollback demo-client -n client-demo-client

# Or rollback to specific revision
helm rollback demo-client 2 -n client-demo-client

# Verify rollback
kubectl get pods -n client-demo-client -w
helm history demo-client -n client-demo-client

Rollback Process:

  1. Helm reverts to previous manifest
  2. Kubernetes applies changes
  3. Pods restart with old configuration
  4. Timeline: ~30-60 seconds

Upgrade Strategy

Helm Chart Upgrades

When upgrading the tenant infrastructure chart:

# Step 1: Pull latest changes
git pull origin main

# Step 2: Review chart changes
git log --oneline charts/tenant-infra/

# Step 3: Preview upgrade
make helm-diff TENANT=demo-client VALUES=values.yaml

# Step 4: Apply upgrade
make helm-apply TENANT=demo-client VALUES=values.yaml

# Step 5: Monitor rollout
kubectl get pods -n client-demo-client -w

Application Upgrades (API Image)

When upgrading the API application:

# Step 1: Build and push new image
make build TAG=v1.2.0
docker push ghcr.io/ominis-ai/cm-api:v1.2.0

# Step 2: Update values.yaml
sed -i 's/tag: latest/tag: v1.2.0/' values.yaml

# Step 3: Apply upgrade
make helm-apply TENANT=demo-client VALUES=values.yaml

# Step 4: Monitor rollout
kubectl rollout status deployment/api -n client-demo-client

Zero-Downtime Upgrades

For zero-downtime upgrades, use RollingUpdate strategy:

api:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1 # Extra pods allowed above the desired count during update
      maxUnavailable: 0 # Pods that may be unavailable during update (0 = none)

Process:

  1. New pod starts
  2. New pod becomes ready
  3. Old pod terminates
  4. Result: Always at least 1 pod serving traffic

Context & Rationale

Why One Chart Per Tenant?

Decision: Deploy one Helm release per tenant

Benefits:

Complete Isolation: Each tenant has dedicated resources
Independent Upgrades: Upgrade one tenant without affecting others
Per-Tenant Customization: Different resource limits, features, configurations
Clear Resource Ownership: All resources in namespace belong to tenant
Simplified Multi-Tenancy: Easy to add/remove tenants

Trade-offs:

⚠️ Resource Overhead: Each tenant has dedicated PostgreSQL, API, Registrar
⚠️ Operational Complexity: More releases to manage

Mitigation:

  • Use resource limits to cap per-tenant usage
  • Automate tenant deployment via CI/CD
  • Monitor all tenants centrally with Prometheus

Why Deployment (Not StatefulSet) for PostgreSQL?

Decision: Use Deployment (not StatefulSet) for PostgreSQL

Note: The chart currently uses Deployment for PostgreSQL, not StatefulSet. This is acceptable for single-replica databases, but StatefulSet is recommended for HA deployments.

StatefulSet Benefits:

  • ✅ Stable pod identity (postgres-0, postgres-1)
  • ✅ Ordered deployment and scaling
  • ✅ Persistent storage guarantees
  • ✅ Graceful scaling and termination
  • ✅ Reliable restarts (same PVC every time)

When to Use StatefulSet:

  • Multi-replica databases (primary + replicas)
  • Databases requiring stable network identity
  • Databases with ordered scaling requirements

Current Approach (Deployment):

  • ✅ Simpler for single-replica databases
  • ✅ Sufficient for non-HA deployments
  • ⚠️ Not suitable for HA PostgreSQL

Why Separate Registrar?

Decision: Deploy one registrar per tenant (not shared)

Benefits:

Tenant Isolation: Registrar failures don't cascade across tenants
Per-Tenant Configuration: Different SIP profiles per tenant
Simplified Debugging: Registrar logs map 1:1 to tenant
Security: Network policies isolate registrar per tenant

Trade-offs:

⚠️ Resource Overhead: Each tenant has dedicated registrar pod
⚠️ Port Management: Each registrar needs dedicated ports (if not using host networking)

Alternative (Shared Registrar):

  • One registrar for all tenants
  • ❌ Single point of failure
  • ❌ Configuration complexity (multi-tenant SIP profiles)
  • ❌ Security concerns (tenant boundary violations)

Configuration Management Best Practices

Decision: Store configuration in version control

Approach:

  1. values.yaml: Committed to git (without secrets)
  2. Secrets: Stored in external secret manager (e.g., Vault, AWS Secrets Manager)
  3. Environment-Specific Values: Separate files (values-dev.yaml, values-prod.yaml)

Example:

# values.yaml (committed to git)
tenant:
  name: demo-client

api:
  host: api.demo.example.com
  api_key: "" # Injected via CI/CD

# secrets.env (NOT committed to git)
API_KEY=secret-key-here
DB_PASSWORD=db-password-here
OPENAI_API_KEY=sk-...

CI/CD Integration:

# Inject secrets during deployment
helm upgrade --install demo-client charts/tenant-infra \
-f values.yaml \
--set api.api_key=$API_KEY \
--set postgres.password=$DB_PASSWORD \
--set openai.apiKey=$OPENAI_API_KEY


Summary

Ominis Cluster Manager uses Helm for infrastructure scaffolding (namespace, RBAC, secrets, ingress) and Python Kubernetes client for runtime resources (queues, IVRs). This provides the best of both worlds:

Helm: Standard, versioned, rollback-friendly infrastructure deployment
Kubernetes Client: Fast, simple, idempotent runtime orchestration
Multi-Tenant: Strong isolation via namespaces, network policies, and resource quotas
Developer Experience: Simple make helm-apply command deploys complete tenant

The migration from Terraform to Helm reduced deployment complexity, improved performance (15-30s → 3-5s for queues), and aligned with Kubernetes-native best practices.


Powered by Ominis.ai - Modern call center infrastructure for cloud-native platforms.