Helm Deployment & Infrastructure
Overview
Ominis Cluster Manager uses Helm as its primary infrastructure deployment tool, following a two-tier architecture that separates cluster-wide services from tenant-specific resources. This page documents the complete Helm chart infrastructure, tenant model, deployment process, and the rationale behind choosing Helm over Terraform.
What is Helm?
Helm is the package manager for Kubernetes, providing templating, versioning, and lifecycle management for Kubernetes applications. Think of it as apt/yum for Kubernetes - it packages related Kubernetes resources into versioned charts that can be installed, upgraded, and rolled back as a single unit.
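As an illustration, the basic chart lifecycle looks like this (release and chart names here are purely illustrative):

```bash
# Install a chart as a named release
helm install my-release ./my-chart

# Upgrade the release with new values
helm upgrade my-release ./my-chart -f values.yaml

# Inspect revision history and roll back if needed
helm history my-release
helm rollback my-release 1
```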
Why Helm for Ominis?
Decision: Use Helm for infrastructure scaffolding (namespace, RBAC, secrets, ingress) and Python Kubernetes client for runtime resources (queues, IVRs).
Rationale:
- Kubernetes-Native: Standard tooling recognized across the industry
- Simplified Operations: One command deploys the entire tenant infrastructure (see the example after this list)
- Version Control: Track chart versions alongside application versions
- Templating: Dynamic configuration via values.yaml
- Rollback Support: Native rollback to previous releases
- GitOps Ready: Compatible with ArgoCD, Flux, and other GitOps tools
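In this project, that one command is the Makefile wrapper used throughout this page, which wraps helm upgrade --install:

```bash
# Deploy or upgrade a complete tenant environment in a single step
make helm-apply TENANT=demo-client VALUES=values.yaml
```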
Cluster vs Tenant Infrastructure Model
Two-Tier Infrastructure Approach
Ominis uses a two-tier infrastructure model that separates cluster-wide and tenant-scoped resources. This provides operational efficiency (shared services) while maintaining tenant isolation (dedicated resources).
Cluster Infrastructure (cluster-infra repository)
Deployed: Once per Kubernetes cluster
Shared: By all tenants
Purpose: Foundation services that all tenants depend on
Services:
- cert-manager: TLS certificate automation (Let's Encrypt)
- Traefik: Ingress controller and reverse proxy
- Authentik: SSO and OAuth2 provider
- Vaultwarden: Password manager
- Homer: SIP traffic monitoring
- Excalidraw: Diagramming tool
Namespaces:
- cert-manager - Certificate automation
- authentik - Identity and access management
- vaultwarden - Secrets management
- flow-proxy - Traefik ingress
- homer - SIP monitoring
- excalidraw - Documentation diagrams
Tenant Infrastructure (this document)
Deployed: Once per tenant/customer
Isolated: In dedicated namespaces
Purpose: Customer-specific application services
Services:
- API Service: FastAPI REST API for call control
- PostgreSQL: Database for configuration and state
- FreeSWITCH Registrar: SIP registration server and B2BUA
- Queue Pods: Dynamic FreeSWITCH instances per queue (runtime)
- IVR Pods: Dynamic FreeSWITCH instances per IVR (runtime)
Namespace Pattern: client-{tenant-name} (e.g., client-demo-client)
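For example, existing tenant namespaces can be listed with a simple filter:

```bash
# Each tenant gets its own client-<name> namespace
kubectl get namespaces | grep '^client-'
```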
Comparison Table
| Aspect | Cluster Infrastructure | Tenant Infrastructure |
|---|---|---|
| Deployment Frequency | Once per cluster | Once per tenant |
| Sharing | Shared by all tenants | Isolated per tenant |
| Examples | Cert-manager, Traefik | API, PostgreSQL, Queues |
| Helm Chart Location | /cluster-infra/helm-charts/ | /cluster-manager/charts/tenant-infra/ |
| Update Impact | Affects all tenants | Affects single tenant |
| Resource Efficiency | High (shared) | Lower (isolated) |
| Fault Isolation | Lower | High |
| Customization | Minimal | Per-tenant |
Dependency Flow
Why This Model?
Benefits:
✅ Cost Efficiency: One cert-manager instead of N cert-managers (where N = tenant count)
✅ Operational Simplicity: Upgrade cluster services once, not per-tenant
✅ Tenant Isolation: Tenant failures don't cascade to other tenants
✅ Security: Network policies enforce namespace boundaries
✅ Scalability: Independent tenant scaling without affecting others
Trade-offs:
❌ Shared Failure Domain: Cert-manager downtime affects all tenants
❌ Version Lock: Cluster services are uniform across all tenants (no per-tenant versions)
⚠️ Operational Complexity: Two-tier management requires coordination
Integration Points
The tenant infrastructure integrates with cluster infrastructure at several key points:
- TLS Certificates: Tenant ingress annotations reference the letsencrypt-prod ClusterIssuer from cert-manager
- Ingress Routing: Tenant Ingress resources use ingressClassName: traefik
- SIP Monitoring: Homer DaemonSet captures traffic from tenant queue pods
- Authentication (future): Tenant APIs can delegate authentication to Authentik
Helm Chart Structure
The tenant-infra chart contains all resources needed to deploy a complete tenant environment.
Chart Directory Layout
charts/tenant-infra/
├── Chart.yaml # Chart metadata (version, name, description)
├── values.yaml # Default configuration values
├── values.schema.json # Value validation schema
├── init-schema.sql # PostgreSQL database initialization
├── templates/ # Kubernetes resource templates
│ ├── _helpers.tpl # Template helper functions
│ ├── NOTES.txt # Post-install instructions
│ ├── namespace.yaml # Tenant namespace (not used, created via --create-namespace)
│ ├── rbac.yaml # RBAC resources (roles, bindings)
│ ├── serviceaccount.yaml # ServiceAccount for API pod
│ ├── configmaps/
│ │ ├── configmap-bootstrap.yaml # Bootstrap configuration
│ │ ├── configmap-freeswitch-registrar.yaml # Registrar FreeSWITCH config
│ │ ├── configmap-freeswitch-xmlcurl.yaml # XML-CURL config
│ │ ├── configmap-postgres-init.yaml # PostgreSQL init SQL
│ │ └── configmap-xmlrpc.yaml # XML-RPC credentials
│ ├── secrets/
│ │ ├── secret-bootstrap.yaml # API key and DB password
│ │ ├── secret-freeswitch-xmlcurl.yaml # XML-CURL token
│ │ ├── secret-postgres.yaml # PostgreSQL credentials
│ │ ├── secret-xmlrpc.yaml # XML-RPC credentials
│ │ └── secret-n8n.yaml # n8n configuration (optional)
│ ├── deployments/
│ │ ├── deployment-api.yaml # API FastAPI deployment
│ │ ├── deployment-postgres.yaml # PostgreSQL deployment
│ │ └── deployment-freeswitch-registrar.yaml # Registrar deployment
│ ├── services/
│ │ ├── service-api.yaml # API service
│ │ ├── service-postgres.yaml # PostgreSQL service
│ │ ├── service-freeswitch-registrar.yaml # Registrar service
│ │ └── service-n8n.yaml # n8n service (optional)
│ ├── pvcs/
│ │ ├── pvc-postgres.yaml # PostgreSQL persistent storage
│ │ ├── pvc-ivr-audio-cache.yaml # IVR TTS audio cache (optional)
│ │ └── pvc-n8n.yaml # n8n data storage (optional)
│ ├── ingress-api.yaml # API ingress (HTTPS)
│ └── imagepullsecret.yaml # GitHub Container Registry credentials
└── README.md # Chart documentation
Key Files
Chart.yaml
apiVersion: v2
name: tenant-infra
description: Tenant infrastructure scaffolding for Ominis Cluster Manager
type: application
version: 1.0.0
appVersion: "1.0.0"
maintainers:
- name: Ominis AI
email: admin@ominis.ai
values.yaml
The values file defines all configurable parameters. Key sections:
- tenant: Tenant name, labels, annotations
- api: API deployment configuration (replicas, image, resources)
- postgres: Database configuration (persistence, resources)
- freeswitch.registrar: Registrar deployment (replicas, resources, networking)
- xmlrpc: XML-RPC URLs and credentials
- openai: OpenAI API key for IVR TTS
- rbac: RBAC configuration (ServiceAccount, Role, RoleBinding)
Tenant Infrastructure Components
Resource Hierarchy
1. API Service (FastAPI Deployment)
Purpose: REST API gateway for all call control operations
Deployment Configuration:
- Replicas: Configurable (default: 1)
- Image: ghcr.io/ominis-ai/cm-api:latest
- Strategy: Recreate (to avoid dual-write conflicts)
- Resources:
- Requests: 256Mi memory, 250m CPU
- Limits: 512Mi memory, 500m CPU
Key Environment Variables:
- DEPLOYMENT_MODE: kubernetes
- KUBERNETES_NAMESPACE: client-demo-client
- API_KEY: (from secret)
- DB_DSN: postgresql+asyncpg://user:pass@postgres:5432/callcenter
- FS_XMLRPC_URL: http://freeswitch-registrar:8080/RPC2
- OPENAI_API_KEY: (from values)
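As a sketch of how these variables might be wired in deployment-api.yaml (the secret name and keys are assumptions based on the chart layout above):

```yaml
env:
  - name: DEPLOYMENT_MODE
    value: "kubernetes"
  - name: KUBERNETES_NAMESPACE
    value: "client-demo-client"
  - name: API_KEY
    valueFrom:
      secretKeyRef:
        name: bootstrap      # assumed secret name (see Secrets below)
        key: api_key
  - name: DB_DSN
    value: "postgresql+asyncpg://user:pass@postgres:5432/callcenter"
```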
Health Checks:
- Liveness Probe: GET /health every 10s
- Readiness Probe: GET /health every 5s
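In the deployment template these probes would look roughly like this (a sketch; ports follow the service mapping below and exact thresholds may differ):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
```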
Service:
- Type: ClusterIP
- Port: 8080
- Target Port: 8000
2. PostgreSQL (Database Deployment)
Purpose: Centralized configuration and state storage
Deployment Configuration:
- Replicas: 1 (not HA by default)
- Image: postgres:15
- Strategy: Recreate (ensures single writer)
- Resources:
- Requests: 256Mi memory, 250m CPU
- Limits: 512Mi memory, 500m CPU
Persistent Storage:
- PVC Size: 10Gi (configurable)
- Storage Class: Default (or specify custom)
- Mount Path: /var/lib/postgresql/data
Initialization:
- SQL script mounted via ConfigMap (init-schema.sql)
- Creates tables: queues, cc_agents, cc_tiers, cc_members, extensions, ivrs, ivr_menus, ivr_menu_options, ivr_tts_cache
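A sketch of how the init SQL is typically mounted for the official postgres image (the volume name is illustrative; the ConfigMap name follows the list further below):

```yaml
volumes:
  - name: init-sql
    configMap:
      name: postgres-init-schema            # ConfigMap carrying init-schema.sql
containers:
  - name: postgres
    image: postgres:15
    volumeMounts:
      - name: init-sql
        mountPath: /docker-entrypoint-initdb.d   # scripts here run on first startup
```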
ODBC Configuration:
- DSN: callcenter
- Used by FreeSWITCH pods for direct database access
Health Checks:
- Liveness Probe: pg_isready every 10s
- Readiness Probe: pg_isready every 5s
Service:
- Type: ClusterIP
- Port: 5432
- DNS: postgres.client-demo-client.svc.cluster.local
3. FreeSWITCH Registrar (SIP Registration & B2BUA)
Purpose: SIP registration server and media anchor
Deployment Configuration:
- Replicas: 1
- Image: ghcr.io/ominis-ai/freeswitch-registrar:latest
- Strategy: Recreate (to prevent dual SIP binding)
- Resources:
- Requests: 512Mi memory, 250m CPU
- Limits: 1Gi memory, 500m CPU
Networking:
- Host Network: Enabled (temporary workaround for IPv6 issues)
- Node Selector: kubernetes.io/hostname: c2-30-bhs5 (pinned to a specific node)
- DNS Policy: ClusterFirstWithHostNet
Ports:
- 5060/UDP - SIP signaling
- 5061/TCP - SIP TLS
- 8080/TCP - XML-RPC
- 20000-30000/UDP - RTP media
Configuration:
- Mounted via ConfigMap (freeswitch-registrar-config)
- Includes: freeswitch.xml, vars.xml, sofia.conf.xml, acl.conf.xml
B2BUA Pattern:
- Queue→Registrar: Internal IP (10.x.x.x)
- Registrar→Agent: Public IP (51.79.31.20)
- Anchors media between internal pods and external SIP clients
Service:
- Type: ClusterIP
- Ports: 5060/UDP, 8080/TCP
- DNS: freeswitch-registrar.client-demo-client.svc.cluster.local
4. ConfigMaps & Secrets
ConfigMaps:
1. bootstrap - Basic tenant configuration
   - registry_url: ghcr.io/ominis-ai
   - deployment_mode: kubernetes
2. freeswitch-registrar-config - Complete FreeSWITCH configuration files
   - freeswitch.xml - Main configuration
   - vars.xml - Variables
   - sofia.conf.xml - SIP profiles
   - acl.conf.xml - ACL rules
   - modules.conf.xml - Module loading
3. postgres-init-schema - Database initialization SQL
   - Loaded via the init-schema.sql file
4. xmlrpc - XML-RPC connection details
   - URLs for registrar and campaign pods
   - Used by the API for XML-RPC requests
Secrets:
1. bootstrap - Core credentials
   - api_key: (API key for authentication)
   - db_password: (PostgreSQL password)
2. postgres-credentials - Database credentials
   - DB_PASS: (PostgreSQL password)
3. freeswitch-xmlrpc - XML-RPC credentials
   - username: fsadmin
   - password: (XML-RPC password)
4. freeswitch-xmlcurl-token - XML-CURL shared secret
   - token: (Shared secret for mod_xml_curl)
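Once deployed, an individual secret value can be checked in place, for example the API key in the bootstrap secret (key names as listed above):

```bash
# Decode the API key stored in the bootstrap secret
kubectl get secret bootstrap -n client-demo-client \
  -o jsonpath='{.data.api_key}' | base64 -d
```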
5. Ingress (HTTPS Routing)
Purpose: External HTTPS access to API
Configuration:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: api
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
ingressClassName: traefik
rules:
- host: api.demo.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api
port:
number: 8080
tls:
- secretName: api-tls-cert
hosts:
- api.demo.example.com
Features:
- TLS Automation: cert-manager issues Let's Encrypt certificates
- Ingress Controller: Traefik routes traffic to API service
- HTTPS Redirect: Automatic HTTP→HTTPS redirect
- Custom Domain: Configure via api.host in values.yaml
6. RBAC (ServiceAccount, Role, RoleBinding)
Purpose: Grant API pod permission to manage runtime resources (queues, IVRs)
ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
name: client-demo-client
namespace: client-demo-client
Role:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: runtime-manager
namespace: client-demo-client
rules:
- apiGroups: ["", "apps"]
resources: ["deployments", "services", "configmaps", "pods", "pods/log"]
verbs: ["get", "list", "create", "update", "patch", "delete", "watch"]
RoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: runtime-manager-binding
namespace: client-demo-client
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: runtime-manager
subjects:
- kind: ServiceAccount
name: client-demo-client
namespace: client-demo-client
Why This Matters:
Without proper RBAC, the API pod cannot create queue/IVR pods dynamically. The runtime-manager role grants the API pod permission to orchestrate Kubernetes resources within its own namespace.
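The grant can be verified with kubectl auth can-i, impersonating the tenant ServiceAccount defined above:

```bash
# Confirm the API ServiceAccount may create Deployments in its own namespace
kubectl auth can-i create deployments \
  --as=system:serviceaccount:client-demo-client:client-demo-client \
  -n client-demo-client
```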
ADR-0003: Helm Over Terraform
Context
Ominis Cluster Manager originally used Terraform (OpenTofu) for infrastructure deployment. The workflow was:
- API generates .tf files with queue configuration
- Git-sync sidecar detects changes and pulls the .tf files
- Terraform init → plan → apply creates Kubernetes resources
- A .tfstate file tracks infrastructure state
Problem: This workflow was complex, slow, and introduced state synchronization issues.
Decision
Migrate to Helm for infrastructure scaffolding (namespace, RBAC, secrets) and Python Kubernetes client for runtime resources (queues, IVRs).
Alternatives Considered
1. Terraform (Previous Approach)
❌ Rejected for:
- Complexity: Generating .tf files dynamically added indirection
- State Drift: Terraform state could drift from actual cluster state
- Git-sync Dependency: Required sidecar container and SSH keys for git access
- Slow Feedback: Push to git → wait for sync → wait for apply (15-30 seconds)
- Port Allocation: Cluster-level tracking was unnecessary with registrar pattern
✅ Advantages:
- Multi-cloud abstraction (can deploy to AWS, GCP, Azure)
- Strong state management with .tfstate files
- Mature ecosystem with many providers
2. Helm + Python Kubernetes Client (Chosen)
✅ Benefits:
- Simplicity: Standard Kubernetes tooling, no custom state management
- Speed: Direct API → Kubernetes, no git intermediary (1-3 seconds)
- Idempotency: Server-side apply natively handles conflicts
- Observability: Kubernetes events and status directly available
- Standard: Industry-standard Helm for infra scaffolding
- Native: Kubernetes-native workflow (kubectl, helm)
⚠️ Trade-offs:
- Loss of multi-cloud abstraction (Kubernetes-only)
- No centralized .tfstate equivalent (Kubernetes is the source of truth)
3. Kubernetes Operators
❌ Rejected for:
- Overkill: Writing a custom operator is a large undertaking
- Operational Burden: Another service to maintain
- Complexity: Reconciliation loops and CRDs add complexity
✅ When to Consider:
- If you need continuous reconciliation (drift correction)
- If you're building a product around the operator (e.g., Strimzi for Kafka)
4. GitOps (ArgoCD / Flux)
⚠️ Future Consideration:
- Works well with Helm charts
- Provides continuous sync from git → cluster
- Adds operational complexity (another service)
- Good for multi-tenant SaaS with strict audit requirements
✅ Compatible with Current Approach:
- Can deploy Helm chart via ArgoCD
- Not needed for initial MVP
Consequences
Positive:
✅ Simplicity: Reduced from 5 components (API, Terraform, git-sync, git repo, .tfstate) to 2 (Helm, API)
✅ Speed: Queue creation: 15-30s (Terraform) → 3-5s (Kubernetes client)
✅ Reliability: No state drift (Kubernetes is source of truth)
✅ Developer Experience: Standard helm and kubectl commands
✅ Observability: Native Kubernetes events and status
Negative:
❌ Kubernetes-Only: Cannot deploy to other platforms (acceptable for Ominis use case)
❌ No Centralized State: Relies on Kubernetes API as source of truth (acceptable for cloud-native apps)
Migration Impact:
- ✅ Existing queues continue to work (Kubernetes resources unchanged)
- ✅ API code updated to use Python Kubernetes client instead of Terraform
- ✅ Git-sync sidecar removed
- ✅ .tfstate files archived (no longer needed)
Status: ✅ Accepted - Migrated in October 2025
Date: 2025-10-04
Deployment Process
Prerequisites
Before deploying the Helm chart, ensure you have:
- Kubernetes Cluster: 1.19+ (tested on 1.27+)
- Helm: 3.8+ installed locally
- kubectl: Configured with cluster access
- Docker Registry Access: GitHub Container Registry (GHCR) credentials
- DNS: Domain name for API ingress (e.g., api.demo.example.com)
Step 1: Prepare Values File
Create a values.yaml file with tenant-specific configuration:
tenant:
name: demo-client
labels:
environment: production
owner: platform-team
api:
enabled: true
replicas: 1
image: ghcr.io/ominis-ai/cm-api
tag: latest
api_key: "your-secret-api-key-here"
host: api.demo.example.com
ingressClass: traefik
tlsEnabled: true
tlsSecretName: "" # Cert-manager will create this
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
postgres:
enabled: true
persistence:
enabled: true
size: 10Gi
storageClass: "" # Use default storage class
freeswitch:
registrar:
enabled: true
replicas: 1
imagePullSecrets:
enabled: true
name: ghcr-secret
username: your-github-username
password: ghp_your_github_token
email: github@example.com
server: ghcr.io
openai:
apiKey: "sk-..." # OpenAI API key for IVR TTS
rbac:
enabled: true
serviceAccount:
create: true
Step 2: Preview Changes
Before applying, preview what will be deployed:
cd /home/matt/projects/fml/cluster-manager
# Lint the chart
helm lint charts/tenant-infra -f values.yaml
# Dry-run to see rendered templates
helm upgrade --install demo-client charts/tenant-infra \
-f values.yaml \
--namespace client-demo-client \
--create-namespace \
--dry-run --debug
What to Look For:
- ✅ Namespace is client-demo-client
- ✅ All secrets have values (no empty strings)
- ✅ Ingress host is correct
- ✅ Image pull secret is configured
Step 3: Deploy Chart
Deploy the chart using the Makefile:
# Deploy using Makefile (recommended)
make helm-apply TENANT=demo-client VALUES=values.yaml
# Or deploy directly with helm
helm upgrade --install demo-client charts/tenant-infra \
-f values.yaml \
--namespace client-demo-client \
--create-namespace \
--atomic \
--wait \
--timeout 10m
Flags Explained:
- --install: Install if the release doesn't exist
- --atomic: Roll back on failure
- --wait: Wait for pods to be ready
- --timeout 10m: Maximum time to wait
- --create-namespace: Create the namespace if it doesn't exist
Step 4: Verify Deployment
Check that all resources are created and healthy:
# Check Helm release
helm list -n client-demo-client
# Check all resources
kubectl get all,ing,cm,secret,sa,role,rolebinding -n client-demo-client
# Check pod status
kubectl get pods -n client-demo-client -w
# Check pod logs
kubectl logs -l app=api -n client-demo-client --tail=50
kubectl logs -l app=postgres -n client-demo-client --tail=50
kubectl logs -l app=freeswitch-registrar -n client-demo-client --tail=50
Expected Output:
NAME READY STATUS RESTARTS AGE
pod/api-78c5d4b7ff-xq9wz 1/1 Running 0 2m
pod/postgres-6c8b5d4f7-h8k2p 1/1 Running 0 2m
pod/freeswitch-registrar-5d7c8f-9x4tz 1/1 Running 0 2m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/api ClusterIP 10.43.123.45 <none> 8080/TCP 2m
service/postgres ClusterIP 10.43.123.46 <none> 5432/TCP 2m
service/freeswitch-registrar ClusterIP 10.43.123.47 <none> 5060/UDP,8080/TCP 2m
Step 5: Test API Access
Test that the API is accessible:
# Via ingress (if configured)
curl https://api.demo.example.com/health
# Via port-forward (for testing)
kubectl port-forward svc/api 8080:8080 -n client-demo-client
curl http://localhost:8080/health
Expected Response:
{
"status": "healthy",
"service": "callcenter-api"
}
Deployment Flow Diagram
The end-to-end flow mirrors the five steps above: prepare a values file, preview the rendered templates, deploy the release, verify the resources, and test API access.
Values Configuration
The values.yaml file controls all aspects of the tenant deployment. Here are the key sections:
Tenant Configuration
tenant:
name: demo-client # Tenant identifier (used in namespace)
labels: # Custom labels for all resources
environment: production
owner: platform-team
cost-center: engineering
annotations: # Custom annotations
contact: admin@demo.example.com
API Configuration
api:
enabled: true # Enable/disable API deployment
replicas: 1 # Number of API pods
image: ghcr.io/ominis-ai/cm-api
tag: latest # Image tag (use specific versions in production)
imagePullPolicy: Always # Pull policy (Always, IfNotPresent, Never)
debug: "true" # Enable debug logging
api_key: "Winnipeg2025" # API key for authentication
strategy:
type: Recreate # Deployment strategy (Recreate or RollingUpdate)
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
# Ingress configuration
host: api.demo.example.com # Domain name for API
ingressClass: traefik # Ingress controller
servicePort: 8080 # Service port
tlsEnabled: false # Enable TLS (requires cert-manager)
tlsSecretName: "" # TLS secret name (auto-generated if empty)
annotations: # Ingress annotations
cert-manager.io/cluster-issuer: letsencrypt-prod
PostgreSQL Configuration
postgres:
enabled: true # Enable/disable PostgreSQL deployment
image: postgres
tag: "15" # PostgreSQL version
imagePullPolicy: IfNotPresent
user: callcenter_user # Database user
database: callcenter # Database name
password: "callcenter_pass_demo-client" # Database password
persistence:
enabled: true # Enable persistent storage
size: 10Gi # PVC size
storageClass: "" # Storage class (empty = default)
existingClaim: "" # Use existing PVC (optional)
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
odbc:
dsn: "callcenter" # ODBC DSN name for FreeSWITCH
FreeSWITCH Configuration
freeswitch:
xmlcurl:
enabled: true # Enable mod_xml_curl
token: "Winnipeg2025-xmlcurl-changeme" # Shared secret token
registrar:
enabled: true # Enable registrar deployment
replicas: 1
image: ghcr.io/ominis-ai/freeswitch-registrar
tag: latest
imagePullPolicy: Always
hostNetwork: true # Use host networking (IPv6 workaround)
nodeSelector:
kubernetes.io/hostname: c2-30-bhs5 # Pin to specific node
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "500m"
XML-RPC Configuration
xmlrpc:
enabled: true
port: 8080
url: "http://freeswitch-registrar.demo-client.svc.cluster.local:8080/RPC2"
username: "fsadmin" # XML-RPC username
password: "Winnipeg2025" # XML-RPC password
acl:
enabled: true # Enable ACL
allowedCidrs: # Allowed IP ranges
- "10.0.0.0/8"
- "172.16.0.0/12"
- "192.168.0.0/16"
campaign:
url: "http://freeswitch-campaign.demo-client.svc.cluster.local:8080/RPC2"
username: "fsadmin"
password: "Winnipeg2025"
OpenAI Configuration (for IVR TTS)
openai:
apiKey: "sk-..." # OpenAI API key
ttsModel: "tts-1" # TTS model (tts-1, tts-1-hd)
defaultVoice: "alloy" # Default voice (alloy, echo, fable, onyx, nova, shimmer)
RBAC Configuration
rbac:
enabled: true # Enable RBAC resources
serviceAccount:
create: true # Create ServiceAccount
name: "" # ServiceAccount name (auto-generated if empty)
annotations: {} # ServiceAccount annotations
Image Pull Secrets
imagePullSecrets:
enabled: true # Enable image pull secret
name: ghcr-secret # Secret name
username: mattjoubert # GitHub username
password: ghp_... # GitHub token (with packages:read scope)
email: github@mattjoubert.com # Email address
server: ghcr.io # Registry server
Multi-Tenant Isolation
Ominis Cluster Manager achieves strong multi-tenant isolation through several Kubernetes mechanisms:
1. Kubernetes Namespaces
Pattern: One namespace per tenant (client-{tenant-name})
Benefits:
- ✅ Logical separation of resources
- ✅ RBAC boundaries (ServiceAccount permissions scoped to namespace)
- ✅ Resource quotas per tenant
- ✅ Network policies per tenant
- ✅ Clear ownership (all resources in namespace belong to tenant)
Example:
# Tenant A
kubectl get all -n client-demo-client
# Tenant B
kubectl get all -n client-acme-corp
2. Network Policies
Purpose: Restrict network traffic between tenants
Example Policy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: tenant-isolation
namespace: client-demo-client
spec:
podSelector: {} # Apply to all pods in namespace
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
tenant: demo-client # Only from same tenant
egress:
- to:
- namespaceSelector:
matchLabels:
tenant: demo-client # Only to same tenant
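For the namespaceSelector above to match, the tenant namespace itself must carry the tenant label, for example:

```bash
# Label the tenant namespace so NetworkPolicy selectors can match it
kubectl label namespace client-demo-client tenant=demo-client
```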
3. Resource Quotas
Purpose: Limit resource consumption per tenant
Example Quota:
apiVersion: v1
kind: ResourceQuota
metadata:
name: tenant-quota
namespace: client-demo-client
spec:
hard:
requests.cpu: "10" # Max 10 CPU cores requested
requests.memory: 20Gi # Max 20Gi memory requested
limits.cpu: "20" # Max 20 CPU cores limit
limits.memory: 40Gi # Max 40Gi memory limit
persistentvolumeclaims: "10" # Max 10 PVCs
services.loadbalancers: "2" # Max 2 load balancers
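Current usage against these limits can be inspected at any time:

```bash
# Compare requested/limited resources with the quota's hard caps
kubectl describe resourcequota tenant-quota -n client-demo-client
```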
4. Separate Databases
Approach: Each tenant has a dedicated PostgreSQL instance
Benefits:
- ✅ Complete data isolation (no shared tables)
- ✅ Independent backups and restores
- ✅ Per-tenant performance tuning
- ✅ No noisy neighbor issues
- ⚠️ Trade-off: Higher resource usage vs. security
5. Isolated Secrets
Approach: Secrets are namespace-scoped
Benefits:
- ✅ API keys, passwords, tokens isolated per tenant
- ✅ No cross-tenant secret access
- ✅ ServiceAccount can only read secrets in its namespace
Example:
# Tenant A secrets
kubectl get secrets -n client-demo-client
# Tenant B secrets (different secrets)
kubectl get secrets -n client-acme-corp
Tenant Isolation Diagram
Deployment Examples
Example 1: Preview Deployment Changes
See what will change without applying:
cd /home/matt/projects/fml/cluster-manager
# Preview changes using Makefile
make helm-diff TENANT=demo-client VALUES=values.yaml
# Or use helm directly
helm template demo-client charts/tenant-infra \
-f values.yaml \
--namespace client-demo-client > /tmp/demo-client.yaml
kubectl diff -n client-demo-client -f /tmp/demo-client.yaml || true
Output:
+ apiVersion: v1
+ kind: Namespace
+ metadata:
+ name: client-demo-client
...
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+ name: api
+ namespace: client-demo-client
...
Use Cases:
- ✅ Verify configuration before applying
- ✅ Review resource changes during upgrades
- ✅ CI/CD pull request previews
Example 2: Deploy New Tenant
Deploy a new tenant from scratch:
# Step 1: Create values file
cat > tenant-acme.yaml <<EOF
tenant:
name: acme-corp
api:
host: api-acme.example.com
api_key: "acme-secret-key"
postgres:
password: "acme-db-password"
imagePullSecrets:
enabled: true
username: your-github-username
password: ghp_your_github_token
EOF
# Step 2: Deploy using Makefile
make helm-apply TENANT=acme-corp VALUES=tenant-acme.yaml
# Step 3: Watch deployment progress
kubectl get pods -n client-acme-corp -w
# Step 4: Verify deployment
helm status acme-corp -n client-acme-corp
kubectl get all,ing -n client-acme-corp
Timeline:
- Namespace creation: ~1s
- ConfigMaps/Secrets: ~2s
- PostgreSQL pod ready: ~10-15s
- API pod ready: ~5-10s
- Registrar pod ready: ~10-15s
- Total: ~30-45 seconds
Example 3: Check Deployment Status
Check the status of a deployed tenant:
# List all Helm releases
helm list -A
# Get release details
helm status demo-client -n client-demo-client
# View deployed values
helm get values demo-client -n client-demo-client
# View rendered manifests
helm get manifest demo-client -n client-demo-client
# Check pod status
kubectl get pods -n client-demo-client
# Check ingress
kubectl get ingress -n client-demo-client
kubectl describe ingress api -n client-demo-client
# Check logs
kubectl logs -l app=api -n client-demo-client --tail=100 -f
Expected Output:
NAME: demo-client
LAST DEPLOYED: Mon Oct 14 10:00:00 2025
NAMESPACE: client-demo-client
STATUS: deployed
REVISION: 1
Example 4: Customize Values Per Environment
Use different values for dev, staging, and production:
# Development environment
cat > values-dev.yaml <<EOF
tenant:
name: demo-client
labels:
environment: dev
api:
replicas: 1
host: api-dev.demo.example.com
tlsEnabled: false # No TLS in dev
debug: "true"
postgres:
persistence:
size: 5Gi # Smaller storage in dev
freeswitch:
registrar:
resources:
limits:
memory: "512Mi" # Lower limits in dev
EOF
# Production environment
cat > values-prod.yaml <<EOF
tenant:
name: demo-client
labels:
environment: prod
api:
replicas: 3 # HA in production
host: api.demo.example.com
tlsEnabled: true
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
debug: "false"
postgres:
persistence:
size: 50Gi # Larger storage in prod
storageClass: "fast-ssd"
freeswitch:
registrar:
replicas: 2 # HA registrar
resources:
limits:
memory: "2Gi"
EOF
# Deploy to dev
make helm-apply TENANT=demo-client-dev VALUES=values-dev.yaml
# Deploy to prod
make helm-apply TENANT=demo-client-prod VALUES=values-prod.yaml
Example 5: Rollback Failed Deployment
Rollback to a previous working version:
# List release history
helm history demo-client -n client-demo-client
# Example output:
# REVISION UPDATED STATUS CHART DESCRIPTION
# 1 Mon Oct 14 10:00:00 2025 superseded tenant-infra-1.0.0 Install complete
# 2 Mon Oct 14 11:00:00 2025 superseded tenant-infra-1.1.0 Upgrade complete
# 3 Mon Oct 14 12:00:00 2025 failed tenant-infra-1.2.0 Upgrade failed
# Rollback to previous version (revision 2)
helm rollback demo-client -n client-demo-client
# Or rollback to specific revision
helm rollback demo-client 2 -n client-demo-client
# Verify rollback
kubectl get pods -n client-demo-client -w
helm history demo-client -n client-demo-client
Rollback Process:
- Helm reverts to previous manifest
- Kubernetes applies changes
- Pods restart with old configuration
- Timeline: ~30-60 seconds
Upgrade Strategy
Helm Chart Upgrades
When upgrading the tenant infrastructure chart:
# Step 1: Pull latest changes
git pull origin main
# Step 2: Review chart changes
git log --oneline charts/tenant-infra/
# Step 3: Preview upgrade
make helm-diff TENANT=demo-client VALUES=values.yaml
# Step 4: Apply upgrade
make helm-apply TENANT=demo-client VALUES=values.yaml
# Step 5: Monitor rollout
kubectl get pods -n client-demo-client -w
Application Upgrades (API Image)
When upgrading the API application:
# Step 1: Build and push new image
make build TAG=v1.2.0
docker push ghcr.io/ominis-ai/cm-api:v1.2.0
# Step 2: Update values.yaml
sed -i 's/tag: latest/tag: v1.2.0/' values.yaml
# Step 3: Apply upgrade
make helm-apply TENANT=demo-client VALUES=values.yaml
# Step 4: Monitor rollout
kubectl rollout status deployment/api -n client-demo-client
Zero-Downtime Upgrades
For zero-downtime upgrades, use RollingUpdate strategy:
api:
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Max pods above desired count during update
maxUnavailable: 0 # Min pods available during update
Process:
- New pod starts
- New pod becomes ready
- Old pod terminates
- Result: Always at least 1 pod serving traffic
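If a rolling update misbehaves, the Deployment can also be watched and reverted directly at the Kubernetes level, independent of Helm release history:

```bash
# Watch the rollout, then revert the Deployment if the new pods fail
kubectl rollout status deployment/api -n client-demo-client
kubectl rollout undo deployment/api -n client-demo-client
```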
Context & Rationale
Why One Chart Per Tenant?
Decision: Deploy one Helm release per tenant
Benefits:
✅ Complete Isolation: Each tenant has dedicated resources
✅ Independent Upgrades: Upgrade one tenant without affecting others
✅ Per-Tenant Customization: Different resource limits, features, configurations
✅ Clear Resource Ownership: All resources in namespace belong to tenant
✅ Simplified Multi-Tenancy: Easy to add/remove tenants
Trade-offs:
⚠️ Resource Overhead: Each tenant has dedicated PostgreSQL, API, Registrar
⚠️ Operational Complexity: More releases to manage
Mitigation:
- Use resource limits to cap per-tenant usage
- Automate tenant deployment via CI/CD
- Monitor all tenants centrally with Prometheus
Deployment vs StatefulSet for PostgreSQL
Decision: Use Deployment (not StatefulSet) for PostgreSQL
Note: The chart currently uses Deployment for PostgreSQL, not StatefulSet. This is acceptable for single-replica databases, but StatefulSet is recommended for HA deployments.
StatefulSet Benefits:
- ✅ Stable pod identity (postgres-0, postgres-1)
- ✅ Persistent storage guarantees
- ✅ Graceful scaling and termination
- ✅ Reliable restarts (same PVC every time)
When to Use StatefulSet:
- Multi-replica databases (primary + replicas)
- Databases requiring stable network identity
- Databases with ordered scaling requirements
Current Approach (Deployment):
- ✅ Simpler for single-replica databases
- ✅ Sufficient for non-HA deployments
- ⚠️ Not suitable for HA PostgreSQL
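For reference, a minimal StatefulSet sketch for PostgreSQL might look like the following; names, sizes, and the headless Service are illustrative and not part of the current chart:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres              # headless Service providing stable DNS (postgres-0.postgres)
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:              # one PVC per replica, reattached across restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```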
Why Separate Registrar?
Decision: Deploy one registrar per tenant (not shared)
Benefits:
✅ Tenant Isolation: Registrar failures don't cascade across tenants
✅ Per-Tenant Configuration: Different SIP profiles per tenant
✅ Simplified Debugging: Registrar logs map 1:1 to tenant
✅ Security: Network policies isolate registrar per tenant
Trade-offs:
⚠️ Resource Overhead: Each tenant has dedicated registrar pod
⚠️ Port Management: Each registrar needs dedicated ports (if not using host networking)
Alternative (Shared Registrar):
- One registrar for all tenants
- ❌ Single point of failure
- ❌ Configuration complexity (multi-tenant SIP profiles)
- ❌ Security concerns (tenant boundary violations)
Configuration Management Best Practices
Decision: Store configuration in version control
Approach:
- values.yaml: Committed to git (without secrets)
- Secrets: Stored in external secret manager (e.g., Vault, AWS Secrets Manager)
- Environment-Specific Values: Separate files (values-dev.yaml, values-prod.yaml)
Example:
# values.yaml (committed to git)
tenant:
name: demo-client
api:
host: api.demo.example.com
api_key: "" # Injected via CI/CD
# secrets.env (NOT committed to git)
API_KEY=secret-key-here
DB_PASSWORD=db-password-here
OPENAI_API_KEY=sk-...
CI/CD Integration:
# Inject secrets during deployment
helm upgrade --install demo-client charts/tenant-infra \
-f values.yaml \
--set api.api_key=$API_KEY \
--set postgres.password=$DB_PASSWORD \
--set openai.apiKey=$OPENAI_API_KEY
Links to Related Sections
- System Overview - High-level architecture and components
- Cluster Infrastructure - Cluster-wide services (cert-manager, traefik, etc.)
- Kubernetes Operations - Operational runbooks for Kubernetes
- Database Schema - PostgreSQL tables and relationships
- Queue Management - Dynamic queue pod deployment
- API Authentication - API key management and security
- Ports & Adapters - Hexagonal architecture pattern
- Testing Strategy - Helm chart testing approach
Summary
Ominis Cluster Manager uses Helm for infrastructure scaffolding (namespace, RBAC, secrets, ingress) and Python Kubernetes client for runtime resources (queues, IVRs). This provides the best of both worlds:
✅ Helm: Standard, versioned, rollback-friendly infrastructure deployment
✅ Kubernetes Client: Fast, simple, idempotent runtime orchestration
✅ Multi-Tenant: Strong isolation via namespaces, network policies, and resource quotas
✅ Developer Experience: A single make helm-apply command deploys a complete tenant
The migration from Terraform to Helm reduced deployment complexity, improved performance (15-30s → 3-5s for queues), and aligned with Kubernetes-native best practices.
Powered by Ominis.ai - Modern call center infrastructure for cloud-native platforms.