Cluster Infrastructure Services
Ominis Cluster Manager uses a two-tier infrastructure model: cluster-wide services (shared) and tenant services (isolated). This document provides a deep dive into the cluster infrastructure layer that powers all tenant deployments.
Introduction
Cluster vs Tenant Infrastructure Model
The Ominis platform uses a two-tier infrastructure approach to balance efficiency, isolation, and operational simplicity:
Cluster Infrastructure (this document):
- Deployed once per Kubernetes cluster
- Shared by all tenants
- Namespaces: cert-manager, authentik, vaultwarden, flow-proxy, homer, excalidraw
- Examples: TLS certificate management, identity provider, HTTP ingress, SIP monitoring
- Repository: cluster-infra
Tenant Infrastructure:
- Deployed per customer/tenant
- Isolated in dedicated namespaces
- Namespace pattern: client-{tenant-name} (e.g., client-demo-client)
- Examples: API servers, queue pods, IVR pods, databases, tenant-specific ingress
- Repository: cluster-manager
Why Separate?
This separation provides several key benefits:
| Benefit | Description |
|---|---|
| Cost Efficiency | One cert-manager instance serves all tenants instead of N instances |
| Operational Simplicity | Single upgrade, single monitoring stack, centralized configuration |
| Resource Optimization | Shared ingress controller reduces pod overhead |
| Faster Onboarding | New tenants start immediately without waiting for infrastructure |
| Security Isolation | Tenant workloads isolated via namespaces and network policies (see the sketch after this table) |
| Centralized Management | Identity, secrets, TLS certificates managed in one place |
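To make the security-isolation row concrete, here is a minimal NetworkPolicy sketch. It is hypothetical (the platform's actual policies are not shown in this document) and limits inbound traffic in a tenant namespace to same-namespace pods plus the shared ingress controller:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: client-demo-client
spec:
  podSelector: {}            # applies to every pod in the tenant namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # traffic from pods in the same namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: flow-proxy   # shared ingress controller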
This documentation focuses on cluster-level services. For tenant-specific infrastructure (API servers, queues, databases), see Helm Infrastructure Deployment.
Cluster Architecture Overview
The cluster infrastructure consists of six core services that work together to provide shared capabilities: cert-manager (TLS), Authentik (identity), Vaultwarden (secrets), Flow-Proxy/Traefik (HTTP ingress), Homer (SIP monitoring), and Excalidraw (whiteboarding).
Service Dependencies
Understanding service dependencies is critical for deployment order and troubleshooting:
Key Dependencies:
- Cert-Manager is the foundation (no dependencies)
- Authentik requires cert-manager for TLS
- Vaultwarden requires cert-manager for TLS
- Flow-Proxy discovers ingress resources dynamically
- Homer and Excalidraw are independent (optional)
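Given these dependencies, a minimal deployment-order sketch, assuming each service ships as a Helm chart in the cluster-infra repository (chart paths and release names below are illustrative, not taken from the source):
# Illustrative ordering only; chart paths and release names are assumptions.
helm upgrade --install cert-manager ./charts/cert-manager -n cert-manager --create-namespace
helm upgrade --install authentik    ./charts/authentik    -n authentik    --create-namespace
helm upgrade --install vaultwarden  ./charts/vaultwarden  -n vaultwarden  --create-namespace
helm upgrade --install flow-proxy   ./charts/flow-proxy   -n flow-proxy   --create-namespace
helm upgrade --install homer        ./charts/homer        -n homer        --create-namespace   # optional
helm upgrade --install excalidraw   ./charts/excalidraw   -n excalidraw   --create-namespace   # optional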
Service Catalog
1. Cert-Manager: Automated TLS Certificate Management
Purpose: Automated issuance and renewal of TLS certificates using the Let's Encrypt ACME protocol.
Architecture
Cert-Manager uses Kubernetes Custom Resource Definitions (CRDs) to automate certificate lifecycle:
- ClusterIssuer: Cluster-wide certificate authority (Let's Encrypt)
- Certificate: Defines a certificate request
- CertificateRequest: Generated automatically by Certificate
- Challenge: ACME HTTP-01 or DNS-01 validation
Certificate Lifecycle
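In outline, the lifecycle follows standard cert-manager behaviour: an annotated Ingress (or an explicit Certificate resource) produces a Certificate, cert-manager creates a CertificateRequest from it, the ACME issuer opens an Order with one Challenge per DNS name, the HTTP-01 challenge is served through Traefik, and the signed certificate is written into the target Secret, after which cert-manager tracks expiry and renews automatically. The full chain can be inspected per namespace:
# Inspect the issuance chain for a tenant namespace
kubectl get certificate,certificaterequest,order,challenge -n client-demo-client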
Configuration
ClusterIssuer Definition:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production server
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@ominis.ai
    # Private key for ACME account
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      # HTTP-01 challenge using Traefik ingress
      - http01:
          ingress:
            class: traefik
Certificate Request Example:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: demo-client-api-tls
  namespace: client-demo-client
spec:
  secretName: demo-client-api-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - demo-client-api.app.ominis.ai
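Once the Certificate reports Ready, cert-manager stores the signed certificate and key in a kubernetes.io/tls Secret named demo-client-api-tls. A quick way to confirm issuance and check the validity window:
kubectl get secret demo-client-api-tls -n client-demo-client -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -dates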
Integration with Tenant Ingress
Tenants consume cert-manager via ingress annotations:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: client-demo-client
  annotations:
    # This annotation triggers cert-manager
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - demo-client-api.app.ominis.ai
      secretName: demo-client-api-tls  # Cert-manager creates this
  rules:
    - host: demo-client-api.app.ominis.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 8000
Use Cases
| Use Case | Description |
|---|---|
| Tenant API Ingress | demo-client-api.app.ominis.ai automatically gets TLS |
| Documentation Site | docs.app.ominis.ai with Let's Encrypt certificate |
| Internal Services | Any service with ingress gets free TLS |
| Automatic Renewal | Certificates renewed automatically before expiry (with cert-manager defaults, roughly 30 days before expiry for 90-day Let's Encrypt certificates) |
Troubleshooting
Certificate not issued:
# Check certificate status
kubectl describe certificate demo-client-api-tls -n client-demo-client
# Check challenges
kubectl get challenges -n client-demo-client
# Check cert-manager logs
kubectl logs -n cert-manager -l app=cert-manager --tail=100
Common issues:
- DNS not configured: Ensure DNS points to cluster ingress IP
- HTTP-01 challenge failed: Check Traefik routing and firewall
- Rate limit exceeded: Let's Encrypt has rate limits (50 certs/week per domain)
2. Authentik: Identity and Access Management
Purpose: Enterprise-grade identity provider for SSO, OAuth, OIDC, LDAP, and SAML.
Architecture
Authentik is a flows-based authentication system:
- PostgreSQL Backend: Stores users, groups, applications
- Flows: Customizable authentication pipelines
- Providers: OAuth2/OIDC, SAML, LDAP
- Policies: Fine-grained access control
- Applications: Integrated services
Features
| Feature | Description |
|---|---|
| Single Sign-On | Users log in once, access multiple services |
| OAuth2/OIDC Provider | Standard protocol for API authentication |
| LDAP Server | Bridge to legacy systems |
| SAML Provider | Enterprise SSO integration |
| User Management | Self-service enrollment, password reset |
| Multi-Factor Auth | TOTP, WebAuthn, SMS |
Configuration Example
Helm Values:
authentik:
  # PostgreSQL backend
  postgresql:
    enabled: true
    persistence:
      enabled: true
      size: 10Gi

  # Ingress configuration
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    hosts:
      - host: auth.ominis.ai
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: auth-ominis-ai-tls
        hosts:
          - auth.ominis.ai

  # Secret key for session encryption
  secret_key: "changeme-random-secret-key"

  # Email configuration
  email:
    host: smtp.sendgrid.net
    port: 587
    username: apikey
    password: "SG.xxx"
    from: noreply@ominis.ai
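The secret_key, SMTP password, and similar values above are placeholders; in practice they should be generated per environment and supplied from a Kubernetes Secret rather than committed to a values file.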
Application Integration
Adding an Application:
- Create OAuth2/OIDC Provider in Authentik UI
- Configure redirect URIs
- Get client ID and secret
- Configure application to use Authentik
Example: Grafana Integration:
[auth.generic_oauth]
enabled = true
name = Authentik
client_id = grafana-client-id
client_secret = your-client-secret
scopes = openid profile email
auth_url = https://auth.ominis.ai/application/o/authorize/
token_url = https://auth.ominis.ai/application/o/token/
api_url = https://auth.ominis.ai/application/o/userinfo/
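On the Authentik side, the matching redirect URI for this provider is Grafana's generic OAuth callback, e.g. https://<grafana-host>/login/generic_oauth.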
Use Cases
| Use Case | Description |
|---|---|
| Internal Tool SSO | Grafana, Homer UI, admin dashboards |
| API OAuth | Secure API endpoints with OAuth2 |
| LDAP Bridge | Connect legacy systems requiring LDAP |
| Multi-Tenant Isolation | User groups per tenant |
3. Vaultwarden: Password Management
Purpose: Self-hosted password manager compatible with Bitwarden clients.
Architecture
- Rust-based: Lightweight, efficient alternative to official Bitwarden
- SQLite/PostgreSQL: Flexible storage backends
- Client Compatible: Works with all Bitwarden clients (browser, mobile, CLI)
- API Compatible: REST API for automation
- Organizations: Team password sharing
Terraform Provider Integration
The deployment includes a custom Terraform provider (ominis-ai/vaultwarden), so Vaultwarden organizations, collections, and items can be managed as infrastructure-as-code:
Provider Configuration:
terraform {
  required_providers {
    vaultwarden = {
      source  = "ominis-ai/vaultwarden"
      version = "~> 1.0"
    }
  }
}

provider "vaultwarden" {
  endpoint = "https://vault.ominis.ai"
  email    = "admin@ominis.ai"
  password = var.admin_password
}
Manage Organizations:
resource "vaultwarden_organization" "platform" {
name = "Platform Team"
}
resource "vaultwarden_organization_collection" "credentials" {
organization_id = vaultwarden_organization.platform.id
name = "Production Credentials"
}
resource "vaultwarden_login_item" "database" {
organization_id = vaultwarden_organization.platform.id
collection_id = vaultwarden_organization_collection.credentials.id
name = "PostgreSQL Root"
username = "postgres"
password = random_password.db_password.result
uris = ["postgres://postgres.prod.svc.cluster.local:5432"]
notes = "Production database root credentials"
}
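Because the password is generated and wired through Terraform, it also ends up in the Terraform state, so the state backend must be protected as carefully as Vaultwarden itself.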
Deployment Configuration
Helm Values:
vaultwarden:
  # Image
  image:
    repository: vaultwarden/server
    tag: latest

  # Persistence
  persistence:
    enabled: true
    size: 1Gi

  # Ingress
  ingress:
    enabled: true
    className: traefik
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    hosts:
      - host: vault.ominis.ai
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: vault-ominis-ai-tls
        hosts:
          - vault.ominis.ai

  # Environment variables
  env:
    DOMAIN: "https://vault.ominis.ai"
    SIGNUPS_ALLOWED: "false"
    INVITATIONS_ALLOWED: "true"
    ADMIN_TOKEN: "changeme-admin-token"
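The tag: latest image reference and the changeme ADMIN_TOKEN are placeholders: pin the image to a specific release and supply the admin token from a Kubernetes Secret (Vaultwarden can also store it as an Argon2 hash) before using this in production.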
Use Cases
| Use Case | Description |
|---|---|
| Team Credentials | Share API keys, passwords securely |
| API Key Management | Store tokens for external services |
| Certificate Storage | TLS certificates and private keys |
| SSH Key Management | Store SSH keys for server access |
| Infrastructure Secrets | Terraform-managed secret lifecycle |
Backup Strategy
Database Backup:
# Backup Vaultwarden data
kubectl exec -n vaultwarden vaultwarden-0 -- \
tar -czf - /data > vaultwarden-backup-$(date +%Y%m%d).tar.gz
# Upload to S3
aws s3 cp vaultwarden-backup-$(date +%Y%m%d).tar.gz \
s3://backups/vaultwarden/
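A matching restore sketch (substitute the real backup date and run it while Vaultwarden is idle so live data is not overwritten mid-write):
# Unpack a previous archive back into the pod's /data volume
kubectl exec -i -n vaultwarden vaultwarden-0 -- \
  tar -xzf - -C / < vaultwarden-backup-YYYYMMDD.tar.gz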
4. Flow-Proxy: HTTP Reverse Proxy (Traefik)
Purpose: Dynamic HTTP reverse proxy and Kubernetes ingress controller.
Architecture
Traefik is a cloud-native ingress controller:
- Dynamic Configuration: Auto-discovers Kubernetes ingress resources
- Middleware System: Composable request/response transformations
- TLS Termination: Integrates with cert-manager
- Load Balancing: Round-robin, weighted, sticky sessions
- Observability: Metrics, tracing, access logs
Ingress Pattern
Standard Tenant Ingress:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  namespace: client-example
  annotations:
    # Cert-manager integration
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    # Traefik middleware (optional)
    traefik.ingress.kubernetes.io/router.middlewares: "default-security-headers@kubernetescrd"
spec:
  # Traefik ingress class
  ingressClassName: traefik
  tls:
    - hosts:
        - example-api.app.ominis.ai
      secretName: example-api-tls
  rules:
    - host: example-api.app.ominis.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 8000
Middleware System
Traefik middleware enables request/response transformations:
Security Headers Middleware:
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: default
spec:
  headers:
    customResponseHeaders:
      X-Frame-Options: "SAMEORIGIN"
      X-Content-Type-Options: "nosniff"
      X-XSS-Protection: "1; mode=block"
      Referrer-Policy: "no-referrer-when-downgrade"
      Permissions-Policy: "geolocation=(), microphone=(), camera=()"
Rate Limiting Middleware:
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: default
spec:
  rateLimit:
    average: 100
    burst: 50
    period: 1m
Compression Middleware:
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: compress
  namespace: default
spec:
  compress: {}
Apply Middleware to Ingress:
metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: >-
      default-security-headers@kubernetescrd,
      default-rate-limit@kubernetescrd,
      default-compress@kubernetescrd
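The references use Traefik's {namespace}-{middleware-name}@kubernetescrd naming convention, which is why the middlewares created in the default namespace above are addressed with a default- prefix.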
Configuration
Helm Values:
flow-proxy:
  # Deployment
  deployment:
    replicas: 2

  # Service (LoadBalancer for external access)
  service:
    type: LoadBalancer
    annotations:
      # OVH LoadBalancer
      service.beta.kubernetes.io/ovh-loadbalancer-flavor: "small"

  # Ports
  ports:
    web:
      port: 80
      exposedPort: 80
    websecure:
      port: 443
      exposedPort: 443
      tls:
        enabled: true

  # TLS options
  tlsOptions:
    default:
      minVersion: VersionTLS12
      cipherSuites:
        - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
        - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

  # Access logs
  logs:
    access:
      enabled: true
      format: json

  # Metrics
  metrics:
    prometheus:
      enabled: true
Use Cases
| Use Case | Description |
|---|---|
| Tenant API Routing | Route {tenant}-api.app.ominis.ai to tenant pods |
| Static Site Hosting | Serve documentation, landing pages |
| WebSocket Proxying | Real-time connections for IVR, monitoring |
| Path-Based Routing | /api → API service, /docs → docs service (see the sketch below) |
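A sketch of that path-based routing pattern (hostname and backend service names are illustrative, not taken from the source):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: path-routing-example
  namespace: client-example
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: traefik
  rules:
    - host: example.app.ominis.ai
      http:
        paths:
          - path: /api            # API traffic
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 8000
          - path: /docs           # documentation site
            pathType: Prefix
            backend:
              service:
                name: docs
                port:
                  number: 80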
Dashboard Access
# Port-forward to dashboard
kubectl port-forward -n flow-proxy \
svc/traefik-dashboard 9000:9000
# Access dashboard
open http://localhost:9000/dashboard/
5. Homer: SIP Capture and VoIP Monitoring
Purpose: SIP protocol capture, analysis, and monitoring using HEP (Homer Encapsulation Protocol).
Architecture
Homer provides end-to-end VoIP monitoring:
- Heplify: Capture agent (DaemonSet on all nodes)
- Homer-App: Web UI for call flow analysis
- PostgreSQL: Timeseries storage for SIP messages
- HEP Protocol: Encapsulated SIP packet transport
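As a hedged illustration (the Service name and variable name below are assumptions, not taken from the source), a SIP workload's capture agent would typically be pointed at the cluster's HEP collector on the conventional HEP port 9060:
# Hypothetical: HEP collector endpoint for a SIP capture agent
env:
  - name: HEP_SERVER
    value: "heplify-server.homer.svc.cluster.local:9060"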