
System Overview & Architecture

Introduction

What is Ominis Cluster Manager?

Ominis Cluster Manager is a Kubernetes-native call center control platform that transforms FreeSWITCH into a programmable, cloud-native telephony system. It provides a modern REST API for managing call center queues, SIP extensions, IVR flows, and real-time call control operations.

Instead of managing monolithic FreeSWITCH instances with shared state, Ominis deploys one FreeSWITCH pod per queue, providing complete isolation, independent scaling, and fault tolerance.

Powered by Ominis.ai

Who is it for?

Ominis Cluster Manager is designed for:

  • Platform Builders: Teams building call center platforms that need programmatic control
  • DevOps Engineers: Operations teams managing telephony infrastructure at scale
  • SaaS Providers: Multi-tenant contact center platforms requiring isolation and security
  • Enterprise IT: Organizations modernizing legacy PBX systems with cloud-native architecture

What problems does it solve?

Traditional call center systems suffer from:

  1. Monolithic Architecture: Single FreeSWITCH instance becomes a single point of failure
  2. Shared State: Queue failures cascade across the system
  3. Manual Configuration: XML file editing and service restarts for changes
  4. Limited Isolation: No tenant separation or resource guarantees
  5. Operational Complexity: Difficult to scale, debug, and monitor individual queues

Ominis Cluster Manager solves these with:

  1. Container-Per-Queue Model: Complete isolation and independent lifecycle management
  2. REST API: Programmatic control over all telephony operations
  3. Kubernetes-Native: Cloud-native deployment with auto-scaling and self-healing
  4. Database-Driven Configuration: Dynamic changes without service restarts
  5. Observability: Prometheus metrics and structured logging for all operations

Key Value Propositions

Cloud-Native Architecture: Built for Kubernetes from day one
Complete API Coverage: 100+ REST endpoints for all operations
Multi-Tenant Ready: Resource isolation and security boundaries
Production-Grade: Battle-tested patterns with comprehensive testing
Developer Experience: OpenAPI documentation, type safety, and modern tooling


High-Level Architecture

Ominis Cluster Manager follows a ports and adapters (hexagonal architecture) pattern, separating business logic from infrastructure concerns. The system consists of three main layers:

Architecture Layers

[Diagram: architecture layers]

System Architecture

[Diagram: system architecture]

Data Flow Between Components

Configuration Flow:

  1. Client sends REST API request to create/update resource
  2. API validates request and stores configuration in PostgreSQL
  3. API orchestrates Kubernetes to create/update pod
  4. FreeSWITCH pod loads configuration via ODBC on startup

Call Control Flow:

  1. Client sends call control command via REST API
  2. API translates to FreeSWITCH XML-RPC command
  3. XML-RPC client executes command on target pod
  4. FreeSWITCH returns result via XML-RPC
  5. API returns result to client as JSON

Directory Lookup Flow:

  1. FreeSWITCH receives SIP REGISTER/INVITE
  2. mod_xml_curl sends HTTP POST to API directory endpoint
  3. API queries PostgreSQL for extension configuration
  4. API returns XML directory/dialplan response
  5. FreeSWITCH authenticates/routes call based on XML
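
A hedged sketch of steps 2-4 on the API side: mod_xml_curl POSTs form parameters (such as user and domain) and expects a FreeSWITCH directory XML document back. The endpoint path, credentials, and lookup logic here are illustrative, not the actual Ominis implementation (Form parsing also requires the python-multipart package).

from fastapi import FastAPI, Form, Response

app = FastAPI()

DIRECTORY_XML = """<document type="freeswitch/xml">
  <section name="directory">
    <domain name="{domain}">
      <user id="{user}">
        <params>
          <param name="password" value="{password}"/>
        </params>
      </user>
    </domain>
  </section>
</document>"""

@app.post("/v1/freeswitch/directory")
async def directory(user: str = Form(""), domain: str = Form("")):
    # In the real system this is a PostgreSQL lookup of the extension;
    # the hardcoded password is a placeholder.
    password = "s3cret"
    xml = DIRECTORY_XML.format(domain=domain, user=user, password=password)
    return Response(content=xml, media_type="text/xml")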

Core Components

1. API Service (FastAPI)

Purpose: REST API gateway for all telephony operations

Key Features:

  • 100+ REST Endpoints: Complete coverage of queue, extension, IVR, and call control operations
  • OpenAPI Documentation: Interactive Swagger UI at /docs
  • API Key Authentication: X-API-Key header validation
  • Prometheus Metrics: Endpoint latency, request counts, error rates
  • Async Processing: Non-blocking I/O with Python asyncio
  • Structured Logging: JSON logs for centralized collection

Technology:

  • Python 3.11
  • FastAPI (async web framework)
  • Pydantic (type validation)
  • Uvicorn (ASGI server)

2. Queue Pods (FreeSWITCH + mod_callcenter)

Purpose: Isolated call center queue instances

One-Pod-Per-Queue Model:

  • Each queue runs in dedicated FreeSWITCH container
  • Complete resource isolation (CPU, memory, network)
  • Independent scaling and lifecycle management
  • Failure isolation (one queue down ≠ all queues down)

Configuration:

  • Database: PostgreSQL ODBC for dynamic configuration
  • Agents: SIP endpoints defined in cc_agents table
  • Tiers: Agent-to-queue assignments with priority/position
  • Members: Callers waiting in queue (FIFO management)
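
For illustration, wiring an agent into a queue at the database level might look like the following. The column names are assumptions modeled on mod_callcenter's conventions, the DSN is a placeholder, and in practice these rows are managed through the REST API rather than direct SQL.

import asyncio
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

async def main():
    engine = create_async_engine(
        "postgresql+asyncpg://ominis:secret@postgres:5432/callcenter")
    async with engine.begin() as conn:
        # A callback agent reachable through the registrar (see Registrar Pod).
        await conn.execute(text(
            "INSERT INTO cc_agents (name, type, contact, status) "
            "VALUES (:name, 'callback', :contact, 'Available')"),
            {"name": "agent-1001",
             "contact": "sofia/gateway/registrar/agent-1001"})
        # Tier: attach the agent to the sales queue at level 1, position 1.
        await conn.execute(text(
            "INSERT INTO cc_tiers (queue, agent, level, position) "
            "VALUES ('sales', 'agent-1001', 1, 1)"))
    await engine.dispose()

asyncio.run(main())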

Queue Strategies:

  • ring-all - Ring all available agents
  • longest-idle-agent - Route to agent idle longest
  • round-robin - Distribute evenly across agents
  • top-down - Ring agents by tier order
  • agent-with-least-talk-time - Balance call time
  • agent-with-fewest-calls - Balance call count

3. IVR Pods (FreeSWITCH + ESL Socket Handler)

Purpose: Interactive Voice Response menu systems

Architecture:

  • One Pod Per IVR: Isolated IVR instances
  • Database-Driven Menus: Menu structure stored in PostgreSQL
  • OpenAI TTS Integration: High-quality text-to-speech with caching
  • ESL Socket Handler: Python script handles menu logic via Event Socket
  • Supervisord: Process manager for FreeSWITCH + Socket Handler
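
To make the Socket Handler's role concrete, here is a minimal sketch of an outbound ESL handler: FreeSWITCH connects to the socket for each call, the handler requests the channel data, then drives the call with sendmsg commands. The port, prompt file, and one-shot playback are illustrative; the real handler implements the full database-driven menu logic.

import asyncio

async def read_block(reader):
    """Read one ESL header block (terminated by a blank line) into a dict."""
    headers = {}
    while True:
        line = (await reader.readline()).decode(errors="replace").rstrip("\r\n")
        if not line:
            return headers
        key, _, value = line.partition(": ")
        headers[key] = value

async def execute(reader, writer, app, arg=""):
    """Run a dialplan application on the connected channel."""
    writer.write((f"sendmsg\ncall-command: execute\n"
                  f"execute-app-name: {app}\n"
                  f"execute-app-arg: {arg}\n\n").encode())
    await writer.drain()
    await read_block(reader)  # consume the command/reply block

async def handle_call(reader, writer):
    writer.write(b"connect\n\n")          # ask FreeSWITCH for channel data
    channel = await read_block(reader)
    print("call from", channel.get("Caller-Caller-ID-Number"))
    await execute(reader, writer, "answer")
    await execute(reader, writer, "playback", "ivr/menu_greeting.wav")
    writer.close()

async def main():
    server = await asyncio.start_server(handle_call, "0.0.0.0", 8084)
    async with server:
        await server.serve_forever()

asyncio.run(main())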

Supported Actions:

  • Transfer to queue/extension
  • Sub-menu navigation
  • HTTP API calls (webhook integration)
  • Audio playback
  • Voicemail routing
  • Call hangup

4. Campaign Pod (Outbound Dialer)

Purpose: Dedicated FreeSWITCH instance for outbound campaigns

Features:

  • XML-RPC Interface: Programmatic call origination
  • Campaign Management: Contact list upload and processing
  • Progress Tracking: Real-time campaign metrics
  • Call Pacing: Configurable dialing rate
  • Answer Detection: AMD (Answering Machine Detection) support

Campaign Types:

  • Progressive Dialer: One call per available agent
  • Predictive Dialer: Multiple calls per agent (abandoned call mitigation)
  • Preview Dialer: Agent reviews contact before dialing
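
A simplified sketch of progressive pacing, assuming the campaign pod exposes mod_xml_rpc; the gateway name, credentials, contact numbers, and target queue are placeholders.

import asyncio
import xmlrpc.client

async def dial_contacts(contacts, calls_per_second=0.5):
    fs = xmlrpc.client.ServerProxy(
        "http://freeswitch:works@campaign-pod:8080/RPC2")
    for number in contacts:
        # Originate to the contact; on answer, drop the call into the queue.
        # (This call is synchronous; a real dialer would offload it.)
        fs.freeswitch.api(
            "originate", f"sofia/gateway/provider/{number} &callcenter(sales)")
        await asyncio.sleep(1.0 / calls_per_second)  # crude pacing control

asyncio.run(dial_contacts(["15551230001", "15551230002"]))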

5. Registrar Pod (SIP Registration & B2BUA)

Purpose: SIP registration server and media anchor

Key Features:

Hybrid Authentication:

  • Cluster IPs (10.x.x.x): Blind registration (ACL-based trust)
  • External IPs: mod_xml_curl → API → PostgreSQL lookup

B2BUA Pattern:

  • Queue originates call to sofia/gateway/registrar/agent-XXXX
  • Registrar answers with internal IP
  • Registrar originates to agent with public IP (51.79.31.20)
  • Registrar bridges both legs
  • Result: NAT/media anchoring between internal pods and external SIP clients

Public IP Advertisement:

  • ext-rtp-ip: 51.79.31.20 for external connectivity
  • Media anchoring for all RTP streams
  • ICE disabled on queue side to prevent candidate conflicts

6. PostgreSQL (Configuration Store)

Purpose: Centralized configuration and state storage

Key Tables:

  • queues - Queue definitions and settings
  • cc_agents - Agent definitions and status
  • cc_tiers - Agent-to-queue assignments
  • cc_members - Active callers in queue
  • extensions - SIP user extensions
  • ivrs - IVR pod configurations
  • ivr_menus - IVR menu definitions
  • ivr_menu_options - Menu option actions
  • ivr_tts_cache - Cached TTS audio files

Connection Method:

  • API: SQLAlchemy async (asyncpg driver)
  • FreeSWITCH Pods: ODBC (unixODBC + psqlODBC)
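
For example, the API's read path might query queue state like this. The DSN is illustrative; the queues table and its name/strategy columns come from the table list above.

import asyncio
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

async def main():
    # asyncpg driver, matching the stack described above.
    engine = create_async_engine(
        "postgresql+asyncpg://ominis:secret@postgres:5432/callcenter")
    async with engine.connect() as conn:
        result = await conn.execute(text("SELECT name, strategy FROM queues"))
        for name, strategy in result:
            print(name, strategy)
    await engine.dispose()

asyncio.run(main())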

7. Kubernetes (Orchestration Layer)

Purpose: Container orchestration and service discovery

Resources:

  • Deployments: Stateless pods (API, Registrar, Campaign)
  • StatefulSets: Stateful pods (Queue, IVR) with stable network identity
  • Services: Internal DNS (e.g., queue-sales.client-demo-client.svc.cluster.local)
  • ConfigMaps: Dynamic configuration injection
  • Secrets: Credentials and API keys
  • Ingress: External HTTPS access via Traefik

Namespace Model:

  • One namespace per tenant (e.g., client-demo-client)
  • Resource quotas and network policies
  • Complete isolation between tenants

Deployment Modes

Ominis Cluster Manager supports two deployment modes:

Kubernetes (Production)

Recommended for: Production deployments, multi-tenant platforms, auto-scaling

Advantages:

  • Auto-Scaling: Horizontal pod autoscaling (HPA) based on CPU/memory
  • Self-Healing: Automatic pod restart on failure
  • Service Discovery: DNS-based routing (e.g., queue-sales.namespace.svc.cluster.local)
  • Load Balancing: Built-in service load balancing
  • Rolling Updates: Zero-downtime deployments
  • Resource Limits: CPU/memory quotas per pod
  • Multi-Tenancy: Namespace isolation

Configuration:

DEPLOYMENT_MODE=kubernetes
KUBERNETES_NAMESPACE=client-demo-client

Deployment:

make helm-apply

Docker (Development)

Recommended for: Local development, testing, single-node deployments

Advantages:

  • Simplicity: No Kubernetes cluster required
  • Fast Iteration: Quick container rebuild and restart
  • Local Testing: Test on laptop before cloud deployment
  • Debugging: Direct container logs and shell access

Configuration:

DEPLOYMENT_MODE=docker

Deployment:

make build
docker-compose up

Comparison Matrix

| Feature | Kubernetes | Docker |
| --- | --- | --- |
| Auto-Scaling | ✅ Yes (HPA) | ❌ No |
| Self-Healing | ✅ Yes (ReplicaSets) | ❌ No |
| Multi-Tenancy | ✅ Yes (Namespaces) | ⚠️ Manual |
| Service Discovery | ✅ DNS (e.g., queue-sales.ns.svc) | ⚠️ Manual |
| Load Balancing | ✅ Built-in | ❌ Manual |
| Rolling Updates | ✅ Yes | ❌ Manual |
| Setup Complexity | ⚠️ Medium (K8s cluster) | ✅ Low (Docker only) |
| Local Development | ⚠️ Requires Minikube/Kind | ✅ Native |

ADR-0001: Why Container-Per-Queue?

Context

Traditional call center systems deploy a single FreeSWITCH instance with multiple queues sharing the same process. This creates shared fate: if one queue has a configuration error, consumes excessive resources, or crashes, all queues are affected.

Problem: How do we achieve fault isolation, independent scaling, and multi-tenancy in a call center platform?

Decision

Deploy one FreeSWITCH pod per queue using Kubernetes StatefulSets.

Each queue gets:

  • Dedicated FreeSWITCH process
  • Isolated resources (CPU, memory, network)
  • Independent lifecycle (create, update, delete)
  • Stable DNS name (e.g., queue-sales.client-demo-client.svc.cluster.local)

Alternatives Considered

1. Shared FreeSWITCH Instance

  • ❌ Single point of failure
  • ❌ No resource isolation
  • ❌ Configuration changes affect all queues
  • ❌ Difficult to debug individual queue issues

2. One Pod Per Tenant (Multiple Queues)

  • ⚠️ Better than shared, but still coupled
  • ❌ Queue failures cascade within tenant
  • ⚠️ Scaling granularity at tenant level, not queue level

3. Container-Per-Queue (Chosen)

  • ✅ Complete fault isolation
  • ✅ Independent scaling (scale hot queues, not cold ones)
  • ✅ Clear resource attribution
  • ✅ Simplified debugging (one queue = one pod)
  • ⚠️ Resource overhead (more containers)

Consequences

Positive:

  • Fault Isolation: Queue-sales crash doesn't affect queue-support
  • Independent Scaling: Scale each queue based on its load
  • Clear Debugging: Pod logs map 1:1 to queue issues
  • Multi-Tenancy: Strong isolation boundaries
  • Resource Limits: Set CPU/memory per queue
  • Security: Network policies per queue
  • Simplified Rollbacks: Roll back one queue, not entire system

Negative:

  • ⚠️ Resource Overhead: Each pod requires base FreeSWITCH memory (~50-100MB)
  • ⚠️ Operational Complexity: More pods to monitor
  • ⚠️ Startup Time: Creating queue = starting new container (~5-10s)

Mitigation:

  • Use Kubernetes resource limits to cap memory usage
  • Leverage Prometheus for centralized monitoring
  • Pre-warm container images to reduce startup time

Status: Accepted - In production use

Date: 2024-01-15


ADR-0002: Why XML-RPC Over ESL?

Context

FreeSWITCH offers two primary programmatic interfaces:

  1. Event Socket Library (ESL): TCP socket connection for real-time events and commands
  2. XML-RPC: HTTP-based RPC interface for synchronous command execution

Problem: Which interface should Ominis Cluster Manager use for call control operations?

Decision

Use XML-RPC for write operations (commands), and PostgreSQL for read operations (state).

Rationale:

  1. Simpler Request/Response Model: XML-RPC is synchronous HTTP - no connection pooling, no event parsing
  2. Better Performance for Reads: Database queries are 10-50x faster than ESL show commands
  3. Separation of Concerns: Commands via XML-RPC, state queries via database
  4. Stateless: No persistent connections to manage or recover
  5. HTTP Ecosystem: Leverage existing HTTP tools (retries, timeouts, observability)
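
In code, the split looks roughly like this; the host and credentials are placeholders, and callcenter_config is a real mod_callcenter API command.

import xmlrpc.client

# Write path: commands go to the queue pod over stateless HTTP XML-RPC.
fs = xmlrpc.client.ServerProxy("http://freeswitch:works@queue-sales:8080/RPC2")
fs.freeswitch.api("callcenter_config", "queue load sales")

# Read path: state comes from PostgreSQL, not from ESL 'show' commands
# (see the SQLAlchemy example in the PostgreSQL section below).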

Alternatives Considered

1. ESL (Event Socket Library)

  • ✅ Real-time events (call start, end, DTMF)
  • ✅ Full access to FreeSWITCH commands and events
  • ❌ Persistent TCP connection (connection pooling complexity)
  • ❌ Event parsing overhead
  • ❌ Connection recovery logic
  • ❌ Slower for state queries (show channels)

2. XML-RPC (Chosen)

  • ✅ Simple HTTP request/response
  • ✅ Stateless (no connection management)
  • ✅ Easy retries and timeouts
  • ✅ Works with existing HTTP libraries
  • ❌ No real-time events (must poll or use database triggers)
  • ⚠️ Limited to synchronous commands

3. Hybrid: ESL for Events, XML-RPC for Commands

  • ✅ Best of both worlds
  • ❌ Operational complexity (two connection types)
  • ❌ Overkill for current use case

4. Database-Only

  • ✅ Fast reads
  • ❌ Cannot issue commands (originate, hangup, transfer)
  • ❌ Not a viable option

Consequences

Positive:

  • Simplicity: HTTP request/response model (no connection pooling)
  • Performance: Database reads are 10-50x faster than ESL show commands
  • Reliability: Stateless (no connection recovery logic)
  • Observability: HTTP metrics (latency, errors, retries)
  • Developer Experience: Easier to debug and test

Negative:

  • ⚠️ No Real-Time Events: Cannot receive call events in real-time
    • Mitigation: Use database polling or PostgreSQL LISTEN/NOTIFY for state changes
  • ⚠️ Less Feature-Rich: Some ESL-only commands unavailable
    • Mitigation: XML-RPC covers 95% of use cases; use ESL directly for advanced scenarios
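
The LISTEN/NOTIFY mitigation could be sketched with asyncpg as follows; the channel name and DSN are hypothetical, and a database trigger would issue the NOTIFY.

import asyncio
import asyncpg

def on_change(conn, pid, channel, payload):
    # React to a state change, e.g. a caller joining or leaving a queue.
    print(f"{channel}: {payload}")

async def main():
    conn = await asyncpg.connect(
        "postgresql://ominis:secret@postgres:5432/callcenter")
    await conn.add_listener("cc_members_changed", on_change)
    await asyncio.Event().wait()  # keep the connection open and listening

asyncio.run(main())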

Trade-Offs:

  • Read Operations: Database > XML-RPC > ESL (in terms of performance)
  • Write Operations: XML-RPC ≈ ESL (both synchronous)
  • Events: ESL only (but not needed for current use case)

Status: Accepted - In production use

Date: 2024-01-15


Technology Stack

API Layer

| Component | Technology | Purpose |
| --- | --- | --- |
| Web Framework | FastAPI | Async REST API framework |
| Validation | Pydantic | Type safety and validation |
| ASGI Server | Uvicorn | Production web server |
| Language | Python 3.11 | Modern async Python |
| API Documentation | OpenAPI 3.0 | Auto-generated docs |

Database Layer

| Component | Technology | Purpose |
| --- | --- | --- |
| Primary DB | PostgreSQL 15 | Configuration and state |
| ORM | SQLAlchemy (async) | Database abstraction |
| Driver | asyncpg | High-performance async driver |
| ODBC | unixODBC + psqlODBC | FreeSWITCH ODBC integration |

Telephony Layer

| Component | Technology | Purpose |
| --- | --- | --- |
| PBX Engine | FreeSWITCH | Core telephony processing |
| Call Center | mod_callcenter | Queue and agent management |
| XML-RPC | mod_xml_rpc | Programmatic control |
| Directory | mod_xml_curl | Dynamic user provisioning |
| ESL | Event Socket Library | IVR socket handler |

Orchestration Layer

| Component | Technology | Purpose |
| --- | --- | --- |
| Container Runtime | Docker | Container packaging |
| Orchestration | Kubernetes | Production orchestration |
| Package Management | Helm | Kubernetes deployments |
| Ingress | Traefik | HTTPS ingress controller |
| Registry | GitHub Container Registry | Container images |

Observability

| Component | Technology | Purpose |
| --- | --- | --- |
| Metrics | Prometheus | Time-series metrics |
| Logging | Structured JSON logs | Centralized logging |
| Tracing | (Future) OpenTelemetry | Distributed tracing |

Request Flow Example

This sequence shows the complete request flow for creating a new queue:

Step-by-Step Breakdown

1. Client Request

  • Client sends POST /v1/queues with queue configuration
  • Request includes X-API-Key header for authentication

2. Authentication

  • API key middleware validates header against API_KEY environment variable
  • Returns 401 Unauthorized if invalid
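
A minimal sketch of such a check as a FastAPI dependency; the real middleware may differ in detail, and the route body here is a placeholder.

import os
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

async def require_api_key(x_api_key: str = Header(default="")):
    # FastAPI maps the x_api_key parameter to the X-API-Key request header.
    if x_api_key != os.environ.get("API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid API key")

@app.post("/v1/queues", dependencies=[Depends(require_api_key)])
async def create_queue(queue: dict):
    return queue  # placeholder for the real create logic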

3. Database Storage

  • API writes queue configuration to PostgreSQL queues table
  • Configuration includes: name, strategy, tier rules, timeouts, etc.

4. Kubernetes Orchestration

  • Orchestrator creates Kubernetes resources:
    • ConfigMap: FreeSWITCH environment variables
    • Service: Internal DNS (e.g., queue-sales.namespace.svc.cluster.local)
    • StatefulSet: Deploys FreeSWITCH pod with stable identity
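
A hedged sketch of this step using the official Kubernetes Python client, creating the headless Service that backs the stable DNS name; the labels, ports, and names are illustrative, not the orchestrator's actual resource definitions.

from kubernetes import client, config

config.load_incluster_config()  # the API pod runs inside the cluster
core = client.CoreV1Api()

# Headless Service: gives the StatefulSet pod a stable DNS name,
# e.g. queue-sales.client-demo-client.svc.cluster.local
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="queue-sales"),
    spec=client.V1ServiceSpec(
        cluster_ip="None",
        selector={"app": "queue-sales"},
        ports=[client.V1ServicePort(name="sip", port=5060, protocol="UDP")],
    ),
)
core.create_namespaced_service(namespace="client-demo-client", body=service)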

5. Pod Initialization

  • FreeSWITCH container starts
  • Loads environment variables from ConfigMap
  • Connects to PostgreSQL via ODBC
  • Loads mod_callcenter configuration from database
  • Reports ready to Kubernetes

6. Response

  • API returns 201 Created with queue details
  • Client can now add agents, tiers, and route calls to queue

Typical Response Times

  • Database Insert: 5-10ms
  • Kubernetes Resource Creation: 100-500ms
  • Pod Startup: 3-5 seconds
  • Total Request: 4-6 seconds

Examples

Example 1: Health Check

Check if Ominis API is running:

curl -X GET http://localhost:8000/health

Response:

{
  "status": "healthy",
  "service": "callcenter-api"
}

Use Cases:

  • Load balancer health checks
  • Monitoring system probes
  • Container readiness checks

Example 2: Get Branding Info

Retrieve branding information:

curl -X GET http://localhost:8000/v1/branding \
  -H "X-API-Key: demo"

Response:

{
  "brand": "Ominis AI",
  "poweredBy": "Ominis.ai"
}

HTTP Headers:

X-Powered-By: Ominis.ai

Example 3: Create Queue

Create a new sales queue with longest-idle-agent strategy:

curl -X POST http://localhost:8000/v1/queues \
  -H "X-API-Key: demo" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "sales",
    "strategy": "longest-idle-agent",
    "max_wait_time": 300,
    "max_wait_time_no_agent": 120,
    "tier_rule_wait": true,
    "tier_rule_wait_multiply": true,
    "tier_rule_no_agent_no_wait": false,
    "announce_position": true,
    "announce_holdtime": true
  }'

Response:

{
  "name": "sales",
  "strategy": "longest-idle-agent",
  "max_wait_time": 300,
  "max_wait_time_no_agent": 120,
  "status": "ready",
  "created_at": "2024-01-15T10:30:00Z"
}

What Happens:

  1. Queue configuration saved to PostgreSQL
  2. Kubernetes StatefulSet created
  3. FreeSWITCH pod starts with queue configuration
  4. Service DNS available at queue-sales.client-demo-client.svc.cluster.local

Example 4: Access Interactive API Documentation

Open your browser to view interactive Swagger UI:

http://localhost:8000/docs

Features:

  • Try all 100+ endpoints directly in browser
  • View request/response schemas
  • Generate curl commands
  • Authentication with API key
  • Download OpenAPI spec

Alternative Documentation:

  • ReDoc: http://localhost:8000/redoc
  • OpenAPI JSON: http://localhost:8000/openapi.json

Context & Rationale

Why Kubernetes-Native?

Decision: Build on Kubernetes instead of traditional VM-based deployment.

Rationale:

  1. Dynamic Pod Management: Create/delete queues as Kubernetes resources
  2. Auto-Scaling: Horizontal Pod Autoscaler (HPA) scales based on CPU/memory
  3. Service Discovery: DNS-based routing (no hardcoded IPs)
  4. Resource Isolation: CPU/memory limits per pod
  5. Self-Healing: Automatic pod restart on failure
  6. Cloud-Native: Runs on any Kubernetes cluster (GKE, EKS, AKS, on-prem)
  7. Declarative Configuration: Infrastructure as code (Helm charts)

Trade-Offs:

  • ⚠️ Requires Kubernetes cluster (operational overhead)
  • ⚠️ Learning curve for Kubernetes concepts
  • ✅ But: Production-grade reliability and scalability

Why FastAPI?

Decision: Use FastAPI instead of Flask, Django, or Node.js.

Rationale:

  1. Modern Async Python: Native async/await support (non-blocking I/O)
  2. Automatic OpenAPI Generation: /docs endpoint out of the box
  3. Type Safety: Pydantic models for request/response validation
  4. High Performance: Comparable to Node.js and Go
  5. Developer Experience: Fast iteration, excellent error messages
  6. Ecosystem: Rich ecosystem of async libraries (asyncpg, httpx, etc.)

Comparison:

| Framework | Async | OpenAPI | Type Safety | Performance |
| --- | --- | --- | --- | --- |
| FastAPI | ✅ | ✅ Auto | ✅ Pydantic | ⚡ Fast |
| Flask | ⚠️ Manual | | | 🐢 Slower |
| Django | ⚠️ Limited | ⚠️ DRF | ⚠️ DRF | 🐢 Slower |
| Node.js | ✅ | ⚠️ Manual | ⚠️ TypeScript | ⚡ Fast |

Why PostgreSQL?

Decision: Use PostgreSQL instead of MySQL, MongoDB, or SQLite.

Rationale:

  1. ACID Compliance: Strong consistency guarantees for call center state
  2. JSON Support: Flexible schemas for IVR menus and metadata
  3. ODBC Integration: Native ODBC support for FreeSWITCH mod_callcenter
  4. Battle-Tested: 30+ years of production use
  5. Rich Queries: Complex joins, CTEs, window functions
  6. Extensions: PostGIS, pg_trgm, pg_stat_statements
  7. Replication: Streaming replication for high availability

FreeSWITCH Integration:

  • mod_callcenter reads/writes directly to PostgreSQL via ODBC
  • No synchronization lag (database is source of truth)
  • No dual-write consistency issues

Why One-Pod-Per-Queue?

See ADR-0001 above.

Summary:

  • ✅ Fault isolation (one queue failure doesn't cascade)
  • ✅ Independent scaling (scale hot queues separately)
  • ✅ Clear debugging (one queue = one pod)
  • ⚠️ Resource overhead (more containers)

Production Validation:

  • Running 50+ queues in production
  • Average pod memory: 75MB
  • Zero cross-queue failures in 6 months


Next Steps

Now that you understand the system architecture, explore:

  1. Getting Started Guide - Deploy your first queue
  2. Queue Management API - Learn queue operations
  3. Call Control API - Control active calls
  4. IVR System - Build interactive voice menus
  5. Campaign Management - Outbound dialer campaigns

Powered by Ominis.ai - Modern call center infrastructure for cloud-native platforms.