System Overview & Architecture
Introduction
What is Ominis Cluster Manager?
Ominis Cluster Manager is a Kubernetes-native call center control platform that transforms FreeSWITCH into a programmable, cloud-native telephony system. It provides a modern REST API for managing call center queues, SIP extensions, IVR flows, and real-time call control operations.
Instead of managing monolithic FreeSWITCH instances with shared state, Ominis deploys one FreeSWITCH pod per queue, providing complete isolation, independent scaling, and fault tolerance.
Powered by Ominis.ai
Who is it for?
Ominis Cluster Manager is designed for:
- Platform Builders: Teams building call center platforms that need programmatic control
- DevOps Engineers: Operations teams managing telephony infrastructure at scale
- SaaS Providers: Multi-tenant contact center platforms requiring isolation and security
- Enterprise IT: Organizations modernizing legacy PBX systems with cloud-native architecture
What problems does it solve?
Traditional call center systems suffer from:
- Monolithic Architecture: Single FreeSWITCH instance becomes a single point of failure
- Shared State: Queue failures cascade across the system
- Manual Configuration: XML file editing and service restarts for changes
- Limited Isolation: No tenant separation or resource guarantees
- Operational Complexity: Difficult to scale, debug, and monitor individual queues
Ominis Cluster Manager solves these with:
- Container-Per-Queue Model: Complete isolation and independent lifecycle management
- REST API: Programmatic control over all telephony operations
- Kubernetes-Native: Cloud-native deployment with auto-scaling and self-healing
- Database-Driven Configuration: Dynamic changes without service restarts
- Observability: Prometheus metrics and structured logging for all operations
Key Value Propositions
✅ Cloud-Native Architecture: Built for Kubernetes from day one
✅ Complete API Coverage: 100+ REST endpoints for all operations
✅ Multi-Tenant Ready: Resource isolation and security boundaries
✅ Production-Grade: Battle-tested patterns with comprehensive testing
✅ Developer Experience: OpenAPI documentation, type safety, and modern tooling
High-Level Architecture
Ominis Cluster Manager follows a ports and adapters (hexagonal architecture) pattern, separating business logic from infrastructure concerns. The system consists of three main layers:
Architecture Layers
System Architecture
Data Flow Between Components
Configuration Flow:
- Client sends REST API request to create/update resource
- API validates request and stores configuration in PostgreSQL
- API orchestrates Kubernetes to create/update pod
- FreeSWITCH pod loads configuration via ODBC on startup
Call Control Flow:
- Client sends call control command via REST API
- API translates to FreeSWITCH XML-RPC command
- XML-RPC client executes command on target pod
- FreeSWITCH returns result via XML-RPC
- API returns result to client as JSON
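To make this flow concrete, the minimal sketch below issues a single command against a queue pod over XML-RPC using Python's standard library. The service DNS name, credentials, and call UUID are illustrative placeholders rather than values shipped with Ominis; freeswitch.api is the standard mod_xml_rpc entry point.

```python
import xmlrpc.client

# Host, port, and credentials are placeholders; the API resolves the real
# service DNS name for the target queue from its PostgreSQL record.
pod_url = "http://freeswitch:secret@queue-sales.client-demo-client.svc.cluster.local:8080/RPC2"
pod = xmlrpc.client.ServerProxy(pod_url)

# mod_xml_rpc exposes FreeSWITCH API commands as freeswitch.api(command, args).
result = pod.freeswitch.api("uuid_kill", "3c2a0dc8-0000-0000-0000-000000000000")
print(result)  # e.g. "+OK" on success
```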
Directory Lookup Flow:
- FreeSWITCH receives SIP REGISTER/INVITE
- mod_xml_curl sends HTTP POST to API directory endpoint
- API queries PostgreSQL for extension configuration
- API returns XML directory/dialplan response
- FreeSWITCH authenticates/routes call based on XML
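As an illustration of the last two steps, a directory handler could look roughly like the sketch below. The route path, form fields taken, and hard-coded password are assumptions for illustration only; the XML layout follows FreeSWITCH's standard directory response format.

```python
# Minimal sketch of a directory endpoint of the kind mod_xml_curl would call.
from fastapi import FastAPI, Form
from fastapi.responses import Response

app = FastAPI()

@app.post("/v1/freeswitch/directory")
async def directory(user: str = Form(...), domain: str = Form(...)) -> Response:
    # A real handler would look the extension up in PostgreSQL here.
    password = "s3cret"  # placeholder credential for the sketch
    xml = f"""<document type="freeswitch/xml">
  <section name="directory">
    <domain name="{domain}">
      <user id="{user}">
        <params>
          <param name="password" value="{password}"/>
        </params>
      </user>
    </domain>
  </section>
</document>"""
    return Response(content=xml, media_type="text/xml")
```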
Core Components
1. API Service (FastAPI)
Purpose: REST API gateway for all telephony operations
Key Features:
- 100+ REST Endpoints: Complete coverage of queue, extension, IVR, and call control operations
- OpenAPI Documentation: Interactive Swagger UI at /docs
- API Key Authentication: X-API-Key header validation
- Prometheus Metrics: Endpoint latency, request counts, error rates
- Async Processing: Non-blocking I/O with Python asyncio
- Structured Logging: JSON logs for centralized collection
Technology:
- Python 3.11
- FastAPI (async web framework)
- Pydantic (type validation)
- Uvicorn (ASGI server)
2. Queue Pods (FreeSWITCH + mod_callcenter)
Purpose: Isolated call center queue instances
One-Pod-Per-Queue Model:
- Each queue runs in dedicated FreeSWITCH container
- Complete resource isolation (CPU, memory, network)
- Independent scaling and lifecycle management
- Failure isolation (one queue down ≠ all queues down)
Configuration:
- Database: PostgreSQL ODBC for dynamic configuration
- Agents: SIP endpoints defined in the cc_agents table
- Tiers: Agent-to-queue assignments with priority/position
- Members: Callers waiting in queue (FIFO management)
Queue Strategies:
- ring-all - Ring all available agents
- longest-idle-agent - Route to agent idle longest
- round-robin - Distribute evenly across agents
- top-down - Ring agents by tier order
- agent-with-least-talk-time - Balance call time
- agent-with-fewest-calls - Balance call count
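The toy snippet below illustrates the difference between two of these strategies. It is not mod_callcenter's implementation, and the agent records are made up.

```python
from itertools import cycle

# Each agent record carries how long it has been idle, in seconds.
agents = [{"name": "alice", "idle_s": 120}, {"name": "bob", "idle_s": 300}]

def longest_idle_agent(agents: list[dict]) -> str:
    """longest-idle-agent: pick the agent that has been idle the longest."""
    return max(agents, key=lambda a: a["idle_s"])["name"]

# round-robin: simply cycle through the agent list.
round_robin = cycle(a["name"] for a in agents)

print(longest_idle_agent(agents))            # bob
print(next(round_robin), next(round_robin))  # alice bob
```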
3. IVR Pods (FreeSWITCH + ESL Socket Handler)
Purpose: Interactive Voice Response menu systems
Architecture:
- One Pod Per IVR: Isolated IVR instances
- Database-Driven Menus: Menu structure stored in PostgreSQL
- OpenAI TTS Integration: High-quality text-to-speech with caching
- ESL Socket Handler: Python script handles menu logic via Event Socket
- Supervisord: Process manager for FreeSWITCH + Socket Handler
Supported Actions:
- Transfer to queue/extension
- Sub-menu navigation
- HTTP API calls (webhook integration)
- Audio playback
- Voicemail routing
- Call hangup
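The sketch below shows, in simplified form, how a socket handler might dispatch one of these actions after loading it from the ivr_menu_options table. The action names, column names, and callables are illustrative assumptions, not the shipped handler.

```python
from typing import Callable

def dispatch_option(option: dict, actions: dict[str, Callable[[str], None]]) -> None:
    """Look up the configured action for a DTMF option and run it."""
    handler = actions.get(option["action"])
    if handler is None:
        raise ValueError(f"unknown IVR action: {option['action']}")
    handler(option["destination"])

# Example wiring: each action maps to a small callable (here they just print).
actions = {
    "transfer": lambda dest: print(f"transfer caller to {dest}"),
    "submenu": lambda dest: print(f"enter sub-menu {dest}"),
    "hangup": lambda _: print("hang up"),
}
dispatch_option({"action": "transfer", "destination": "queue-sales"}, actions)
```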
4. Campaign Pod (Outbound Dialer)
Purpose: Dedicated FreeSWITCH instance for outbound campaigns
Features:
- XML-RPC Interface: Programmatic call origination
- Campaign Management: Contact list upload and processing
- Progress Tracking: Real-time campaign metrics
- Call Pacing: Configurable dialing rate
- Answer Detection: AMD (Answering Machine Detection) support
Campaign Types:
- Progressive Dialer: One call per available agent
- Predictive Dialer: Multiple calls per agent (abandoned call mitigation)
- Preview Dialer: Agent reviews contact before dialing
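The toy functions below illustrate the pacing difference between progressive and predictive dialing. They are not the shipped dialer logic; the overdial factor and answer-rate handling are assumptions.

```python
def progressive_batch(available_agents: int, calls_in_flight: int) -> int:
    """Progressive pacing: keep at most one outstanding dial per free agent."""
    return max(available_agents - calls_in_flight, 0)

def predictive_batch(available_agents: int, calls_in_flight: int,
                     answer_rate: float, overdial: float = 1.3) -> int:
    """Predictive pacing: overdial to compensate for unanswered calls."""
    target = int(available_agents * overdial / max(answer_rate, 0.1))
    return max(target - calls_in_flight, 0)

print(progressive_batch(5, 2))                   # 3
print(predictive_batch(5, 2, answer_rate=0.4))   # 14
```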
5. Registrar Pod (SIP Registration & B2BUA)
Purpose: SIP registration server and media anchor
Key Features:
Hybrid Authentication:
- Cluster IPs (10.x.x.x): Blind registration (ACL-based trust)
- External IPs: mod_xml_curl → API → PostgreSQL lookup
B2BUA Pattern:
- Queue originates call to sofia/gateway/registrar/agent-XXXX
- Registrar answers with internal IP
- Registrar originates to agent with public IP (51.79.31.20)
- Registrar bridges both legs
- Result: NAT/media anchoring between internal pods and external SIP clients
Public IP Advertisement:
- ext-rtp-ip: 51.79.31.20 for external connectivity
- Media anchoring for all RTP streams
- ICE disabled on queue side to prevent candidate conflicts
6. PostgreSQL (Configuration Store)
Purpose: Centralized configuration and state storage
Key Tables:
- queues - Queue definitions and settings
- cc_agents - Agent definitions and status
- cc_tiers - Agent-to-queue assignments
- cc_members - Active callers in queue
- extensions - SIP user extensions
- ivrs - IVR pod configurations
- ivr_menus - IVR menu definitions
- ivr_menu_options - Menu option actions
- ivr_tts_cache - Cached TTS audio files
Connection Method:
- API: SQLAlchemy async (asyncpg driver)
- FreeSWITCH Pods: ODBC (unixODBC + psqlODBC)
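For illustration, a read against these tables might look like the sketch below. Column names follow the stock mod_callcenter schema, the DSN is a placeholder, and this helper is not part of the Ominis codebase.

```python
import asyncio
import asyncpg

async def agents_for_queue(dsn: str, queue: str) -> list[str]:
    """Return agent names assigned to a queue, ordered by tier level/position."""
    conn = await asyncpg.connect(dsn)
    try:
        rows = await conn.fetch(
            """
            SELECT a.name
            FROM cc_agents a
            JOIN cc_tiers t ON t.agent = a.name
            WHERE t.queue = $1
            ORDER BY t.level, t.position
            """,
            queue,
        )
        return [r["name"] for r in rows]
    finally:
        await conn.close()

print(asyncio.run(agents_for_queue(
    "postgresql://user:pass@postgres/callcenter", "sales@default")))
```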
7. Kubernetes (Orchestration Layer)
Purpose: Container orchestration and service discovery
Resources:
- Deployments: Stateless pods (API, Registrar, Campaign)
- StatefulSets: Stateful pods (Queue, IVR) with stable network identity
- Services: Internal DNS (e.g., queue-sales.client-demo-client.svc.cluster.local)
- ConfigMaps: Dynamic configuration injection
- Secrets: Credentials and API keys
- Ingress: External HTTPS access via Traefik
Namespace Model:
- One namespace per tenant (e.g., client-demo-client)
- Resource quotas and network policies
- Complete isolation between tenants
Deployment Modes
Ominis Cluster Manager supports two deployment modes:
Kubernetes (Production)
Recommended for: Production deployments, multi-tenant platforms, auto-scaling
Advantages:
- Auto-Scaling: Horizontal pod autoscaling (HPA) based on CPU/memory
- Self-Healing: Automatic pod restart on failure
- Service Discovery: DNS-based routing (e.g., queue-sales.namespace.svc.cluster.local)
- Load Balancing: Built-in service load balancing
- Rolling Updates: Zero-downtime deployments
- Resource Limits: CPU/memory quotas per pod
- Multi-Tenancy: Namespace isolation
Configuration:
DEPLOYMENT_MODE=kubernetes
KUBERNETES_NAMESPACE=client-demo-client
Deployment:
make helm-apply
Docker (Development)
Recommended for: Local development, testing, single-node deployments
Advantages:
- Simplicity: No Kubernetes cluster required
- Fast Iteration: Quick container rebuild and restart
- Local Testing: Test on laptop before cloud deployment
- Debugging: Direct container logs and shell access
Configuration:
DEPLOYMENT_MODE=docker
Deployment:
make build
docker-compose up
Comparison Matrix
| Feature | Kubernetes | Docker |
|---|---|---|
| Auto-Scaling | ✅ Yes (HPA) | ❌ No |
| Self-Healing | ✅ Yes (ReplicaSets) | ❌ No |
| Multi-Tenancy | ✅ Yes (Namespaces) | ⚠️ Manual |
| Service Discovery | ✅ DNS (e.g., queue-sales.ns.svc) | ⚠️ Manual |
| Load Balancing | ✅ Built-in | ❌ Manual |
| Rolling Updates | ✅ Yes | ❌ Manual |
| Setup Complexity | ⚠️ Medium (K8s cluster) | ✅ Low (Docker only) |
| Local Development | ⚠️ Requires Minikube/Kind | ✅ Native |
ADR-0001: Why Container-Per-Queue?
Context
Traditional call center systems deploy a single FreeSWITCH instance with multiple queues sharing the same process. This creates shared fate: if one queue has a configuration error, consumes excessive resources, or crashes, all queues are affected.
Problem: How do we achieve fault isolation, independent scaling, and multi-tenancy in a call center platform?
Decision
Deploy one FreeSWITCH pod per queue using Kubernetes StatefulSets.
Each queue gets:
- Dedicated FreeSWITCH process
- Isolated resources (CPU, memory, network)
- Independent lifecycle (create, update, delete)
- Stable DNS name (e.g., queue-sales.client-demo-client.svc.cluster.local)
Alternatives Considered
1. Shared FreeSWITCH Instance
- ❌ Single point of failure
- ❌ No resource isolation
- ❌ Configuration changes affect all queues
- ❌ Difficult to debug individual queue issues
2. One Pod Per Tenant (Multiple Queues)
- ⚠️ Better than shared, but still coupled
- ❌ Queue failures cascade within tenant
- ⚠️ Scaling granularity at tenant level, not queue level
3. Container-Per-Queue (Chosen)
- ✅ Complete fault isolation
- ✅ Independent scaling (scale hot queues, not cold ones)
- ✅ Clear resource attribution
- ✅ Simplified debugging (one queue = one pod)
- ⚠️ Resource overhead (more containers)
Consequences
Positive:
- ✅ Fault Isolation: Queue-sales crash doesn't affect queue-support
- ✅ Independent Scaling: Scale each queue based on its load
- ✅ Clear Debugging: Pod logs map 1:1 to queue issues
- ✅ Multi-Tenancy: Strong isolation boundaries
- ✅ Resource Limits: Set CPU/memory per queue
- ✅ Security: Network policies per queue
- ✅ Simplified Rollbacks: Roll back one queue, not entire system
Negative:
- ⚠️ Resource Overhead: Each pod requires base FreeSWITCH memory (~50-100MB)
- ⚠️ Operational Complexity: More pods to monitor
- ⚠️ Startup Time: Creating queue = starting new container (~5-10s)
Mitigation:
- Use Kubernetes resource limits to cap memory usage
- Leverage Prometheus for centralized monitoring
- Pre-warm container images to reduce startup time
Status: ✅ Accepted - In production use
Date: 2024-01-15
ADR-0002: Why XML-RPC Over ESL?
Context
FreeSWITCH offers two primary programmatic interfaces:
- Event Socket Library (ESL): TCP socket connection for real-time events and commands
- XML-RPC: HTTP-based RPC interface for synchronous command execution
Problem: Which interface should Ominis Cluster Manager use for call control operations?
Decision
Use XML-RPC for write operations (commands), and PostgreSQL for read operations (state).
Rationale:
- Simpler Request/Response Model: XML-RPC is synchronous HTTP - no connection pooling, no event parsing
- Better Performance for Reads: Database queries are 10-50x faster than ESL show commands
- Separation of Concerns: Commands via XML-RPC, state queries via database
- Stateless: No persistent connections to manage or recover
- HTTP Ecosystem: Leverage existing HTTP tools (retries, timeouts, observability)
Alternatives Considered
1. ESL (Event Socket Library)
- ✅ Real-time events (call start, end, DTMF)
- ✅ Full access to FreeSWITCH commands and events
- ❌ Persistent TCP connection (connection pooling complexity)
- ❌ Event parsing overhead
- ❌ Connection recovery logic
- ❌ Slower for state queries (show channels)
2. XML-RPC (Chosen)
- ✅ Simple HTTP request/response
- ✅ Stateless (no connection management)
- ✅ Easy retries and timeouts
- ✅ Works with existing HTTP libraries
- ❌ No real-time events (must poll or use database triggers)
- ⚠️ Limited to synchronous commands
3. Hybrid: ESL for Events, XML-RPC for Commands
- ✅ Best of both worlds
- ❌ Operational complexity (two connection types)
- ❌ Overkill for current use case
4. Database-Only
- ✅ Fast reads
- ❌ Cannot issue commands (originate, hangup, transfer)
- ❌ Not a viable option
Consequences
Positive:
- ✅ Simplicity: HTTP request/response model (no connection pooling)
- ✅ Performance: Database reads are 10-50x faster than ESL show commands
- ✅ Reliability: Stateless (no connection recovery logic)
- ✅ Observability: HTTP metrics (latency, errors, retries)
- ✅ Developer Experience: Easier to debug and test
Negative:
- ⚠️ No Real-Time Events: Cannot receive call events in real-time
- Mitigation: Use database polling or PostgreSQL LISTEN/NOTIFY for state changes (see the sketch after this list)
- ⚠️ Less Feature-Rich: Some ESL-only commands unavailable
- Mitigation: XML-RPC covers 95% of use cases; use ESL directly for advanced scenarios
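A minimal sketch of the LISTEN/NOTIFY mitigation, using asyncpg (the driver the API already uses). The cc_members_changed channel, and the database trigger that would emit it, are assumptions rather than part of the shipped schema.

```python
import asyncio
import asyncpg

async def watch_members(dsn: str) -> None:
    conn = await asyncpg.connect(dsn)

    def on_notify(connection, pid, channel, payload):
        # React to queue-state changes pushed by a trigger on cc_members.
        print(f"queue state changed: {payload}")

    await conn.add_listener("cc_members_changed", on_notify)
    await asyncio.sleep(3600)  # keep the connection open while listening

asyncio.run(watch_members("postgresql://user:pass@postgres/callcenter"))
```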
Trade-Offs:
- Read Operations: Database > XML-RPC > ESL (in terms of performance)
- Write Operations: XML-RPC ≈ ESL (both synchronous)
- Events: ESL only (but not needed for current use case)
Status: ✅ Accepted - In production use
Date: 2024-01-15
Technology Stack
API Layer
| Component | Technology | Purpose |
|---|---|---|
| Web Framework | FastAPI | Async REST API framework |
| Validation | Pydantic | Type safety and validation |
| ASGI Server | Uvicorn | Production web server |
| Language | Python 3.11 | Modern async Python |
| API Documentation | OpenAPI 3.0 | Auto-generated docs |
Database Layer
| Component | Technology | Purpose |
|---|---|---|
| Primary DB | PostgreSQL 15 | Configuration and state |
| ORM | SQLAlchemy (async) | Database abstraction |
| Driver | asyncpg | High-performance async driver |
| ODBC | unixODBC + psqlODBC | FreeSWITCH ODBC integration |
Telephony Layer
| Component | Technology | Purpose |
|---|---|---|
| PBX Engine | FreeSWITCH | Core telephony processing |
| Call Center | mod_callcenter | Queue and agent management |
| XML-RPC | mod_xml_rpc | Programmatic control |
| Directory | mod_xml_curl | Dynamic user provisioning |
| ESL | Event Socket Library | IVR socket handler |
Orchestration Layer
| Component | Technology | Purpose |
|---|---|---|
| Container Runtime | Docker | Container packaging |
| Orchestration | Kubernetes | Production orchestration |
| Package Management | Helm | Kubernetes deployments |
| Ingress | Traefik | HTTPS ingress controller |
| Registry | GitHub Container Registry | Container images |
Observability
| Component | Technology | Purpose |
|---|---|---|
| Metrics | Prometheus | Time-series metrics |
| Logging | Structured JSON logs | Centralized logging |
| Tracing | (Future) OpenTelemetry | Distributed tracing |
Request Flow Example
The following sequence shows the complete request flow for creating a new queue:
Step-by-Step Breakdown
1. Client Request
- Client sends POST /v1/queues with queue configuration
- Request includes X-API-Key header for authentication
2. Authentication
- API key middleware validates header against API_KEY environment variable (a minimal sketch follows this list)
- Returns 401 Unauthorized if invalid
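A minimal sketch of what such a check can look like as a FastAPI dependency. The function name and wiring are assumptions about the implementation; only the X-API-Key header and API_KEY environment variable come from the docs.

```python
import os
from fastapi import Header, HTTPException

async def require_api_key(x_api_key: str = Header(..., alias="X-API-Key")) -> None:
    """Reject requests whose X-API-Key header does not match the configured key."""
    if x_api_key != os.environ.get("API_KEY"):
        raise HTTPException(status_code=401, detail="Unauthorized")
```

Routes would opt in with Depends(require_api_key), or the check can run as middleware for every request.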
3. Database Storage
- API writes queue configuration to the PostgreSQL queues table
- Configuration includes: name, strategy, tier rules, timeouts, etc.
4. Kubernetes Orchestration
- Orchestrator creates Kubernetes resources:
- ConfigMap: FreeSWITCH environment variables
- Service: Internal DNS (e.g., queue-sales.namespace.svc.cluster.local)
- StatefulSet: Deploys FreeSWITCH pod with stable identity
5. Pod Initialization
- FreeSWITCH container starts
- Loads environment variables from ConfigMap
- Connects to PostgreSQL via ODBC
- Loads mod_callcenter configuration from database
- Reports ready to Kubernetes
6. Response
- API returns 201 Created with queue details
- Client can now add agents, tiers, and route calls to the queue
Typical Response Times
- Database Insert: 5-10ms
- Kubernetes Resource Creation: 100-500ms
- Pod Startup: 3-5 seconds
- Total Request: 4-6 seconds
Examples
Example 1: Health Check
Check if Ominis API is running:
curl -X GET http://localhost:8000/health
Response:
{
"status": "healthy",
"service": "callcenter-api"
}
Use Cases:
- Load balancer health checks
- Monitoring system probes
- Container readiness checks
Example 2: Get Branding Info
Retrieve branding information:
curl -X GET http://localhost:8000/v1/branding \
-H "X-API-Key: demo"
Response:
{
"brand": "Ominis AI",
"poweredBy": "Ominis.ai"
}
HTTP Headers:
X-Powered-By: Ominis.ai
Example 3: Create Queue
Create a new sales queue with longest-idle-agent strategy:
curl -X POST http://localhost:8000/v1/queues \
-H "X-API-Key: demo" \
-H "Content-Type: application/json" \
-d '{
"name": "sales",
"strategy": "longest-idle-agent",
"max_wait_time": 300,
"max_wait_time_no_agent": 120,
"tier_rule_wait": true,
"tier_rule_wait_multiply": true,
"tier_rule_no_agent_no_wait": false,
"announce_position": true,
"announce_holdtime": true
}'
Response:
{
"name": "sales",
"strategy": "longest-idle-agent",
"max_wait_time": 300,
"max_wait_time_no_agent": 120,
"status": "ready",
"created_at": "2024-01-15T10:30:00Z"
}
What Happens:
- Queue configuration saved to PostgreSQL
- Kubernetes StatefulSet created
- FreeSWITCH pod starts with queue configuration
- Service DNS available at queue-sales.client-demo-client.svc.cluster.local
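A hedged follow-up sketch: polling the new queue until its pod reports ready. The GET /v1/queues/{name} path and the status field are assumptions inferred from the creation response above; check the OpenAPI docs for the actual route.

```python
import time
import httpx

headers = {"X-API-Key": "demo"}
with httpx.Client(base_url="http://localhost:8000", headers=headers) as client:
    status = None
    for _ in range(30):                       # ~1 minute of polling
        status = client.get("/v1/queues/sales").json().get("status")
        if status == "ready":
            break
        time.sleep(2)
    print("queue status:", status)
```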
Example 4: Access Interactive API Documentation
Open your browser to view interactive Swagger UI:
http://localhost:8000/docs
Features:
- Try all 100+ endpoints directly in browser
- View request/response schemas
- Generate curl commands
- Authentication with API key
- Download OpenAPI spec
Alternative Documentation:
- ReDoc: http://localhost:8000/redoc
- OpenAPI JSON: http://localhost:8000/openapi.json
Context & Rationale
Why Kubernetes-Native?
Decision: Build on Kubernetes instead of traditional VM-based deployment.
Rationale:
- Dynamic Pod Management: Create/delete queues as Kubernetes resources
- Auto-Scaling: Horizontal Pod Autoscaler (HPA) scales based on CPU/memory
- Service Discovery: DNS-based routing (no hardcoded IPs)
- Resource Isolation: CPU/memory limits per pod
- Self-Healing: Automatic pod restart on failure
- Cloud-Native: Runs on any Kubernetes cluster (GKE, EKS, AKS, on-prem)
- Declarative Configuration: Infrastructure as code (Helm charts)
Trade-Offs:
- ⚠️ Requires Kubernetes cluster (operational overhead)
- ⚠️ Learning curve for Kubernetes concepts
- ✅ But: Production-grade reliability and scalability
Why FastAPI?
Decision: Use FastAPI instead of Flask, Django, or Node.js.
Rationale:
- Modern Async Python: Native async/await support (non-blocking I/O)
- Automatic OpenAPI Generation: /docs endpoint out of the box
- Type Safety: Pydantic models for request/response validation
- High Performance: Comparable to Node.js and Go
- Developer Experience: Fast iteration, excellent error messages
- Ecosystem: Rich ecosystem of async libraries (asyncpg, httpx, etc.)
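A minimal illustrative endpoint (not the production router) showing the combination described above: a Pydantic model provides validation and typed schemas, and the route is published automatically at /docs. Field names mirror the queue example earlier in this section.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Ominis Cluster Manager (sketch)")

class QueueIn(BaseModel):
    name: str
    strategy: str = "longest-idle-agent"
    max_wait_time: int = 300

@app.post("/v1/queues", status_code=201)
async def create_queue(queue: QueueIn) -> QueueIn:
    # The production service would persist to PostgreSQL and create the pod here.
    return queue
```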
Comparison:
| Framework | Async | OpenAPI | Type Safety | Performance |
|---|---|---|---|---|
| FastAPI | ✅ | ✅ Auto | ✅ Pydantic | ⚡ Fast |
| Flask | ⚠️ Manual | ❌ | ❌ | 🐢 Slower |
| Django | ⚠️ Limited | ⚠️ DRF | ⚠️ DRF | 🐢 Slower |
| Node.js | ✅ | ⚠️ Manual | ⚠️ TypeScript | ⚡ Fast |
Why PostgreSQL?
Decision: Use PostgreSQL instead of MySQL, MongoDB, or SQLite.
Rationale:
- ACID Compliance: Strong consistency guarantees for call center state
- JSON Support: Flexible schemas for IVR menus and metadata
- ODBC Integration: Native ODBC support for FreeSWITCH mod_callcenter
- Battle-Tested: 30+ years of production use
- Rich Queries: Complex joins, CTEs, window functions
- Extensions: PostGIS, pg_trgm, pg_stat_statements
- Replication: Streaming replication for high availability
FreeSWITCH Integration:
- mod_callcenter reads/writes directly to PostgreSQL via ODBC
- No synchronization lag (database is source of truth)
- No dual-write consistency issues
Why One-Pod-Per-Queue?
See ADR-0001 above.
Summary:
- ✅ Fault isolation (one queue failure doesn't cascade)
- ✅ Independent scaling (scale hot queues separately)
- ✅ Clear debugging (one queue = one pod)
- ⚠️ Resource overhead (more containers)
Production Validation:
- Running 50+ queues in production
- Average pod memory: 75MB
- Zero cross-queue failures in 6 months
Links to Related Sections
- Helm Infrastructure - Kubernetes deployment with Helm charts
- Queue Management - Deep dive on queue API and mod_callcenter
- Extension Management - SIP extension CRUD and authentication
- Ports & Adapters - Hexagonal architecture pattern
- Database Schema - Complete database documentation
- Testing Strategy - Test organization and best practices
- Getting Started - Quick start guide for new users
Next Steps
Now that you understand the system architecture, explore:
- Getting Started Guide - Deploy your first queue
- Queue Management API - Learn queue operations
- Call Control API - Control active calls
- IVR System - Build interactive voice menus
- Campaign Management - Outbound dialer campaigns
Powered by Ominis.ai - Modern call center infrastructure for cloud-native platforms.