System Overview & Architecture
Introduction
What is Ominis Cluster Manager?
Ominis Cluster Manager is a Kubernetes-native call center control platform that transforms FreeSWITCH into a programmable, cloud-native telephony system. It provides a modern REST API for managing call center queues, SIP extensions, IVR flows, and real-time call control operations.
Instead of managing monolithic FreeSWITCH instances with shared state, Ominis deploys one FreeSWITCH pod per queue, providing complete isolation, independent scaling, and fault tolerance.
Powered by Ominis.ai
Who is it for?
Ominis Cluster Manager is designed for:
- Platform Builders: Teams building call center platforms that need programmatic control
- DevOps Engineers: Operations teams managing telephony infrastructure at scale
- SaaS Providers: Multi-tenant contact center platforms requiring isolation and security
- Enterprise IT: Organizations modernizing legacy PBX systems with cloud-native architecture
What problems does it solve?
Traditional call center systems suffer from:
- Monolithic Architecture: Single FreeSWITCH instance becomes a single point of failure
- Shared State: Queue failures cascade across the system
- Manual Configuration: XML file editing and service restarts for changes
- Limited Isolation: No tenant separation or resource guarantees
- Operational Complexity: Difficult to scale, debug, and monitor individual queues
Ominis Cluster Manager solves these with:
- Container-Per-Queue Model: Complete isolation and independent lifecycle management
- REST API: Programmatic control over all telephony operations
- Kubernetes-Native: Cloud-native deployment with auto-scaling and self-healing
- Database-Driven Configuration: Dynamic changes without service restarts
- Observability: Prometheus metrics and structured logging for all operations
Key Value Propositions
✅ Cloud-Native Architecture: Built for Kubernetes from day one
✅ Complete API Coverage: 100+ REST endpoints for all operations
✅ Multi-Tenant Ready: Resource isolation and security boundaries
✅ Production-Grade: Battle-tested patterns with comprehensive testing
✅ Developer Experience: OpenAPI documentation, type safety, and modern tooling
High-Level Architecture
Ominis Cluster Manager follows a ports and adapters (hexagonal architecture) pattern, separating business logic from infrastructure concerns. The system consists of three main layers:
Architecture Layers
System Architecture
Data Flow Between Components
Configuration Flow:
- Client sends REST API request to create/update resource
- API validates request and stores configuration in PostgreSQL
- API orchestrates Kubernetes to create/update pod
- FreeSWITCH pod loads configuration via ODBC on startup
Call Control Flow:
- Client sends call control command via REST API
- API translates to FreeSWITCH XML-RPC command
- XML-RPC client executes command on target pod
- FreeSWITCH returns result via XML-RPC
- API returns result to client as JSON
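To make this flow concrete, the minimal sketch below issues a single command against a queue pod over XML-RPC using Python's standard library. The service DNS name, credentials, and call UUID are illustrative placeholders rather than values shipped with Ominis; freeswitch.api is the standard mod_xml_rpc entry point.

```python
import xmlrpc.client

# Host, port, and credentials are placeholders; the API resolves the real
# service DNS name for the target queue from its PostgreSQL record.
pod_url = "http://freeswitch:secret@queue-sales.client-demo-client.svc.cluster.local:8080/RPC2"
pod = xmlrpc.client.ServerProxy(pod_url)

# mod_xml_rpc exposes FreeSWITCH API commands as freeswitch.api(command, args).
result = pod.freeswitch.api("uuid_kill", "3c2a0dc8-0000-0000-0000-000000000000")
print(result)  # e.g. "+OK" on success
```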
Directory Lookup Flow:
- FreeSWITCH receives SIP REGISTER/INVITE
- mod_xml_curl sends HTTP POST to API directory endpoint
- API queries PostgreSQL for extension configuration
- API returns XML directory/dialplan response
- FreeSWITCH authenticates/routes call based on XML
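As an illustration of the last two steps, a directory handler could look roughly like the sketch below. The route path, form fields taken, and hard-coded password are assumptions for illustration only; the XML layout follows FreeSWITCH's standard directory response format.

```python
# Minimal sketch of a directory endpoint of the kind mod_xml_curl would call.
from fastapi import FastAPI, Form
from fastapi.responses import Response

app = FastAPI()

@app.post("/v1/freeswitch/directory")
async def directory(user: str = Form(...), domain: str = Form(...)) -> Response:
    # A real handler would look the extension up in PostgreSQL here.
    password = "s3cret"  # placeholder credential for the sketch
    xml = f"""<document type="freeswitch/xml">
  <section name="directory">
    <domain name="{domain}">
      <user id="{user}">
        <params>
          <param name="password" value="{password}"/>
        </params>
      </user>
    </domain>
  </section>
</document>"""
    return Response(content=xml, media_type="text/xml")
```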
Core Components
1. API Service (FastAPI)
Purpose: REST API gateway for all telephony operations
Key Features:
- 100+ REST Endpoints: Complete coverage of queue, extension, IVR, and call control operations
- OpenAPI Documentation: Interactive Swagger UI at /docs
- API Key Authentication: X-API-Key header validation
- Prometheus Metrics: Endpoint latency, request counts, error rates
- Async Processing: Non-blocking I/O with Python asyncio
- Structured Logging: JSON logs for centralized collection
Technology:
- Python 3.11
- FastAPI (async web framework)
- Pydantic (type validation)
- Uvicorn (ASGI server)
2. Queue Pods (FreeSWITCH + mod_callcenter)
Purpose: Isolated call center queue instances
One-Pod-Per-Queue Model:
- Each queue runs in dedicated FreeSWITCH container
- Complete resource isolation (CPU, memory, network)
- Independent scaling and lifecycle management
- Failure isolation (one queue down ≠ all queues down)
Configuration:
- Database: PostgreSQL ODBC for dynamic configuration
- Agents: SIP endpoints defined in the cc_agents table
- Tiers: Agent-to-queue assignments with priority/position
- Members: Callers waiting in queue (FIFO management)
Queue Strategies:
- ring-all - Ring all available agents
- longest-idle-agent - Route to agent idle longest
- round-robin - Distribute evenly across agents
- top-down - Ring agents by tier order
- agent-with-least-talk-time - Balance call time
- agent-with-fewest-calls - Balance call count
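The toy snippet below illustrates the difference between two of these strategies. It is not mod_callcenter's implementation, and the agent records are made up.

```python
from itertools import cycle

# Each agent record carries how long it has been idle, in seconds.
agents = [{"name": "alice", "idle_s": 120}, {"name": "bob", "idle_s": 300}]

def longest_idle_agent(agents: list[dict]) -> str:
    """longest-idle-agent: pick the agent that has been idle the longest."""
    return max(agents, key=lambda a: a["idle_s"])["name"]

# round-robin: simply cycle through the agent list.
round_robin = cycle(a["name"] for a in agents)

print(longest_idle_agent(agents))            # bob
print(next(round_robin), next(round_robin))  # alice bob
```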
3. IVR Pods (FreeSWITCH + ESL Socket Handler)
Purpose: Interactive Voice Response menu systems
Architecture:
- One Pod Per IVR: Isolated IVR instances
- Database-Driven Menus: Menu structure stored in PostgreSQL
- OpenAI TTS Integration: High-quality text-to-speech with caching
- ESL Socket Handler: Python script handles menu logic via Event Socket
- Supervisord: Process manager for FreeSWITCH + Socket Handler
Supported Actions:
- Transfer to queue/extension
- Sub-menu navigation
- HTTP API calls (webhook integration)
- Audio playback
- Voicemail routing
- Call hangup
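The sketch below shows, in simplified form, how a socket handler might dispatch one of these actions after loading it from the ivr_menu_options table. The action names, column names, and callables are illustrative assumptions, not the shipped handler.

```python
from typing import Callable

def dispatch_option(option: dict, actions: dict[str, Callable[[str], None]]) -> None:
    """Look up the configured action for a DTMF option and run it."""
    handler = actions.get(option["action"])
    if handler is None:
        raise ValueError(f"unknown IVR action: {option['action']}")
    handler(option["destination"])

# Example wiring: each action maps to a small callable (here they just print).
actions = {
    "transfer": lambda dest: print(f"transfer caller to {dest}"),
    "submenu": lambda dest: print(f"enter sub-menu {dest}"),
    "hangup": lambda _: print("hang up"),
}
dispatch_option({"action": "transfer", "destination": "queue-sales"}, actions)
```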
4. Campaign Pod (Outbound Dialer)
Purpose: Dedicated FreeSWITCH instance for outbound campaigns
Features:
- XML-RPC Interface: Programmatic call origination
- Campaign Management: Contact list upload and processing
- Progress Tracking: Real-time campaign metrics
- Call Pacing: Configurable dialing rate
- Answer Detection: AMD (Answering Machine Detection) support
Campaign Types:
- Progressive Dialer: One call per available agent
- Predictive Dialer: Multiple calls per agent (abandoned call mitigation)
- Preview Dialer: Agent reviews contact before dialing
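The toy functions below illustrate the pacing difference between progressive and predictive dialing. They are not the shipped dialer logic; the overdial factor and answer-rate handling are assumptions.

```python
def progressive_batch(available_agents: int, calls_in_flight: int) -> int:
    """Progressive pacing: keep at most one outstanding dial per free agent."""
    return max(available_agents - calls_in_flight, 0)

def predictive_batch(available_agents: int, calls_in_flight: int,
                     answer_rate: float, overdial: float = 1.3) -> int:
    """Predictive pacing: overdial to compensate for unanswered calls."""
    target = int(available_agents * overdial / max(answer_rate, 0.1))
    return max(target - calls_in_flight, 0)

print(progressive_batch(5, 2))                   # 3
print(predictive_batch(5, 2, answer_rate=0.4))   # 14
```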
5. Registrar Pod (SIP Registration & B2BUA)
Purpose: SIP registration server and media anchor
Key Features:
Hybrid Authentication:
- Cluster IPs (10.x.x.x): Blind registration (ACL-based trust)
- External IPs: mod_xml_curl → API → PostgreSQL lookup
B2BUA Pattern:
- Queue originates call to sofia/gateway/registrar/agent-XXXX
- Registrar answers with internal IP
- Registrar originates to agent with public IP (51.79.31.20)
- Registrar bridges both legs
- Result: NAT/media anchoring between internal pods and external SIP clients
Public IP Advertisement:
- ext-rtp-ip: 51.79.31.20 for external connectivity
- Media anchoring for all RTP streams
- ICE disabled on queue side to prevent candidate conflicts
6. PostgreSQL (Configuration Store)
Purpose: Centralized configuration and state storage
Key Tables:
- queues - Queue definitions and settings
- cc_agents - Agent definitions and status
- cc_tiers - Agent-to-queue assignments
- cc_members - Active callers in queue
- extensions - SIP user extensions
- ivrs - IVR pod configurations
- ivr_menus - IVR menu definitions
- ivr_menu_options - Menu option actions
- ivr_tts_cache - Cached TTS audio files
Connection Method:
- API: SQLAlchemy async (asyncpg driver)
- FreeSWITCH Pods: ODBC (unixODBC + psqlODBC)
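For illustration, a read against these tables might look like the sketch below. Column names follow the stock mod_callcenter schema, the DSN is a placeholder, and this helper is not part of the Ominis codebase.

```python
import asyncio
import asyncpg

async def agents_for_queue(dsn: str, queue: str) -> list[str]:
    """Return agent names assigned to a queue, ordered by tier level/position."""
    conn = await asyncpg.connect(dsn)
    try:
        rows = await conn.fetch(
            """
            SELECT a.name
            FROM cc_agents a
            JOIN cc_tiers t ON t.agent = a.name
            WHERE t.queue = $1
            ORDER BY t.level, t.position
            """,
            queue,
        )
        return [r["name"] for r in rows]
    finally:
        await conn.close()

print(asyncio.run(agents_for_queue(
    "postgresql://user:pass@postgres/callcenter", "sales@default")))
```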
7. Kubernetes (Orchestration Layer)
Purpose: Container orchestration and service discovery
Resources:
- Deployments: Stateless pods (API, Registrar, Campaign)
- StatefulSets: Stateful pods (Queue, IVR) with stable network identity
- Services: Internal DNS (e.g., queue-sales.client-demo-client.svc.cluster.local)
- ConfigMaps: Dynamic configuration injection
- Secrets: Credentials and API keys
- Ingress: External HTTPS access via Traefik
Namespace Model:
- One namespace per tenant (e.g., client-demo-client)
- Resource quotas and network policies
- Complete isolation between tenants
Deployment Modes
Ominis Cluster Manager supports two deployment modes:
Kubernetes (Production)
Recommended for: Production deployments, multi-tenant platforms, auto-scaling
Advantages:
- Auto-Scaling: Horizontal pod autoscaling (HPA) based on CPU/memory
- Self-Healing: Automatic pod restart on failure
- Service Discovery: DNS-based routing (e.g., queue-sales.namespace.svc.cluster.local)
- Load Balancing: Built-in service load balancing
- Rolling Updates: Zero-downtime deployments
- Resource Limits: CPU/memory quotas per pod
- Multi-Tenancy: Namespace isolation
Configuration:
DEPLOYMENT_MODE=kubernetes
KUBERNETES_NAMESPACE=client-demo-client
Deployment:
make helm-apply
Docker (Development)
Recommended for: Local development, testing, single-node deployments
Advantages:
- Simplicity: No Kubernetes cluster required
- Fast Iteration: Quick container rebuild and restart
- Local Testing: Test on laptop before cloud deployment
- Debugging: Direct container logs and shell access
Configuration:
DEPLOYMENT_MODE=docker
Deployment:
make build
docker-compose up
Comparison Matrix
| Feature | Kubernetes | Docker |
|---|---|---|
| Auto-Scaling | ✅ Yes (HPA) | ❌ No |
| Self-Healing | ✅ Yes (ReplicaSets) | ❌ No |
| Multi-Tenancy | ✅ Yes (Namespaces) | ⚠️ Manual |
| Service Discovery | ✅ DNS (e.g., queue-sales.ns.svc) | ⚠️ Manual |
| Load Balancing | ✅ Built-in | ❌ Manual |
| Rolling Updates | ✅ Yes | ❌ Manual |
| Setup Complexity | ⚠️ Medium (K8s cluster) | ✅ Low (Docker only) |
| Local Development | ⚠️ Requires Minikube/Kind | ✅ Native |
ADR-0001: Why Container-Per-Queue?
Context
Traditional call center systems deploy a single FreeSWITCH instance with multiple queues sharing the same process. This creates shared fate: if one queue has a configuration error, consumes excessive resources, or crashes, all queues are affected.
Problem: How do we achieve fault isolation, independent scaling, and multi-tenancy in a call center platform?
Decision
Deploy one FreeSWITCH pod per queue using Kubernetes StatefulSets.
Each queue gets:
- Dedicated FreeSWITCH process
- Isolated resources (CPU, memory, network)
- Independent lifecycle (create, update, delete)
- Stable DNS name (e.g., queue-sales.client-demo-client.svc.cluster.local)
Alternatives Considered
1. Shared FreeSWITCH Instance
- ❌ Single point of failure
- ❌ No resource isolation
- ❌ Configuration changes affect all queues
- ❌ Difficult to debug individual queue issues
2. One Pod Per Tenant (Multiple Queues)
- ⚠️ Better than shared, but still coupled
- ❌ Queue failures cascade within tenant
- ⚠️ Scaling granularity at tenant level, not queue level
3. Container-Per-Queue (Chosen)
- ✅ Complete fault isolation
- ✅ Independent scaling (scale hot queues, not cold ones)
- ✅ Clear resource attribution
- ✅ Simplified debugging (one queue = one pod)
- ⚠️ Resource overhead (more containers)
Consequences
Positive:
- ✅ Fault Isolation: Queue-sales crash doesn't affect queue-support
- ✅ Independent Scaling: Scale each queue based on its load
- ✅ Clear Debugging: Pod logs map 1:1 to queue issues
- ✅ Multi-Tenancy: Strong isolation boundaries
- ✅ Resource Limits: Set CPU/memory per queue
- ✅ Security: Network policies per queue
- ✅ Simplified Rollbacks: Roll back one queue, not entire system
Negative:
- ⚠️ Resource Overhead: Each pod requires base FreeSWITCH memory (~50-100MB)
- ⚠️ Operational Complexity: More pods to monitor
- ⚠️ Startup Time: Creating queue = starting new container (~5-10s)
Mitigation:
- Use Kubernetes resource limits to cap memory usage
- Leverage Prometheus for centralized monitoring
- Pre-warm container images to reduce startup time
Status: ✅ Accepted - In production use
Date: 2024-01-15
ADR-0002: Why XML-RPC Over ESL?
Context
FreeSWITCH offers two primary programmatic interfaces:
- Event Socket Library (ESL): TCP socket connection for real-time events and commands
- XML-RPC: HTTP-based RPC interface for synchronous command execution
Problem: Which interface should Ominis Cluster Manager use for call control operations?
Decision
Use XML-RPC for write operations (commands), and PostgreSQL for read operations (state).
Rationale:
- Simpler Request/Response Model: XML-RPC is synchronous HTTP - no connection pooling, no event parsing
- Better Performance for Reads: Database queries are 10-50x faster than ESL show commands
- Separation of Concerns: Commands via XML-RPC, state queries via database
- Stateless: No persistent connections to manage or recover
- HTTP Ecosystem: Leverage existing HTTP tools (retries, timeouts, observability)
Alternatives Considered
1. ESL (Event Socket Library)
- ✅ Real-time events (call start, end, DTMF)
- ✅ Full access to FreeSWITCH commands and events
- ❌ Persistent TCP connection (connection pooling complexity)
- ❌ Event parsing overhead
- ❌ Connection recovery logic
- ❌ Slower for state queries (show channels)
2. XML-RPC (Chosen)
- ✅ Simple HTTP request/response
- ✅ Stateless (no connection management)
- ✅ Easy retries and timeouts
- ✅ Works with existing HTTP libraries
- ❌ No real-time events (must poll or use database triggers)
- ⚠️ Limited to synchronous commands
3. Hybrid: ESL for Events, XML-RPC for Commands
- ✅ Best of both worlds
- ❌ Operational complexity (two connection types)
- ❌ Overkill for current use case
4. Database-Only
- ✅ Fast reads
- ❌ Cannot issue commands (originate, hangup, transfer)
- ❌ Not a viable option
Consequences
Positive:
- ✅ Simplicity: HTTP request/response model (no connection pooling)
- ✅ Performance: Database reads are 10-50x faster than ESL show commands
- ✅ Reliability: Stateless (no connection recovery logic)
- ✅ Observability: HTTP metrics (latency, errors, retries)
- ✅ Developer Experience: Easier to debug and test
Negative:
- ⚠️ No Real-Time Events: Cannot receive call events in real-time
- Mitigation: Use database polling or PostgreSQL LISTEN/NOTIFY for state changes (see the sketch after this list)
- ⚠️ Less Feature-Rich: Some ESL-only commands unavailable
- Mitigation: XML-RPC covers 95% of use cases; use ESL directly for advanced scenarios
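A minimal sketch of the LISTEN/NOTIFY mitigation, using asyncpg (the driver the API already uses). The cc_members_changed channel, and the database trigger that would emit it, are assumptions rather than part of the shipped schema.

```python
import asyncio
import asyncpg

async def watch_members(dsn: str) -> None:
    conn = await asyncpg.connect(dsn)

    def on_notify(connection, pid, channel, payload):
        # React to queue-state changes pushed by a trigger on cc_members.
        print(f"queue state changed: {payload}")

    await conn.add_listener("cc_members_changed", on_notify)
    await asyncio.sleep(3600)  # keep the connection open while listening

asyncio.run(watch_members("postgresql://user:pass@postgres/callcenter"))
```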
Trade-Offs:
- Read Operations: Database > XML-RPC > ESL (in terms of performance)
- Write Operations: XML-RPC ≈ ESL (both synchronous)
- Events: ESL only (but not needed for current use case)
Status: ✅ Accepted - In production use
Date: 2024-01-15
Technology Stack
API Layer
| Component | Technology | Purpose |
|---|---|---|
| Web Framework | FastAPI | Async REST API framework |
| Validation | Pydantic | Type safety and validation |
| ASGI Server | Uvicorn | Production web server |
| Language | Python 3.11 | Modern async Python |
| API Documentation | OpenAPI 3.0 | Auto-generated docs |
Database Layer
| Component | Technology | Purpose |
|---|---|---|
| Primary DB | PostgreSQL 15 | Configuration and state |
| ORM | SQLAlchemy (async) | Database abstraction |
| Driver | asyncpg | High-performance async driver |
| ODBC | unixODBC + psqlODBC | FreeSWITCH ODBC integration |
Telephony Layer
| Component | Technology | Purpose |
|---|---|---|
| PBX Engine | FreeSWITCH | Core telephony processing |
| Call Center | mod_callcenter | Queue and agent management |
| XML-RPC | mod_xml_rpc | Programmatic control |
| Directory | mod_xml_curl | Dynamic user provisioning |
| ESL | Event Socket Library | IVR socket handler |
Orchestration Layer
| Component | Technology | Purpose |
|---|---|---|
| Container Runtime | Docker | Container packaging |
| Orchestration | Kubernetes | Production orchestration |
| Package Management | Helm | Kubernetes deployments |
| Ingress | Traefik | HTTPS ingress controller |
| Registry | GitHub Container Registry | Container images |
Observability
| Component | Technology | Purpose |
|---|---|---|
| Metrics | Prometheus | Time-series metrics |
| Logging | Structured JSON logs | Centralized logging |
| Tracing | (Future) OpenTelemetry | Distributed tracing |
Request Flow Example
The following sequence shows the complete request flow for creating a new queue:
Step-by-Step Breakdown
1. Client Request
- Client sends POST /v1/queues with queue configuration
- Request includes X-API-Key header for authentication
2. Authentication
- API key middleware validates header against API_KEY environment variable (a minimal sketch follows this list)
- Returns 401 Unauthorized if invalid
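A minimal sketch of what such a check can look like as a FastAPI dependency. The function name and wiring are assumptions about the implementation; only the X-API-Key header and API_KEY environment variable come from the docs.

```python
import os
from fastapi import Header, HTTPException

async def require_api_key(x_api_key: str = Header(..., alias="X-API-Key")) -> None:
    """Reject requests whose X-API-Key header does not match the configured key."""
    if x_api_key != os.environ.get("API_KEY"):
        raise HTTPException(status_code=401, detail="Unauthorized")
```

Routes would opt in with Depends(require_api_key), or the check can run as middleware for every request.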
3. Database Storage
- API writes queue configuration to the PostgreSQL queues table
- Configuration includes: name, strategy, tier rules, timeouts, etc.
4. Kubernetes Orchestration
- Orchestrator creates Kubernetes resources:
- ConfigMap: FreeSWITCH environment variables
- Service: Internal DNS (e.g., queue-sales.namespace.svc.cluster.local)
- StatefulSet: Deploys FreeSWITCH pod with stable identity
5. Pod Initialization
- FreeSWITCH container starts
- Loads environment variables from ConfigMap
- Connects to PostgreSQL via ODBC
- Loads mod_callcenter configuration from database
- Reports ready to Kubernetes
6. Response
- API returns 201 Created with queue details
- Client can now add agents, tiers, and route calls to the queue
Typical Response Times
- Database Insert: 5-10ms
- Kubernetes Resource Creation: 100-500ms
- Pod Startup: 3-5 seconds
- Total Request: 4-6 seconds
Examples
Example 1: Health Check
Check if Ominis API is running:
curl -X GET http://localhost:8000/health
Response:
{
"status": "healthy",
"service": "callcenter-api"
}
Use Cases:
- Load balancer health checks
- Monitoring system probes
- Container readiness checks
Example 2: Get Branding Info
Retrieve branding information:
curl -X GET http://localhost:8000/v1/branding \
-H "X-API-Key: demo"
Response:
{
"brand": "Ominis AI",
"poweredBy": "Ominis.ai"
}
HTTP Headers:
X-Powered-By: Ominis.ai
Example 3: Create Queue
Create a new sales queue with longest-idle-agent strategy:
curl -X POST http://localhost:8000/v1/queues \
-H "X-API-Key: demo" \
-H "Content-Type: application/json" \
-d '{
"name": "sales",
"strategy": "longest-idle-agent",
"max_wait_time": 300,
"max_wait_time_no_agent": 120,
"tier_rule_wait": true,
"tier_rule_wait_multiply": true,
"tier_rule_no_agent_no_wait": false,
"announce_position": true,
"announce_holdtime": true
}'
Response:
{
"name": "sales",
"strategy": "longest-idle-agent",
"max_wait_time": 300,
"max_wait_time_no_agent": 120,
"status": "ready",
"created_at": "2024-01-15T10:30:00Z"
}
What Happens:
- Queue configuration saved to PostgreSQL
- Kubernetes StatefulSet created
- FreeSWITCH pod starts with queue configuration
- Service DNS available at queue-sales.client-demo-client.svc.cluster.local
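A hedged follow-up sketch: polling the new queue until its pod reports ready. The GET /v1/queues/{name} path and the status field are assumptions inferred from the creation response above; check the OpenAPI docs for the actual route.

```python
import time
import httpx

headers = {"X-API-Key": "demo"}
with httpx.Client(base_url="http://localhost:8000", headers=headers) as client:
    status = None
    for _ in range(30):                       # ~1 minute of polling
        status = client.get("/v1/queues/sales").json().get("status")
        if status == "ready":
            break
        time.sleep(2)
    print("queue status:", status)
```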
Example 4: Access Interactive API Documentation
Open your browser to view interactive Swagger UI:
http://localhost:8000/docs
Features:
- Try all 100+ endpoints directly in browser
- View request/response schemas
- Generate curl commands
- Authentication with API key
- Download OpenAPI spec
Alternative Documentation:
- ReDoc: http://localhost:8000/redoc
- OpenAPI JSON: http://localhost:8000/openapi.json
Context & Rationale
Why Kubernetes-Native?
Decision: Build on Kubernetes instead of traditional VM-based deployment.
Rationale:
- Dynamic Pod Management: Create/delete queues as Kubernetes resources
- Auto-Scaling: Horizontal Pod Autoscaler (HPA) scales based on CPU/memory
- Service Discovery: DNS-based routing (no hardcoded IPs)
- Resource Isolation: CPU/memory limits per pod
- Self-Healing: Automatic pod restart on failure
- Cloud-Native: Runs on any Kubernetes cluster (GKE, EKS, AKS, on-prem)
- Declarative Configuration: Infrastructure as code (Helm charts)
Trade-Offs:
- ⚠️ Requires Kubernetes cluster (operational overhead)
- ⚠️ Learning curve for Kubernetes concepts
- ✅ But: Production-grade reliability and scalability
Why FastAPI?
Decision: Use FastAPI instead of Flask, Django, or Node.js.
Rationale:
- Modern Async Python: Native async/await support (non-blocking I/O)
- Automatic OpenAPI Generation: /docs endpoint out of the box
- Type Safety: Pydantic models for request/response validation
- High Performance: Comparable to Node.js and Go
- Developer Experience: Fast iteration, excellent error messages
- Ecosystem: Rich ecosystem of async libraries (asyncpg, httpx, etc.)
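A minimal illustrative endpoint (not the production router) showing the combination described above: a Pydantic model provides validation and typed schemas, and the route is published automatically at /docs. Field names mirror the queue example earlier in this section.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Ominis Cluster Manager (sketch)")

class QueueIn(BaseModel):
    name: str
    strategy: str = "longest-idle-agent"
    max_wait_time: int = 300

@app.post("/v1/queues", status_code=201)
async def create_queue(queue: QueueIn) -> QueueIn:
    # The production service would persist to PostgreSQL and create the pod here.
    return queue
```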
Comparison:
| Framework | Async | OpenAPI | Type Safety | Performance |
|---|---|---|---|---|
| FastAPI | ✅ | ✅ Auto | ✅ Pydantic | ⚡ Fast |
| Flask | ⚠️ Manual | ❌ | ❌ | 🐢 Slower |
| Django | ⚠️ Limited | ⚠️ DRF | ⚠️ DRF | 🐢 Slower |
| Node.js | ✅ | ⚠️ Manual | ⚠️ TypeScript | ⚡ Fast |
Why PostgreSQL?
Decision: Use PostgreSQL instead of MySQL, MongoDB, or SQLite.
Rationale:
- ACID Compliance: Strong consistency guarantees for call center state
- JSON Support: Flexible schemas for IVR menus and metadata
- ODBC Integration: Native ODBC support for FreeSWITCH mod_callcenter
- Battle-Tested: 30+ years of production use
- Rich Queries: Complex joins, CTEs, window functions
- Extensions: PostGIS, pg_trgm, pg_stat_statements
- Replication: Streaming replication for high availability
FreeSWITCH Integration:
- mod_callcenter reads/writes directly to PostgreSQL via ODBC
- No synchronization lag (database is source of truth)
- No dual-write consistency issues
Why One-Pod-Per-Queue?
See ADR-0001 above.
Summary:
- ✅ Fault isolation (one queue failure doesn't cascade)
- ✅ Independent scaling (scale hot queues separately)
- ✅ Clear debugging (one queue = one pod)
- ⚠️ Resource overhead (more containers)
Production Validation:
- Running 50+ queues in production
- Average pod memory: 75MB
- Zero cross-queue failures in 6 months
Links to Related Sections
- Helm Infrastructure - Kubernetes deployment with Helm charts
- Queue Management - Deep dive on queue API and mod_callcenter
- Extension Management - SIP extension CRUD and authentication
- Ports & Adapters - Hexagonal architecture pattern
- Database Schema - Complete database documentation
- Testing Strategy - Test organization and best practices
- Getting Started - Quick start guide for new users
Next Steps
Now that you understand the system architecture, explore:
- Getting Started Guide - Deploy your first queue
- Queue Management API - Learn queue operations
- Call Control API - Control active calls
- IVR System - Build interactive voice menus
- Campaign Management - Outbound dialer campaigns
Powered by Ominis.ai - Modern call center infrastructure for cloud-native platforms.