IVR System

The Ominis Cluster Manager includes an IVR (Interactive Voice Response) system that automates call handling with menu-based navigation, text-to-speech prompts, and flexible action routing.

Architecture Overview

The IVR system follows a one-pod-per-IVR architecture, mirroring the queue management pattern. Each IVR instance runs as a dedicated Kubernetes pod containing:

FreeSWITCH - SIP/telephony engine
ESL Socket Handler - Python-based call control logic
OpenAI TTS Integration - High-quality speech generation with caching
PostgreSQL Backend - Menu configuration and TTS cache storage

Key Design Decisions

One Pod Per IVR

Each IVR instance runs in its own pod to ensure:

Isolation: One IVR's traffic doesn't affect others
Scalability: Independent resource allocation per IVR
Configuration: Each IVR has unique OpenAI API keys, languages, voices
Lifecycle: IVRs can be created/destroyed independently

ESL Socket vs mod_xml_curl

Decision: Use outbound ESL socket for IVR call control

Rationale:

Native FreeSWITCH Apps: Leverage play_and_get_digits for menu navigation (automatic retries, timeout handling)
Simplified Logic: No complex event state machines - just orchestration
Better Control: Direct call control without HTTP round-trips
Error Handling: Connection-based error detection (vs polling)

See ADR: ESL Socket vs mod_xml_curl for full analysis.

OpenAI TTS with SHA256 Caching

Decision: Use OpenAI TTS with content-addressed caching

Rationale:

Quality: Superior voice quality vs alternatives (Festival, MaryTTS, Coqui)
Cost Efficiency: SHA256-based deduplication reduces API calls
Performance: Cached audio served instantly from local filesystem
Flexibility: 6 voices, 50+ languages, multiple models

See ADR: OpenAI TTS vs Alternatives for full analysis.

Database Schema

The IVR system uses four PostgreSQL tables:

Tables

`ivrs`

Stores IVR instance configuration.

CREATE TABLE ivrs (
    id SERIAL PRIMARY KEY,
    name VARCHAR(64) UNIQUE NOT NULL,
    description TEXT,
    openai_api_key TEXT NOT NULL,
    default_language VARCHAR(10) DEFAULT 'en',
    default_voice VARCHAR(50) DEFAULT 'alloy',
    tts_model VARCHAR(50) DEFAULT 'tts-1',
    status VARCHAR(20) DEFAULT 'pending',
    pod_name VARCHAR(255),
    service_name VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

`ivr_menus`

Stores menu definitions (prompts, timeouts, retries).

CREATE TABLE ivr_menus (
    id SERIAL PRIMARY KEY,
    ivr_name VARCHAR(64) REFERENCES ivrs(name) ON DELETE CASCADE,
    menu_id VARCHAR(100) NOT NULL,
    prompt_text TEXT NOT NULL,
    tts_voice VARCHAR(50),
    timeout_seconds INTEGER DEFAULT 5,
    max_retries INTEGER DEFAULT 3,
    invalid_prompt TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(ivr_name, menu_id)
);

`ivr_menu_options`

Stores menu option actions (digit → action mapping).

CREATE TABLE ivr_menu_options (
    id SERIAL PRIMARY KEY,
    menu_id INTEGER REFERENCES ivr_menus(id) ON DELETE CASCADE,
    digit VARCHAR(10) NOT NULL,
    action_type VARCHAR(50) NOT NULL,
    action_target TEXT NOT NULL,
    action_config JSONB,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(menu_id, digit)
);

`ivr_tts_cache`

Stores TTS cache metadata (SHA256-indexed).

CREATE TABLE ivr_tts_cache (
    id SERIAL PRIMARY KEY,
    text_hash VARCHAR(64) UNIQUE NOT NULL,
    text TEXT NOT NULL,
    language VARCHAR(10) NOT NULL,
    voice VARCHAR(50) NOT NULL,
    model VARCHAR(50) NOT NULL,
    file_path TEXT NOT NULL,
    file_size_bytes INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    access_count INTEGER DEFAULT 0
);

CREATE INDEX idx_ivr_tts_cache_hash ON ivr_tts_cache(text_hash);

API Endpoints (18 Total)

IVR Management (7 endpoints)

Create IVR

POST /v1/ivrs

Creates a new IVR instance with dedicated pod and SIP extension.

Request Body:

{
  "name": "customer-support",
  "description": "Customer support IVR with menu routing",
  "openai_api_key": "sk-...",
  "default_language": "en",
  "default_voice": "alloy",
  "tts_model": "tts-1"
}

Response (201 Created):

{
  "ivr": {
    "name": "customer-support",
    "description": "Customer support IVR with menu routing",
    "openai_api_key": "sk-...",
    "default_language": "en",
    "default_voice": "alloy",
    "tts_model": "tts-1",
    "status": "pending",
    "pod_name": "freeswitch-ivr-customer-support-abc123",
    "service_name": "freeswitch-ivr-customer-support-svc",
    "created_at": "2025-10-14T12:00:00Z",
    "updated_at": "2025-10-14T12:00:00Z"
  }
}

Process:

Create SIP extension (ivr-{name}) for authentication
Insert IVR record in database
Provision Kubernetes pod with FreeSWITCH + ESL handler
Register with SIP registrar for inbound calls

Pod Resources:

Memory: 512Mi request, 1Gi limit
CPU: 250m request, 500m limit
Volume: emptyDir for TTS cache (/var/ivr-audio)
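
The resource figures above could be expressed as a pod manifest along the following lines. This is only a sketch: the function name `build_ivr_pod_manifest` and the image argument are illustrative assumptions, and the manager's actual provisioning code may look quite different. The `ivr-name` label and the `freeswitch-ivr` container name are taken from the kubectl commands in the Troubleshooting section.

```python
def build_ivr_pod_manifest(ivr_name: str, image: str) -> dict:
    """Return a Kubernetes-style pod manifest dict for one IVR (illustrative only)."""
    return {
        "metadata": {"labels": {"ivr-name": ivr_name}},
        "spec": {
            "containers": [
                {
                    "name": "freeswitch-ivr",
                    "image": image,  # assumed: supplied by the caller
                    "resources": {
                        "requests": {"memory": "512Mi", "cpu": "250m"},
                        "limits": {"memory": "1Gi", "cpu": "500m"},
                    },
                    "volumeMounts": [
                        {"name": "tts-cache", "mountPath": "/var/ivr-audio"}
                    ],
                }
            ],
            # emptyDir: the TTS cache does not survive the pod being rescheduled
            "volumes": [{"name": "tts-cache", "emptyDir": {}}],
        },
    }


manifest = build_ivr_pod_manifest("customer-support", "example.invalid/freeswitch-ivr:latest")
```

Because the cache volume is an emptyDir, prompts are regenerated after a pod is replaced; see TTS Cache Sizing for when a PersistentVolumeClaim may be preferable.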

List IVRs

GET /v1/ivrs

Response (200 OK):

{
  "ivrs": [
    {
      "name": "customer-support",
      "status": "running",
      "description": "Customer support IVR",
      "created_at": "2025-10-14T12:00:00Z"
    }
  ],
  "total": 1
}

Get IVR Details

GET /v1/ivrs/{ivr_name}

Response (200 OK):

{
  "ivr": {
    "name": "customer-support",
    "description": "Customer support IVR",
    "default_language": "en",
    "default_voice": "alloy",
    "tts_model": "tts-1",
    "status": "running",
    "pod_name": "freeswitch-ivr-customer-support-abc123",
    "created_at": "2025-10-14T12:00:00Z"
  }
}

Update IVR

PUT /v1/ivrs/{ivr_name}

Updates the IVR configuration. A pod restart is required for changes to take effect.

Request Body:

{
  "description": "Updated description",
  "default_voice": "nova",
  "openai_api_key": "sk-new-key"
}

Delete IVR

DELETE /v1/ivrs/{ivr_name}

Deletes IVR instance, including pod, service, and database records.

Response: 204 No Content

Process:

Delete Kubernetes deployment and service
Delete database records (cascades to menus, options, cache)
Clean up SIP extension

Restart IVR

POST /v1/ivrs/{ivr_name}/restart

Triggers rolling restart of IVR pod (useful after config changes).

Response (200 OK):

{
  "message": "IVR customer-support restart initiated"
}

Get IVR Status

GET /v1/ivrs/{ivr_name}/status

Response (200 OK):

{
  "name": "customer-support",
  "status": "running",
  "active_calls": 3,
  "total_calls": 247,
  "pod_status": "Running",
  "health_status": "healthy"
}

Menu Management (5 endpoints)

Create Menu

POST /v1/ivrs/{ivr_name}/menus

Request Body:

{
  "menu_id": "main_menu",
  "prompt_text": "Thank you for calling. Press 1 for sales, press 2 for support, or press 3 for weather.",
  "timeout_seconds": 5,
  "max_retries": 3,
  "invalid_prompt": "Invalid selection. Please try again."
}

Response (201 Created):

{
  "menu": {
    "id": 1,
    "ivr_name": "customer-support",
    "menu_id": "main_menu",
    "prompt_text": "Thank you for calling...",
    "timeout_seconds": 5,
    "max_retries": 3,
    "created_at": "2025-10-14T12:00:00Z"
  }
}

List Menus

GET /v1/ivrs/{ivr_name}/menus

Response (200 OK):

{
  "menus": [
    {
      "id": 1,
      "menu_id": "main_menu",
      "prompt_text": "Thank you for calling...",
      "timeout_seconds": 5
    }
  ],
  "total": 1
}

Get Menu Details

GET /v1/ivrs/{ivr_name}/menus/{menu_id}

Response (200 OK):

{
  "menu": {
    "id": 1,
    "menu_id": "main_menu",
    "prompt_text": "Thank you for calling...",
    "timeout_seconds": 5
  },
  "options": [
    {
      "id": 1,
      "digit": "1",
      "action_type": "sub_menu",
      "action_target": "sales_menu",
      "description": "Navigate to sales menu"
    }
  ]
}

Update Menu

PUT /v1/ivrs/{ivr_name}/menus/{menu_id}

Request Body:

{
  "prompt_text": "Updated prompt text",
  "timeout_seconds": 10
}

Delete Menu

DELETE /v1/ivrs/{ivr_name}/menus/{menu_id}

Deletes menu and all associated options.

Response: 204 No Content

Menu Options (3 endpoints)

Add Menu Option

POST /v1/ivrs/{ivr_name}/menus/{menu_id}/options

Request Body (Transfer to Queue):

{
  "digit": "1",
  "action_type": "transfer_queue",
  "action_target": "queue-sales",
  "description": "Transfer to sales queue"
}

Request Body (Sub-Menu):

{
  "digit": "2",
  "action_type": "sub_menu",
  "action_target": "support_menu",
  "description": "Navigate to support menu"
}

Request Body (HTTP API Call):

{
  "digit": "3",
  "action_type": "http_api_call",
  "action_target": "https://api.weather.com/current",
  "action_config": {
    "method": "GET",
    "headers": {
      "Authorization": "Bearer sk-weather-123"
    },
    "response_path": "current.temp_f",
    "tts_template": "The current temperature is {value} degrees Fahrenheit"
  },
  "description": "Get weather information"
}
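
The three request bodies above share a common shape, which a client could check before submitting. The sketch below is an assumption-laden illustration: `validate_option` is a hypothetical helper, the required fields follow the NOT NULL columns of `ivr_menu_options`, and treating `response_path` and `tts_template` as mandatory for `http_api_call` is a guess rather than documented API behavior.

```python
# The seven action types listed in the Action Types section
ACTION_TYPES = {
    "transfer_queue", "transfer_extension", "sub_menu",
    "http_api_call", "hangup", "play_audio", "voicemail",
}


def validate_option(payload: dict) -> list[str]:
    """Return a list of problems found in a menu option payload (empty if none)."""
    problems = []
    for field in ("digit", "action_type", "action_target"):
        if not payload.get(field):
            problems.append(f"missing field: {field}")
    action_type = payload.get("action_type")
    if action_type and action_type not in ACTION_TYPES:
        problems.append(f"unknown action_type: {action_type}")
    if action_type == "http_api_call":
        # Assumption: both keys are needed to turn the response into speech
        config = payload.get("action_config") or {}
        for key in ("response_path", "tts_template"):
            if key not in config:
                problems.append(f"action_config missing: {key}")
    return problems


ok = validate_option(
    {"digit": "1", "action_type": "transfer_queue", "action_target": "queue-sales"}
)
bad = validate_option({"digit": "3", "action_type": "http_api_call", "action_target": "https://example.invalid"})
```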

Update Menu Option

PUT /v1/ivrs/{ivr_name}/menus/{menu_id}/options/{digit}

Delete Menu Option

DELETE /v1/ivrs/{ivr_name}/menus/{menu_id}/options/{digit}

Response: 204 No Content

TTS & Visualization (3 endpoints)

Generate TTS Preview

POST /v1/ivrs/{ivr_name}/tts/preview

Generates TTS audio for testing without saving it to the production cache.

Request Body:

{
  "text": "This is a test prompt for preview",
  "voice": "nova",
  "model": "tts-1-hd"
}

Response (200 OK):

{
  "audio_file": "/var/ivr-audio/abc123def456.mp3",
  "text": "This is a test prompt for preview"
}

View TTS Cache

GET /v1/ivrs/{ivr_name}/tts/cache

Response (200 OK):

{
  "cache_entries": [
    {
      "id": 1,
      "text_hash": "abc123def456...",
      "text": "Thank you for calling",
      "language": "en",
      "voice": "alloy",
      "model": "tts-1",
      "file_path": "/var/ivr-audio/abc123def456.mp3",
      "file_size_bytes": 15234,
      "access_count": 47,
      "last_accessed_at": "2025-10-14T12:00:00Z"
    }
  ],
  "total": 1,
  "total_size_bytes": 15234
}

Clear TTS Cache

DELETE /v1/ivrs/{ivr_name}/tts/cache

Removes cached TTS audio from database and filesystem.

Response: 204 No Content

Get IVR Flow Visualization

GET /v1/ivrs/{ivr_name}/flow

Returns nested menu structure for visualization tools.

Response (200 OK):

{
  "ivr_name": "customer-support",
  "root_menu": {
    "menu_id": "main_menu",
    "prompt_text": "Press 1 for sales, 2 for support, 3 for weather",
    "options": [
      {"digit": "1", "action": "sub_menu", "target": "sales_menu"},
      {"digit": "2", "action": "transfer_queue", "target": "queue-support"},
      {"digit": "3", "action": "http_api_call", "target": "https://api.weather.com/current"}
    ],
    "sub_menus": [
      {
        "menu_id": "sales_menu",
        "prompt_text": "Press 1 for phone sales, 2 for computer sales",
        "options": [
          {"digit": "1", "action": "transfer_queue", "target": "queue-phone"},
          {"digit": "2", "action": "transfer_queue", "target": "queue-computer"}
        ],
        "sub_menus": []
      }
    ]
  }
}
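
One way a nested structure like the one above could be assembled from flat menu records is sketched below. The in-memory `menus` mapping and the `build_flow` name are assumptions made for illustration; the real endpoint reads from PostgreSQL and may differ. Each menu is expanded at most once, so an option that points back to a parent menu does not recurse forever.

```python
from typing import Optional


def build_flow(menus: dict, menu_id: str, seen: Optional[set] = None) -> dict:
    """Nest sub-menus under their parent, expanding each menu at most once."""
    seen = set() if seen is None else seen
    seen.add(menu_id)
    menu = menus[menu_id]
    node = {
        "menu_id": menu_id,
        "prompt_text": menu["prompt_text"],
        "options": [
            {"digit": o["digit"], "action": o["action_type"], "target": o["action_target"]}
            for o in menu["options"]
        ],
        "sub_menus": [],
    }
    for o in menu["options"]:
        target = o["action_target"]
        # Only descend into sub-menus that exist and have not been expanded yet
        if o["action_type"] == "sub_menu" and target in menus and target not in seen:
            node["sub_menus"].append(build_flow(menus, target, seen))
    return node


menus = {
    "main_menu": {
        "prompt_text": "Press 1 for sales",
        "options": [{"digit": "1", "action_type": "sub_menu", "action_target": "sales_menu"}],
    },
    "sales_menu": {
        "prompt_text": "Press 9 to return",
        "options": [{"digit": "9", "action_type": "sub_menu", "action_target": "main_menu"}],
    },
}
flow = build_flow(menus, "main_menu")
```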

IVR Call Flow

TTS Caching Flow

Cache Key Generation

import hashlib

def generate_cache_key(text: str, voice: str, language: str, model: str) -> str:
    """Generate SHA256 cache key from TTS parameters"""
    combined = f"{text}|{voice}|{language}|{model}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()

# Example
key = generate_cache_key(
    text="Thank you for calling",
    voice="alloy",
    language="en",
    model="tts-1"
)
# Result: "abc123def456..." (64 hex characters)
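
Building on the key function above, a cache lookup might work roughly as follows. This is a sketch under stated assumptions: `get_or_create_audio` is a hypothetical name, `synthesize` stands in for the OpenAI TTS call (it is injected so the example runs offline), and the metadata bookkeeping in `ivr_tts_cache` is left out.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def get_or_create_audio(
    text: str, voice: str, language: str, model: str,
    cache_dir: Path, synthesize: Callable[[str, str, str], bytes],
) -> Path:
    """Return the cached MP3 path, calling `synthesize` only on a cache miss."""
    combined = f"{text}|{voice}|{language}|{model}"
    key = hashlib.sha256(combined.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.mp3"
    if not path.exists():
        # Cache miss: generate the audio once and store it under its content hash
        path.write_bytes(synthesize(text, voice, model))
    return path


calls = []

def fake_synthesize(text: str, voice: str, model: str) -> bytes:
    """Stand-in for the real TTS request; records each invocation."""
    calls.append(text)
    return b"fake-mp3-bytes"


with tempfile.TemporaryDirectory() as tmp:
    first = get_or_create_audio("Thank you for calling", "alloy", "en", "tts-1", Path(tmp), fake_synthesize)
    second = get_or_create_audio("Thank you for calling", "alloy", "en", "tts-1", Path(tmp), fake_synthesize)
    same_path = first == second
```

The second call finds the file already on disk, so the stand-in synthesizer runs only once.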

Cache Benefits

Metric	Without Cache	With Cache
API Calls	Every prompt play	First time only
Latency	500-2000ms	<5ms (filesystem)
Cost	$0.015 per 1K chars	$0.015 first time, $0 after
Bandwidth	N/A	~15KB per cached prompt

Real-world Savings:

IVR with 10 prompts (assuming ~1,000 characters each) played 1,000 times/day
Without cache: 10,000 API calls/day ≈ 10M characters = $150/day
With cache: 10 API calls once ≈ 10K characters = $0.15 total

ESL Socket Communication

Action Types

The IVR system supports 7 action types for menu options:

1. `transfer_queue`

Transfers call to a callcenter queue.

Example:

{
  "digit": "1",
  "action_type": "transfer_queue",
  "action_target": "queue-sales",
  "description": "Transfer to sales queue"
}

FreeSWITCH Command:

transfer queue-sales XML default

2. `transfer_extension`

Transfers call to a SIP extension.

Example:

{
  "digit": "2",
  "action_type": "transfer_extension",
  "action_target": "1001",
  "description": "Transfer to extension 1001"
}

FreeSWITCH Command:

transfer 1001 XML default

3. `sub_menu`

Navigates to a sub-menu (recursive).

Example:

{
  "digit": "3",
  "action_type": "sub_menu",
  "action_target": "sales_menu",
  "description": "Navigate to sales menu"
}

Process:

Load sales_menu from database
Generate TTS for sub-menu prompt
Execute play_and_get_digits for sub-menu
Process sub-menu option selection
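
The navigation steps above can be pictured as a loop over key presses. The sketch below uses an in-memory `menus` mapping in place of the database and omits TTS and `play_and_get_digits`; the `navigate` name and the `max_depth` guard are assumptions added for illustration, the latter because menus may point back at each other.

```python
def navigate(menus: dict, start: str, digits: list[str], max_depth: int = 10) -> dict:
    """Follow a sequence of key presses and return the first non-sub_menu option."""
    menu_id = start
    presses = iter(digits)
    for _ in range(max_depth):
        option = menus[menu_id].get(next(presses, None))
        if option is None:
            raise LookupError(f"no matching option in menu {menu_id}")
        if option["action_type"] != "sub_menu":
            return option
        menu_id = option["action_target"]  # move to the target menu and prompt again
    raise RecursionError("menu depth limit reached")


menus = {
    "main_menu": {
        "1": {"action_type": "sub_menu", "action_target": "sales_menu"},
    },
    "sales_menu": {
        "2": {"action_type": "transfer_queue", "action_target": "queue-computer-sales"},
        "9": {"action_type": "sub_menu", "action_target": "main_menu"},
    },
}
direct = navigate(menus, "main_menu", ["1", "2"])
with_return = navigate(menus, "main_menu", ["1", "9", "1", "2"])
```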

4. `http_api_call`

Makes HTTP request to external API and speaks response.

Example:

{
  "digit": "4",
  "action_type": "http_api_call",
  "action_target": "https://api.weather.com/v1/current?city=Toronto",
  "action_config": {
    "method": "GET",
    "headers": {
      "Authorization": "Bearer sk-weather-api-key"
    },
    "response_path": "current.temp_f",
    "tts_template": "The current temperature in Toronto is {value} degrees Fahrenheit"
  },
  "description": "Get weather for Toronto"
}

Process:

Make HTTP request (GET/POST) with headers
Parse JSON response
Extract value using JSONPath (current.temp_f)
Format message using template ({value} placeholder)
Generate TTS for formatted message
Play TTS audio to caller

JSONPath Examples:

current.temp_f → response["current"]["temp_f"]
results[0].name → response["results"][0]["name"]
message → response["message"]
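
The path notation in these examples is a simple dotted form with optional list indexes. A resolver for it could look like the sketch below; `extract_path` is a hypothetical helper written for this illustration, not the handler's actual implementation, and it supports only the syntax shown above.

```python
import re


def extract_path(data, path: str):
    """Resolve a dotted path with optional [index] parts, e.g. 'results[0].name'."""
    current = data
    for part in path.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        current = current[match.group(1)]
        # Apply any list indexes that follow the key, left to right
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            current = current[int(index)]
    return current


response = {"current": {"temp_f": 72}, "results": [{"name": "Toronto"}]}
message = "The current temperature is {value} degrees Fahrenheit".format(
    value=extract_path(response, "current.temp_f")
)
```

A missing key or index raises `KeyError` or `IndexError`, which a handler could catch to log the "Could not extract JSON path" warning described under Troubleshooting.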

5. `hangup`

Hangs up the call.

Example:

{
  "digit": "9",
  "action_type": "hangup",
  "action_target": "NORMAL_CLEARING",
  "description": "Hang up call"
}

FreeSWITCH Command:

hangup NORMAL_CLEARING

6. `play_audio`

Plays a pre-recorded audio file.

Example:

{
  "digit": "5",
  "action_type": "play_audio",
  "action_target": "/var/audio/custom-greeting.wav",
  "description": "Play custom greeting"
}

FreeSWITCH Command:

playback /var/audio/custom-greeting.wav

7. `voicemail`

Sends call to voicemail.

Example:

{
  "digit": "0",
  "action_type": "voicemail",
  "action_target": "1001",
  "description": "Send to voicemail for extension 1001"
}

FreeSWITCH Command:

voicemail default ${domain} 1001
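
For the action types that map to a single FreeSWITCH command, the translation could be sketched as below. The `build_command` function is hypothetical; it simply reproduces the command strings shown in this section and leaves `${domain}` for FreeSWITCH to expand.

```python
def build_command(action_type: str, action_target: str, domain: str = "${domain}") -> str:
    """Map a simple action type to the FreeSWITCH command string shown above."""
    if action_type in ("transfer_queue", "transfer_extension"):
        return f"transfer {action_target} XML default"
    if action_type == "hangup":
        return f"hangup {action_target}"
    if action_type == "play_audio":
        return f"playback {action_target}"
    if action_type == "voicemail":
        return f"voicemail default {domain} {action_target}"
    # sub_menu and http_api_call need handler logic rather than a single command
    raise ValueError(f"no single command for action type: {action_type}")


queue_cmd = build_command("transfer_queue", "queue-sales")
voicemail_cmd = build_command("voicemail", "1001")
```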

Complete Example: Multi-Level IVR

Let's build a complete IVR with multiple menus, HTTP API calls, and queue transfers.

Step 1: Create IVR

curl -X POST http://api:8000/v1/ivrs \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "support-hotline",
    "description": "Technical support hotline with weather and queue routing",
    "openai_api_key": "sk-your-openai-key",
    "default_language": "en",
    "default_voice": "nova",
    "tts_model": "tts-1-hd"
  }'

Step 2: Create Main Menu

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "menu_id": "main_menu",
    "prompt_text": "Welcome to technical support. Press 1 for sales, press 2 for technical support, press 3 for weather information, or press 0 for operator.",
    "timeout_seconds": 8,
    "max_retries": 3,
    "invalid_prompt": "Invalid selection. Please press a number from 0 to 3."
  }'

Step 3: Add Main Menu Options

Option 1: Sales Sub-Menu

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "digit": "1",
    "action_type": "sub_menu",
    "action_target": "sales_menu",
    "description": "Navigate to sales menu"
  }'

Option 2: Technical Support Queue

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "digit": "2",
    "action_type": "transfer_queue",
    "action_target": "queue-tech-support",
    "description": "Transfer to technical support queue"
  }'

Option 3: Weather API Call

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "digit": "3",
    "action_type": "http_api_call",
    "action_target": "https://api.weather.gov/gridpoints/TOP/31,80/forecast",
    "action_config": {
      "method": "GET",
      "headers": {
        "User-Agent": "Ominis-IVR/1.0"
      },
      "response_path": "properties.periods[0].detailedForecast",
      "tts_template": "The weather forecast is: {value}"
    },
    "description": "Get weather forecast"
  }'

Option 0: Operator Extension

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "digit": "0",
    "action_type": "transfer_extension",
    "action_target": "1000",
    "description": "Transfer to operator"
  }'

Step 4: Create Sales Sub-Menu

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "menu_id": "sales_menu",
    "prompt_text": "Sales department. Press 1 for phone sales, press 2 for computer sales, or press 9 to return to main menu.",
    "timeout_seconds": 5,
    "max_retries": 3
  }'

Step 5: Add Sales Sub-Menu Options

Option 1: Phone Sales Queue

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/sales_menu/options \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "digit": "1",
    "action_type": "transfer_queue",
    "action_target": "queue-phone-sales",
    "description": "Transfer to phone sales queue"
  }'

Option 2: Computer Sales Queue

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/sales_menu/options \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "digit": "2",
    "action_type": "transfer_queue",
    "action_target": "queue-computer-sales",
    "description": "Transfer to computer sales queue"
  }'

Option 9: Return to Main Menu

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/sales_menu/options \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "digit": "9",
    "action_type": "sub_menu",
    "action_target": "main_menu",
    "description": "Return to main menu"
  }'

Step 6: Visualize IVR Flow

curl http://api:8000/v1/ivrs/support-hotline/flow \
  -H "X-API-Key: your-api-key"

Response:

{
  "ivr_name": "support-hotline",
  "root_menu": {
    "menu_id": "main_menu",
    "prompt_text": "Welcome to technical support...",
    "options": [
      {"digit": "1", "action": "sub_menu", "target": "sales_menu"},
      {"digit": "2", "action": "transfer_queue", "target": "queue-tech-support"},
      {"digit": "3", "action": "http_api_call", "target": "https://api.weather.gov/..."},
      {"digit": "0", "action": "transfer_extension", "target": "1000"}
    ],
    "sub_menus": [
      {
        "menu_id": "sales_menu",
        "prompt_text": "Sales department...",
        "options": [
          {"digit": "1", "action": "transfer_queue", "target": "queue-phone-sales"},
          {"digit": "2", "action": "transfer_queue", "target": "queue-computer-sales"},
          {"digit": "9", "action": "sub_menu", "target": "main_menu"}
        ],
        "sub_menus": []
      }
    ]
  }
}

Step 7: Monitor TTS Cache

curl http://api:8000/v1/ivrs/support-hotline/tts/cache \
  -H "X-API-Key: your-api-key"

Response:

{
  "cache_entries": [
    {
      "text_hash": "abc123...",
      "text": "Welcome to technical support. Press 1 for sales...",
      "voice": "nova",
      "language": "en",
      "model": "tts-1-hd",
      "file_path": "/var/ivr-audio/abc123.mp3",
      "file_size_bytes": 24560,
      "access_count": 127,
      "last_accessed_at": "2025-10-14T15:23:10Z"
    },
    {
      "text_hash": "def456...",
      "text": "Sales department. Press 1 for phone sales...",
      "voice": "nova",
      "language": "en",
      "model": "tts-1-hd",
      "file_path": "/var/ivr-audio/def456.mp3",
      "file_size_bytes": 18920,
      "access_count": 43,
      "last_accessed_at": "2025-10-14T15:18:45Z"
    }
  ],
  "total": 2,
  "total_size_bytes": 43480
}

ADRs (Architectural Decision Records)

ADR: ESL Socket vs mod_xml_curl

Status: Accepted

Context: We need a mechanism for FreeSWITCH to execute IVR logic (menu navigation, DTMF handling, action routing). Two primary approaches exist:

mod_xml_curl: FreeSWITCH fetches dialplan XML via HTTP on each call
ESL Outbound Socket: FreeSWITCH connects to external socket server for call control

Decision: Use ESL Outbound Socket

Rationale:

Factor	mod_xml_curl	ESL Outbound Socket	Winner
Control	Limited to dialplan apps	Full call control via commands	✅ ESL
State Management	Stateless (HTTP request per step)	Stateful (persistent connection)	✅ ESL
Error Handling	HTTP timeout/retry logic	Connection-based (immediate detection)	✅ ESL
Complexity	Generate XML dialplan strings	Simple command messages	✅ ESL
Performance	HTTP overhead per action	Single TCP connection	✅ ESL
Native Apps	Can use all FreeSWITCH apps	Can use all FreeSWITCH apps	Tie
Debugging	HTTP logs + XML parsing	Clear command/response logs	✅ ESL

Key Advantage: Native FreeSWITCH Apps

ESL allows us to use play_and_get_digits, which handles:

DTMF collection
Automatic retries on invalid input
Timeout handling
Invalid prompt playback

Without play_and_get_digits (mod_xml_curl approach):

<!-- Complex state machine in dialplan XML -->
<extension name="menu_retry_1">
  <condition field="${ivr_digit}" expression="^$">
    <action application="playback" data="invalid.mp3"/>
    <action application="set" data="retry_count=1"/>
    <action application="transfer" data="menu_prompt"/>
  </condition>
</extension>

<!-- Repeat for retry 2, 3... -->

With play_and_get_digits (ESL approach):

# One line - FreeSWITCH handles all retries!
await esl.execute_app(
    "play_and_get_digits",
    "1 1 3 5000 # prompt.mp3 invalid.mp3 ivr_digit \\d+"
)
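
The argument string follows the `play_and_get_digits` order: minimum digits, maximum digits, tries, timeout in milliseconds, terminators, prompt file, invalid-input file, channel variable, and validation regex. A hedged sketch of building it from a menu's settings is shown below; `build_pagd_args` is a hypothetical helper, and mapping `max_retries` directly to the tries argument is an assumption.

```python
def build_pagd_args(
    prompt_file: str,
    invalid_file: str,
    timeout_seconds: int,
    max_retries: int,
    var_name: str = "ivr_digit",
) -> str:
    """Build the play_and_get_digits argument string for a single-digit menu."""
    min_digits, max_digits = 1, 1
    timeout_ms = timeout_seconds * 1000  # FreeSWITCH expects milliseconds
    return (
        f"{min_digits} {max_digits} {max_retries} {timeout_ms} # "
        f"{prompt_file} {invalid_file} {var_name} \\d+"
    )


args = build_pagd_args("prompt.mp3", "invalid.mp3", timeout_seconds=5, max_retries=3)
```

With the default menu settings (5 second timeout, 3 retries), this yields the same string as the ESL example above.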

Consequences:

Positive: Simpler code, better error handling, native app usage
Negative: Requires ESL socket server (added component)
Mitigation: Socket server runs in same pod as FreeSWITCH (no network overhead)

ADR: OpenAI TTS vs Alternatives

Status: Accepted

Context: IVR system requires high-quality text-to-speech for menu prompts. Options evaluated:

OpenAI TTS (cloud API)
Google Cloud TTS (cloud API)
AWS Polly (cloud API)
Festival (open-source, on-premise)
MaryTTS (open-source, on-premise)
Coqui TTS (open-source, on-premise)

Decision: Use OpenAI TTS with SHA256 caching

Comparison:

Solution	Quality	Cost (1M chars)	Latency	Voices	Languages	Deployment
OpenAI TTS	⭐⭐⭐⭐⭐	$15	500-2000ms	6	50+	Cloud
Google Cloud TTS	⭐⭐⭐⭐⭐	$16	400-1500ms	400+	40+	Cloud
AWS Polly	⭐⭐⭐⭐	$16	500-2000ms	60+	30+	Cloud
Festival	⭐⭐	Free	50-200ms	3	15	On-premise
MaryTTS	⭐⭐⭐	Free	200-500ms	10	6	On-premise
Coqui TTS	⭐⭐⭐⭐	Free	500-3000ms	Custom	50+	On-premise

Why OpenAI TTS?

Quality: Neural voices nearly indistinguishable from human speech
Simplicity: Simple REST API, no infrastructure to manage
Cost: With caching, cost approaches zero after initial generation
Developer Experience: Easy to test, debug, and iterate

Caching Strategy: SHA256-Based Deduplication

cache_key = SHA256(text + "|" + voice + "|" + language + "|" + model)

Cache Hit Rate Analysis:

IVR with 10 unique prompts
Each prompt played 100 times/day
Cache hit rate: 99% (990/1000 requests served from cache)
API calls: 10 (first generation only)
Cost: $0.15 (vs $15 without cache)
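
The figures in this analysis can be reproduced with a short calculation. Note the assumption: the $15 and $0.15 totals only work out if each prompt is about 1,000 characters long, a length the analysis does not state. The function name `daily_cost` is illustrative.

```python
USD_PER_MILLION_CHARS = 15  # OpenAI TTS price from the comparison table


def daily_cost(unique_prompts: int, plays_per_prompt: int,
               chars_per_prompt: int, cached: bool) -> float:
    """Cost in USD of one day of prompt playback, with or without the cache."""
    # With the cache, each unique prompt is generated once; without it, every play is billed
    api_calls = unique_prompts if cached else unique_prompts * plays_per_prompt
    billed_chars = api_calls * chars_per_prompt
    return billed_chars * USD_PER_MILLION_CHARS / 1_000_000


# Assumed prompt length: 1,000 characters (inferred from the totals above)
uncached = daily_cost(10, 100, 1000, cached=False)
cached = daily_cost(10, 100, 1000, cached=True)
hit_rate = 1 - 10 / (10 * 100)
```

The cached figure is a one-time cost on the first day; on later days every request is a cache hit as long as the prompts and the cache volume are unchanged.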

Why Not On-Premise TTS?

Factor	On-Premise (Coqui)	OpenAI TTS	Winner
Setup Time	2-3 days (model training)	5 minutes	✅ OpenAI
Infrastructure	GPU pod required	None	✅ OpenAI
Maintenance	Model updates, debugging	None	✅ OpenAI
Quality	Good (but requires tuning)	Excellent (out-of-box)	✅ OpenAI
Cost @ 1M chars	~$50/mo (GPU pod)	$15 (API)	✅ OpenAI
Cost @ 100M chars	~$50/mo (same)	$1500 (API)	✅ On-Premise

Break-even Point: ~3M characters/month

Decision: Start with OpenAI TTS. Revisit if usage exceeds 3M chars/month.
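
The break-even point follows from the two cost figures in the table. The sketch below works through the arithmetic; the constant and function names are illustrative, and the character counts refer to billed characters, meaning those actually sent to the API after caching.

```python
GPU_POD_MONTHLY_USD = 50   # approximate on-premise cost from the table
API_USD_PER_MILLION = 15   # OpenAI TTS price per 1M characters


def break_even_chars() -> float:
    """Monthly billed characters at which API cost equals the GPU pod cost."""
    return GPU_POD_MONTHLY_USD / API_USD_PER_MILLION * 1_000_000


def cheaper_option(chars_per_month: int) -> str:
    """Compare monthly API cost against the flat GPU pod cost."""
    api_cost = chars_per_month / 1_000_000 * API_USD_PER_MILLION
    return "on-premise" if api_cost > GPU_POD_MONTHLY_USD else "openai"


threshold = break_even_chars()
```

The threshold comes out at roughly 3.3M characters per month, which the document rounds to ~3M.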

Consequences:

Positive: High quality, low maintenance, fast development
Negative: External API dependency, recurring cost at scale
Mitigation: SHA256 caching reduces API calls by 99%

Troubleshooting

IVR Pod Not Starting

Symptoms:

Pod stuck in Pending or CrashLoopBackOff
IVR status shows pending or error

Diagnosis:

# Check pod status
kubectl get pods -n client-demo -l ivr-name=customer-support

# View pod events
kubectl describe pod -n client-demo -l ivr-name=customer-support

# Check pod logs
kubectl logs -n client-demo -l ivr-name=customer-support -c freeswitch-ivr

Common Issues:

Missing OpenAI API Key

Error: OpenAI API key not configured

Fix: Update IVR with valid OpenAI API key

Database Connection Failed

Error: could not connect to server: Connection refused

Fix: Check DB_DSN environment variable and PostgreSQL accessibility

Image Pull Error

Failed to pull image "registry.com/freeswitch-ivr:latest"

Fix: Ensure image exists and ghcr-secret is configured

TTS Generation Failing

Symptoms:

Caller hears silence instead of prompts
Logs show "OpenAI TTS API error"

Diagnosis:

# Check ESL socket handler logs
kubectl logs -n client-demo -l ivr-name=customer-support -c freeswitch-ivr | grep -i tts

# Test OpenAI API key manually
curl -X POST http://api:8000/v1/ivrs/customer-support/tts/preview \
  -H "X-API-Key: your-key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Test prompt", "voice": "alloy"}'

Common Issues:

Invalid API Key

OpenAI TTS API error 401: Incorrect API key provided

Fix: Update IVR with valid OpenAI API key

Rate Limit Exceeded

OpenAI TTS API error 429: Rate limit exceeded

Fix: Check OpenAI usage limits or upgrade plan

Text Too Long

ValueError: Text too long: 5000 chars (max 4096)

Fix: Split long prompts into multiple menus
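
One way to prepare an over-long prompt for splitting is to break it at sentence boundaries so that each piece stays within the limit. The sketch below is illustrative only: `split_prompt` is a hypothetical helper, and the 4096-character limit is taken from the error message above.

```python
import re

MAX_TTS_CHARS = 4096  # limit quoted in the error message above


def split_prompt(text: str, limit: int = MAX_TTS_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` chars, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is cut at the limit
        while len(sentence) > limit:
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


long_text = "Press one. " * 500
chunks = split_prompt(long_text)
```

Each chunk can then become the prompt of its own menu, or be generated and played back in sequence.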

Menu Navigation Not Working

Symptoms:

DTMF digits not recognized
Call hangs up after prompt

Diagnosis:

# Check ESL socket handler logs
kubectl logs -n client-demo -l ivr-name=customer-support | grep -i "digit\|menu"

# Check FreeSWITCH logs
kubectl logs -n client-demo -l ivr-name=customer-support | grep -i "play_and_get_digits"

Common Issues:

Menu Options Not Configured

Error: Menu not found: main_menu

Fix: Create menu and options via API

Invalid Digit Mapping

Warning: Invalid digit: 5

Fix: Ensure menu option exists for pressed digit

Timeout Too Short

Info: No digit collected, hanging up

Fix: Increase timeout_seconds in menu config

HTTP API Call Failing

Symptoms:

Caller hears "service unavailable" message
Logs show HTTP error

Diagnosis:

# Check ESL socket handler logs
kubectl logs -n client-demo -l ivr-name=customer-support | grep -i "http_api_call"

Common Issues:

Invalid URL

Error: API call failed: 404 - Not Found

Fix: Verify action_target URL is correct

Missing Headers

Error: API call failed: 401 - Unauthorized

Fix: Add Authorization header to action_config

Invalid JSONPath

Warning: Could not extract JSON path: current.temp_f

Fix: Verify response_path matches API response structure

Performance Considerations

TTS Cache Sizing

Estimation:

Average prompt: 50 characters → ~5KB MP3
100 unique prompts → 500KB total
1000 unique prompts → 5MB total

Recommendations:

Small IVRs (< 20 prompts): emptyDir volume (no persistence)
Large IVRs (> 100 prompts): PersistentVolumeClaim (cache persists across pod restarts)

Cache Cleanup:

# Clear cache older than 30 days
curl -X DELETE "http://api:8000/v1/ivrs/customer-support/tts/cache?older_than_days=30" \
  -H "X-API-Key: your-key"

Pod Resource Tuning

Default Resources:

Memory: 512Mi request, 1Gi limit
CPU: 250m request, 500m limit

Increase for High-Traffic IVRs:

resources:
  requests:
    memory: 1Gi
    cpu: 500m
  limits:
    memory: 2Gi
    cpu: 1000m

Monitor Usage:

kubectl top pod -n client-demo -l ivr-name=customer-support

Security Best Practices

OpenAI API Key Storage

Do:

✅ Store API keys encrypted in database
✅ Use separate API keys per IVR (blast radius)
✅ Rotate API keys regularly (quarterly)
✅ Monitor API usage via OpenAI dashboard

Don't:

❌ Hard-code API keys in code
❌ Share API keys across all IVRs
❌ Expose API keys in logs
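
To keep keys out of logs, a value can be masked before it is written anywhere. The helper below is a minimal sketch with a hypothetical name (`mask_api_key`); how many trailing characters to reveal is a judgment call.

```python
def mask_api_key(key: str, visible: int = 4) -> str:
    """Return a log-safe form of an API key, keeping only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


masked = mask_api_key("sk-abcdef123456")
```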

HTTP API Call Security

Do:

✅ Use HTTPS endpoints only
✅ Validate SSL certificates
✅ Set request timeout (default: 10s)
✅ Sanitize user input before API calls

Don't:

❌ Call untrusted HTTP endpoints
❌ Expose sensitive data in TTS prompts
❌ Trust API responses without validation
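
Some of these checks can be applied when an `http_api_call` option is created or executed. The sketch below rejects non-HTTPS targets and returns the request settings a client would use; `check_action_target` and the shape of the returned dict are assumptions for illustration, while the 10 second default comes from the list above.

```python
from urllib.parse import urlparse


def check_action_target(url: str, timeout_seconds: float = 10.0) -> dict:
    """Reject non-HTTPS targets and return settings for the outgoing request."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"action_target must use https: {url}")
    if not parsed.hostname:
        raise ValueError(f"action_target has no host: {url}")
    # verify_tls stays True so certificates are always validated
    return {"url": url, "timeout": timeout_seconds, "verify_tls": True}


settings = check_action_target("https://api.weather.gov/gridpoints/TOP/31,80/forecast")
```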

Related Documentation

Queue Management - One-pod-per-queue pattern (IVR mirrors this)
Extension Management - SIP extension creation (IVRs get extensions)
Telephony Call Control - XML-RPC commands for call control
Database Schema - Full schema including IVR tables

Summary

The IVR system provides a powerful, flexible framework for automated call handling with:

✅ 18 REST endpoints for complete IVR management
✅ One-pod-per-IVR architecture for isolation and scalability
✅ OpenAI TTS with SHA256-based caching (99% cost reduction)
✅ ESL socket handler for native FreeSWITCH app usage
✅ 7 action types (transfer, sub-menu, HTTP API, hangup, etc.)
✅ Multi-level menus with recursive navigation
✅ PostgreSQL backend for configuration and caching
✅ Kubernetes native with health checks and resource limits

The system balances developer experience (simple API, high-quality TTS) with operational efficiency (caching, resource optimization) to deliver production-ready IVR capabilities.
