IVR System

The Ominis Cluster Manager includes a sophisticated IVR (Interactive Voice Response) system that enables automated call handling with menu-based navigation, text-to-speech prompts, and flexible action routing.

Architecture Overview

The IVR system follows a one-pod-per-IVR architecture, mirroring the queue management pattern. Each IVR instance runs as a dedicated Kubernetes pod containing:

  1. FreeSWITCH - SIP/telephony engine
  2. ESL Socket Handler - Python-based call control logic
  3. OpenAI TTS Integration - High-quality speech generation with caching
  4. PostgreSQL Backend - Menu configuration and TTS cache storage

Key Design Decisions

One Pod Per IVR

Each IVR instance runs in its own pod to ensure:

  • Isolation: One IVR's traffic doesn't affect others
  • Scalability: Independent resource allocation per IVR
  • Configuration: Each IVR has unique OpenAI API keys, languages, voices
  • Lifecycle: IVRs can be created/destroyed independently

ESL Socket vs mod_xml_curl

Decision: Use outbound ESL socket for IVR call control

Rationale:

  • Native FreeSWITCH Apps: Leverage play_and_get_digits for menu navigation (automatic retries, timeout handling)
  • Simplified Logic: No complex event state machines - just orchestration
  • Better Control: Direct call control without HTTP round-trips
  • Error Handling: Connection-based error detection (vs polling)

See ADR: ESL Socket vs mod_xml_curl for full analysis.

OpenAI TTS with SHA256 Caching

Decision: Use OpenAI TTS with content-addressed caching

Rationale:

  • Quality: Superior voice quality vs alternatives (Festival, MaryTTS, Coqui)
  • Cost Efficiency: SHA256-based deduplication reduces API calls
  • Performance: Cached audio served instantly from local filesystem
  • Flexibility: 6 voices, 50+ languages, multiple models

See ADR: OpenAI TTS vs Alternatives for full analysis.

Database Schema

The IVR system uses four PostgreSQL tables:

Tables

ivrs

Stores IVR instance configuration.

CREATE TABLE ivrs (
id SERIAL PRIMARY KEY,
name VARCHAR(64) UNIQUE NOT NULL,
description TEXT,
openai_api_key TEXT NOT NULL,
default_language VARCHAR(10) DEFAULT 'en',
default_voice VARCHAR(50) DEFAULT 'alloy',
tts_model VARCHAR(50) DEFAULT 'tts-1',
status VARCHAR(20) DEFAULT 'pending',
pod_name VARCHAR(255),
service_name VARCHAR(255),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

ivr_menus

Stores menu definitions (prompts, timeouts, retries).

CREATE TABLE ivr_menus (
id SERIAL PRIMARY KEY,
ivr_name VARCHAR(64) REFERENCES ivrs(name) ON DELETE CASCADE,
menu_id VARCHAR(100) NOT NULL,
prompt_text TEXT NOT NULL,
tts_voice VARCHAR(50),
timeout_seconds INTEGER DEFAULT 5,
max_retries INTEGER DEFAULT 3,
invalid_prompt TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(ivr_name, menu_id)
);

ivr_menu_options

Stores menu option actions (digit → action mapping).

CREATE TABLE ivr_menu_options (
id SERIAL PRIMARY KEY,
menu_id INTEGER REFERENCES ivr_menus(id) ON DELETE CASCADE,
digit VARCHAR(10) NOT NULL,
action_type VARCHAR(50) NOT NULL,
action_target TEXT NOT NULL,
action_config JSONB,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(menu_id, digit)
);
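To illustrate how ivr_menus and ivr_menu_options combine to resolve a key press, here is a minimal sketch using Python's built-in sqlite3 with a simplified copy of the schema. SQLite stands in for PostgreSQL, the sample rows are hypothetical, and `resolve_digit` is an illustrative helper name rather than part of the actual handler.

```python
import sqlite3

# Simplified, SQLite-compatible copies of ivr_menus and ivr_menu_options.
# The real tables are PostgreSQL (SERIAL, JSONB); this is illustration only.
SCHEMA = """
CREATE TABLE ivr_menus (
    id INTEGER PRIMARY KEY,
    ivr_name TEXT NOT NULL,
    menu_id TEXT NOT NULL,
    prompt_text TEXT NOT NULL,
    UNIQUE (ivr_name, menu_id)
);
CREATE TABLE ivr_menu_options (
    id INTEGER PRIMARY KEY,
    menu_id INTEGER REFERENCES ivr_menus(id) ON DELETE CASCADE,
    digit TEXT NOT NULL,
    action_type TEXT NOT NULL,
    action_target TEXT NOT NULL,
    UNIQUE (menu_id, digit)
);
"""


def resolve_digit(conn, ivr_name, menu_id, digit):
    """Return (action_type, action_target) for a pressed digit, or None."""
    row = conn.execute(
        """
        SELECT o.action_type, o.action_target
        FROM ivr_menu_options AS o
        JOIN ivr_menus AS m ON m.id = o.menu_id
        WHERE m.ivr_name = ? AND m.menu_id = ? AND o.digit = ?
        """,
        (ivr_name, menu_id, digit),
    ).fetchone()
    return tuple(row) if row else None


# Hypothetical sample data: one menu with one option.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO ivr_menus (id, ivr_name, menu_id, prompt_text) VALUES (1, ?, ?, ?)",
    ("customer-support", "main_menu", "Press 1 for sales."),
)
conn.execute(
    "INSERT INTO ivr_menu_options (menu_id, digit, action_type, action_target) "
    "VALUES (1, '1', 'transfer_queue', 'queue-sales')"
)
print(resolve_digit(conn, "customer-support", "main_menu", "1"))
```

Note that ivr_menu_options.menu_id is the integer primary key of ivr_menus, while ivr_menus.menu_id is the string identifier used in API paths, so the lookup needs the join.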

ivr_tts_cache

Stores TTS cache metadata (SHA256-indexed).

CREATE TABLE ivr_tts_cache (
id SERIAL PRIMARY KEY,
text_hash VARCHAR(64) UNIQUE NOT NULL,
text TEXT NOT NULL,
language VARCHAR(10) NOT NULL,
voice VARCHAR(50) NOT NULL,
model VARCHAR(50) NOT NULL,
file_path TEXT NOT NULL,
file_size_bytes INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
access_count INTEGER DEFAULT 0
);

CREATE INDEX idx_ivr_tts_cache_hash ON ivr_tts_cache(text_hash);

API Endpoints (19 Total)

IVR Management (7 endpoints)

Create IVR

POST /v1/ivrs

Creates a new IVR instance with dedicated pod and SIP extension.

Request Body:

{
"name": "customer-support",
"description": "Customer support IVR with menu routing",
"openai_api_key": "sk-...",
"default_language": "en",
"default_voice": "alloy",
"tts_model": "tts-1"
}

Response (201 Created):

{
"ivr": {
"name": "customer-support",
"description": "Customer support IVR with menu routing",
"openai_api_key": "sk-...",
"default_language": "en",
"default_voice": "alloy",
"tts_model": "tts-1",
"status": "pending",
"pod_name": "freeswitch-ivr-customer-support-abc123",
"service_name": "freeswitch-ivr-customer-support-svc",
"created_at": "2025-10-14T12:00:00Z",
"updated_at": "2025-10-14T12:00:00Z"
}
}

Process:

  1. Create SIP extension (ivr-{name}) for authentication
  2. Insert IVR record in database
  3. Provision Kubernetes pod with FreeSWITCH + ESL handler
  4. Register with SIP registrar for inbound calls

Pod Resources:

  • Memory: 512Mi request, 1Gi limit
  • CPU: 250m request, 500m limit
  • Volume: emptyDir for TTS cache (/var/ivr-audio)

List IVRs

GET /v1/ivrs

Response (200 OK):

{
"ivrs": [
{
"name": "customer-support",
"status": "running",
"description": "Customer support IVR",
"created_at": "2025-10-14T12:00:00Z"
}
],
"total": 1
}

Get IVR Details

GET /v1/ivrs/{ivr_name}

Response (200 OK):

{
"ivr": {
"name": "customer-support",
"description": "Customer support IVR",
"default_language": "en",
"default_voice": "alloy",
"tts_model": "tts-1",
"status": "running",
"pod_name": "freeswitch-ivr-customer-support-abc123",
"created_at": "2025-10-14T12:00:00Z"
}
}

Update IVR

PUT /v1/ivrs/{ivr_name}

Updates the IVR configuration. Changes take effect only after the pod is restarted.

Request Body:

{
"description": "Updated description",
"default_voice": "nova",
"openai_api_key": "sk-new-key"
}

Delete IVR

DELETE /v1/ivrs/{ivr_name}

Deletes IVR instance, including pod, service, and database records.

Response: 204 No Content

Process:

  1. Delete Kubernetes deployment and service
  2. Delete database records (cascades to menus, options, cache)
  3. Clean up SIP extension

Restart IVR

POST /v1/ivrs/{ivr_name}/restart

Triggers rolling restart of IVR pod (useful after config changes).

Response (200 OK):

{
"message": "IVR customer-support restart initiated"
}

Get IVR Status

GET /v1/ivrs/{ivr_name}/status

Response (200 OK):

{
"name": "customer-support",
"status": "running",
"active_calls": 3,
"total_calls": 247,
"pod_status": "Running",
"health_status": "healthy"
}

Menu Management (5 endpoints)

Create Menu

POST /v1/ivrs/{ivr_name}/menus

Request Body:

{
"menu_id": "main_menu",
"prompt_text": "Thank you for calling. Press 1 for sales, press 2 for support, or press 3 for weather.",
"timeout_seconds": 5,
"max_retries": 3,
"invalid_prompt": "Invalid selection. Please try again."
}

Response (201 Created):

{
"menu": {
"id": 1,
"ivr_name": "customer-support",
"menu_id": "main_menu",
"prompt_text": "Thank you for calling...",
"timeout_seconds": 5,
"max_retries": 3,
"created_at": "2025-10-14T12:00:00Z"
}
}

List Menus

GET /v1/ivrs/{ivr_name}/menus

Response (200 OK):

{
"menus": [
{
"id": 1,
"menu_id": "main_menu",
"prompt_text": "Thank you for calling...",
"timeout_seconds": 5
}
],
"total": 1
}

Get Menu Details

GET /v1/ivrs/{ivr_name}/menus/{menu_id}

Response (200 OK):

{
"menu": {
"id": 1,
"menu_id": "main_menu",
"prompt_text": "Thank you for calling...",
"timeout_seconds": 5
},
"options": [
{
"id": 1,
"digit": "1",
"action_type": "sub_menu",
"action_target": "sales_menu",
"description": "Navigate to sales menu"
}
]
}

Update Menu

PUT /v1/ivrs/{ivr_name}/menus/{menu_id}

Request Body:

{
"prompt_text": "Updated prompt text",
"timeout_seconds": 10
}

Delete Menu

DELETE /v1/ivrs/{ivr_name}/menus/{menu_id}

Deletes menu and all associated options.

Response: 204 No Content


Menu Option Management (3 endpoints)

Add Menu Option

POST /v1/ivrs/{ivr_name}/menus/{menu_id}/options

Request Body (Transfer to Queue):

{
"digit": "1",
"action_type": "transfer_queue",
"action_target": "queue-sales",
"description": "Transfer to sales queue"
}

Request Body (Sub-Menu):

{
"digit": "2",
"action_type": "sub_menu",
"action_target": "support_menu",
"description": "Navigate to support menu"
}

Request Body (HTTP API Call):

{
"digit": "3",
"action_type": "http_api_call",
"action_target": "https://api.weather.com/current",
"action_config": {
"method": "GET",
"headers": {
"Authorization": "Bearer sk-weather-123"
},
"response_path": "current.temp_f",
"tts_template": "The current temperature is {value} degrees Fahrenheit"
},
"description": "Get weather information"
}

Update Menu Option

PUT /v1/ivrs/{ivr_name}/menus/{menu_id}/options/{digit}

Delete Menu Option

DELETE /v1/ivrs/{ivr_name}/menus/{menu_id}/options/{digit}

Response: 204 No Content


TTS & Visualization (4 endpoints)

Generate TTS Preview

POST /v1/ivrs/{ivr_name}/tts/preview

Generate TTS audio for testing without saving to production cache.

Request Body:

{
"text": "This is a test prompt for preview",
"voice": "nova",
"model": "tts-1-hd"
}

Response (200 OK):

{
"audio_file": "/var/ivr-audio/abc123def456.mp3",
"text": "This is a test prompt for preview"
}

View TTS Cache

GET /v1/ivrs/{ivr_name}/tts/cache

Response (200 OK):

{
"cache_entries": [
{
"id": 1,
"text_hash": "abc123def456...",
"text": "Thank you for calling",
"language": "en",
"voice": "alloy",
"model": "tts-1",
"file_path": "/var/ivr-audio/abc123def456.mp3",
"file_size_bytes": 15234,
"access_count": 47,
"last_accessed_at": "2025-10-14T12:00:00Z"
}
],
"total": 1,
"total_size_bytes": 15234
}

Clear TTS Cache

DELETE /v1/ivrs/{ivr_name}/tts/cache

Removes cached TTS audio from database and filesystem.

Response: 204 No Content


Get IVR Flow Visualization

GET /v1/ivrs/{ivr_name}/flow

Returns nested menu structure for visualization tools.

Response (200 OK):

{
"ivr_name": "customer-support",
"root_menu": {
"menu_id": "main_menu",
"prompt_text": "Press 1 for sales, 2 for support, 3 for weather",
"options": [
{"digit": "1", "action": "sub_menu", "target": "sales_menu"},
{"digit": "2", "action": "transfer_queue", "target": "queue-support"},
{"digit": "3", "action": "http_api_call", "target": "https://api.weather.com/current"}
],
"sub_menus": [
{
"menu_id": "sales_menu",
"prompt_text": "Press 1 for phone sales, 2 for computer sales",
"options": [
{"digit": "1", "action": "transfer_queue", "target": "queue-phone"},
{"digit": "2", "action": "transfer_queue", "target": "queue-computer"}
],
"sub_menus": []
}
]
}
}

IVR Call Flow


TTS Caching Flow

Cache Key Generation

import hashlib

def generate_cache_key(text: str, voice: str, language: str, model: str) -> str:
    """Generate SHA256 cache key from TTS parameters"""
    combined = f"{text}|{voice}|{language}|{model}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()

# Example
key = generate_cache_key(
    text="Thank you for calling",
    voice="alloy",
    language="en",
    model="tts-1"
)
# Result: "abc123def456..." (64 hex characters)
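The cache key is only half of the flow; the lookup around it might look like the sketch below. It assumes audio is stored as `<hash>.mp3` in a cache directory (as the file_path values in this document suggest) and takes the synthesizer as a callback so the example runs without calling OpenAI. `get_or_create_audio` and `fake_synthesize` are hypothetical names, and the database bookkeeping (access_count, last_accessed_at) is omitted.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def generate_cache_key(text: str, voice: str, language: str, model: str) -> str:
    """Same key derivation as shown above."""
    combined = f"{text}|{voice}|{language}|{model}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


def get_or_create_audio(
    cache_dir: str,
    text: str,
    voice: str,
    language: str,
    model: str,
    synthesize: Callable[[str, str, str], bytes],
) -> Path:
    """Return the cached audio path, calling `synthesize` only on a cache miss."""
    key = generate_cache_key(text, voice, language, model)
    path = Path(cache_dir) / f"{key}.mp3"
    if not path.exists():
        # Cache miss: generate the audio once and store it under its hash.
        path.write_bytes(synthesize(text, voice, model))
    return path


# Usage with a stand-in synthesizer that records how often it is called.
calls = []


def fake_synthesize(text: str, voice: str, model: str) -> bytes:
    calls.append(text)
    return b"fake-mp3-bytes"


cache_dir = tempfile.mkdtemp()
first = get_or_create_audio(cache_dir, "Thank you for calling", "alloy", "en", "tts-1", fake_synthesize)
second = get_or_create_audio(cache_dir, "Thank you for calling", "alloy", "en", "tts-1", fake_synthesize)
```

The second call returns the same path without invoking the synthesizer, which is the behaviour the cache relies on.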

Cache Benefits

| Metric | Without Cache | With Cache |
| --- | --- | --- |
| API Calls | Every prompt play | First time only |
| Latency | 500-2000ms | <5ms (filesystem) |
| Cost | $0.015 per 1K chars | $0.015 first time, $0 after |
| Bandwidth | N/A | ~15KB per cached prompt |

Real-world Savings (assuming each prompt is about 1,000 characters):

  • IVR with 10 prompts, each played 1,000 times/day
  • Without cache: 10,000 API calls/day × $0.015 = $150/day
  • With cache: 10 API calls once × $0.015 = $0.15 total
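The arithmetic behind these savings can be checked with a few lines of Python. The sketch assumes roughly 1,000 characters per prompt, which is the prompt length at which 10,000 calls cost $150 at $0.015 per 1K characters; shorter prompts scale both figures down proportionally. `tts_cost` is an illustrative helper.

```python
PRICE_PER_1K_CHARS = 0.015  # USD, from the table above


def tts_cost(api_calls: int, chars_per_prompt: int) -> float:
    """Cost in USD of `api_calls` TTS requests of `chars_per_prompt` characters each."""
    return api_calls * chars_per_prompt / 1000 * PRICE_PER_1K_CHARS


prompts = 10
plays_per_prompt_per_day = 1000
chars_per_prompt = 1000  # assumption, see lead-in

# Without a cache every play is an API call; with a cache only the first play is.
without_cache_per_day = tts_cost(prompts * plays_per_prompt_per_day, chars_per_prompt)
with_cache_once = tts_cost(prompts, chars_per_prompt)

print(f"Without cache: ${without_cache_per_day:.2f} per day")
print(f"With cache:    ${with_cache_once:.2f} one-time")
```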

ESL Socket Communication


Action Types

The IVR system supports 7 action types for menu options:

1. transfer_queue

Transfers call to a callcenter queue.

Example:

{
"digit": "1",
"action_type": "transfer_queue",
"action_target": "queue-sales",
"description": "Transfer to sales queue"
}

FreeSWITCH Command:

transfer queue-sales XML default

2. transfer_extension

Transfers call to a SIP extension.

Example:

{
"digit": "2",
"action_type": "transfer_extension",
"action_target": "1001",
"description": "Transfer to extension 1001"
}

FreeSWITCH Command:

transfer 1001 XML default

3. sub_menu

Navigates to a sub-menu (recursive).

Example:

{
"digit": "3",
"action_type": "sub_menu",
"action_target": "sales_menu",
"description": "Navigate to sales menu"
}

Process:

  1. Load sales_menu from database
  2. Generate TTS for sub-menu prompt
  3. Execute play_and_get_digits for sub-menu
  4. Process sub-menu option selection
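The steps above can be sketched as a small navigation loop. This is a simplified model, not the handler's actual code: menus live in an in-memory dict instead of PostgreSQL, `collect_digit` is a hypothetical callback standing in for TTS generation plus play_and_get_digits, and the `max_depth` guard is an assumption added so that menus pointing back at each other cannot loop forever.

```python
from typing import Callable, Dict, Optional, Tuple

Option = Tuple[str, str]  # (action_type, action_target)

# Hypothetical menu data: digit -> option, keyed by menu_id.
MENUS: Dict[str, Dict[str, Option]] = {
    "main_menu": {"3": ("sub_menu", "sales_menu")},
    "sales_menu": {
        "1": ("transfer_queue", "queue-phone"),
        "9": ("sub_menu", "main_menu"),
    },
}


def navigate(
    menus: Dict[str, Dict[str, Option]],
    start: str,
    collect_digit: Callable[[str], Optional[str]],
    max_depth: int = 10,
) -> Optional[Option]:
    """Follow sub_menu options until a terminal action is chosen.

    Returns the terminal (action_type, action_target), or None when no
    valid digit was collected or max_depth menus were visited.
    """
    menu_id = start
    for _ in range(max_depth):
        digit = collect_digit(menu_id)          # play prompt, wait for DTMF
        option = menus.get(menu_id, {}).get(digit)
        if option is None:
            return None                         # unknown menu or unmapped digit
        action_type, target = option
        if action_type != "sub_menu":
            return option                       # terminal action, e.g. transfer
        menu_id = target                        # descend into the sub-menu
    return None


# Simulate a caller pressing 3 (sales menu) and then 1 (phone sales queue).
presses = iter(["3", "1"])
result = navigate(MENUS, "main_menu", lambda menu_id: next(presses))
```

A loop with a depth cap is used here instead of literal recursion; either works, but the cap makes the behaviour explicit when a sub-menu offers "return to main menu".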

4. http_api_call

Makes HTTP request to external API and speaks response.

Example:

{
"digit": "4",
"action_type": "http_api_call",
"action_target": "https://api.weather.com/v1/current?city=Toronto",
"action_config": {
"method": "GET",
"headers": {
"Authorization": "Bearer sk-weather-api-key"
},
"response_path": "current.temp_f",
"tts_template": "The current temperature in Toronto is {value} degrees Fahrenheit"
},
"description": "Get weather for Toronto"
}

Process:

  1. Make HTTP request (GET/POST) with headers
  2. Parse JSON response
  3. Extract value using JSONPath (current.temp_f)
  4. Format message using template ({value} placeholder)
  5. Generate TTS for formatted message
  6. Play TTS audio to caller

JSONPath Examples:

  • current.temp_f → response["current"]["temp_f"]
  • results[0].name → response["results"][0]["name"]
  • message → response["message"]
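A minimal sketch of the path extraction and templating described above, assuming the path syntax is limited to dotted keys and numeric [index] segments as in the examples. `extract_path` and `render_message` are hypothetical names, and the sample response is made up for illustration.

```python
import re
from typing import Any

# Matches either a key segment (no dots or brackets) or a numeric [index].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def extract_path(data: Any, path: str) -> Any:
    """Walk a dotted path such as 'results[0].name' through parsed JSON.

    Raises KeyError, IndexError or TypeError if the path does not match.
    """
    current = data
    for key, index in _TOKEN.findall(path):
        current = current[int(index)] if index else current[key]
    return current


def render_message(response: Any, response_path: str, tts_template: str) -> str:
    """Fill the {value} placeholder with the extracted value."""
    value = extract_path(response, response_path)
    return tts_template.replace("{value}", str(value))


# Hypothetical API response.
sample = {"current": {"temp_f": 72}, "results": [{"name": "Toronto"}]}
message = render_message(
    sample,
    "current.temp_f",
    "The current temperature is {value} degrees Fahrenheit",
)
print(message)
```

Using `str.replace` rather than `str.format` keeps templates that contain other braces from raising errors.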

5. hangup

Hangs up the call.

Example:

{
"digit": "9",
"action_type": "hangup",
"action_target": "NORMAL_CLEARING",
"description": "Hang up call"
}

FreeSWITCH Command:

hangup NORMAL_CLEARING

6. play_audio

Plays a pre-recorded audio file.

Example:

{
"digit": "5",
"action_type": "play_audio",
"action_target": "/var/audio/custom-greeting.wav",
"description": "Play custom greeting"
}

FreeSWITCH Command:

playback /var/audio/custom-greeting.wav

7. voicemail

Sends call to voicemail.

Example:

{
"digit": "0",
"action_type": "voicemail",
"action_target": "1001",
"description": "Send to voicemail for extension 1001"
}

FreeSWITCH Command:

voicemail default ${domain} 1001
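Taken together, the action types that map directly to a FreeSWITCH application could be translated with a small dispatcher like the one below. It is a sketch built only from the commands listed in this section; `to_freeswitch_command` is a hypothetical name, and sub_menu and http_api_call are deliberately excluded because they are handled by the socket handler's own logic rather than by a single command.

```python
from typing import Tuple


def to_freeswitch_command(action_type: str, action_target: str) -> Tuple[str, str]:
    """Map a menu option to (application, arguments) per the examples above."""
    if action_type in ("transfer_queue", "transfer_extension"):
        return ("transfer", f"{action_target} XML default")
    if action_type == "hangup":
        return ("hangup", action_target)
    if action_type == "play_audio":
        return ("playback", action_target)
    if action_type == "voicemail":
        # ${domain} is left for FreeSWITCH to expand as a channel variable.
        return ("voicemail", f"default ${{domain}} {action_target}")
    raise ValueError(f"No direct FreeSWITCH command for action type: {action_type}")


app, args = to_freeswitch_command("voicemail", "1001")
print(app, args)
```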

Complete Example: Multi-Level IVR

Let's build a complete IVR with multiple menus, HTTP API calls, and queue transfers.

Step 1: Create IVR

curl -X POST http://api:8000/v1/ivrs \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"name": "support-hotline",
"description": "Technical support hotline with weather and queue routing",
"openai_api_key": "sk-your-openai-key",
"default_language": "en",
"default_voice": "nova",
"tts_model": "tts-1-hd"
}'

Step 2: Create Main Menu

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"menu_id": "main_menu",
"prompt_text": "Welcome to technical support. Press 1 for sales, press 2 for technical support, press 3 for weather information, or press 0 for operator.",
"timeout_seconds": 8,
"max_retries": 3,
"invalid_prompt": "Invalid selection. Please press a number from 0 to 3."
}'

Step 3: Add Main Menu Options

Option 1: Sales Sub-Menu

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "1",
"action_type": "sub_menu",
"action_target": "sales_menu",
"description": "Navigate to sales menu"
}'

Option 2: Technical Support Queue

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "2",
"action_type": "transfer_queue",
"action_target": "queue-tech-support",
"description": "Transfer to technical support queue"
}'

Option 3: Weather API Call

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "3",
"action_type": "http_api_call",
"action_target": "https://api.weather.gov/gridpoints/TOP/31,80/forecast",
"action_config": {
"method": "GET",
"headers": {
"User-Agent": "Ominis-IVR/1.0"
},
"response_path": "properties.periods[0].detailedForecast",
"tts_template": "The weather forecast is: {value}"
},
"description": "Get weather forecast"
}'

Option 0: Operator Extension

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "0",
"action_type": "transfer_extension",
"action_target": "1000",
"description": "Transfer to operator"
}'

Step 4: Create Sales Sub-Menu

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"menu_id": "sales_menu",
"prompt_text": "Sales department. Press 1 for phone sales, press 2 for computer sales, or press 9 to return to main menu.",
"timeout_seconds": 5,
"max_retries": 3
}'

Step 5: Add Sales Sub-Menu Options

Option 1: Phone Sales Queue

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/sales_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "1",
"action_type": "transfer_queue",
"action_target": "queue-phone-sales",
"description": "Transfer to phone sales queue"
}'

Option 2: Computer Sales Queue

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/sales_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "2",
"action_type": "transfer_queue",
"action_target": "queue-computer-sales",
"description": "Transfer to computer sales queue"
}'

Option 9: Return to Main Menu

curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/sales_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "9",
"action_type": "sub_menu",
"action_target": "main_menu",
"description": "Return to main menu"
}'

Step 6: Visualize IVR Flow

curl http://api:8000/v1/ivrs/support-hotline/flow \
-H "X-API-Key: your-api-key"

Response:

{
"ivr_name": "support-hotline",
"root_menu": {
"menu_id": "main_menu",
"prompt_text": "Welcome to technical support...",
"options": [
{"digit": "1", "action": "sub_menu", "target": "sales_menu"},
{"digit": "2", "action": "transfer_queue", "target": "queue-tech-support"},
{"digit": "3", "action": "http_api_call", "target": "https://api.weather.gov/..."},
{"digit": "0", "action": "transfer_extension", "target": "1000"}
],
"sub_menus": [
{
"menu_id": "sales_menu",
"prompt_text": "Sales department...",
"options": [
{"digit": "1", "action": "transfer_queue", "target": "queue-phone-sales"},
{"digit": "2", "action": "transfer_queue", "target": "queue-computer-sales"},
{"digit": "9", "action": "sub_menu", "target": "main_menu"}
],
"sub_menus": []
}
]
}
}
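In the response above, sales_menu lists no sub-menus even though its option 9 points back to main_menu. One way to produce that shape is to track already-visited menus while nesting, as in this sketch. The flat `MENUS` dict and `build_flow` are hypothetical; the actual endpoint may build the tree differently.

```python
from typing import Dict, FrozenSet

# Hypothetical flat representation of the menus created in the steps above.
MENUS: Dict[str, dict] = {
    "main_menu": {
        "prompt_text": "Welcome to technical support...",
        "options": [
            {"digit": "1", "action": "sub_menu", "target": "sales_menu"},
            {"digit": "2", "action": "transfer_queue", "target": "queue-tech-support"},
        ],
    },
    "sales_menu": {
        "prompt_text": "Sales department...",
        "options": [
            {"digit": "1", "action": "transfer_queue", "target": "queue-phone-sales"},
            {"digit": "9", "action": "sub_menu", "target": "main_menu"},
        ],
    },
}


def build_flow(menus: Dict[str, dict], menu_id: str, visited: FrozenSet[str] = frozenset()) -> dict:
    """Nest sub-menus under their parent, skipping menus already on the path."""
    menu = menus[menu_id]
    visited = visited | {menu_id}
    sub_menus = [
        build_flow(menus, option["target"], visited)
        for option in menu["options"]
        if option["action"] == "sub_menu"
        and option["target"] in menus
        and option["target"] not in visited   # breaks "return to main menu" cycles
    ]
    return {
        "menu_id": menu_id,
        "prompt_text": menu["prompt_text"],
        "options": menu["options"],
        "sub_menus": sub_menus,
    }


flow = {"ivr_name": "support-hotline", "root_menu": build_flow(MENUS, "main_menu")}
```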

Step 7: Monitor TTS Cache

curl http://api:8000/v1/ivrs/support-hotline/tts/cache \
-H "X-API-Key: your-api-key"

Response:

{
"cache_entries": [
{
"text_hash": "abc123...",
"text": "Welcome to technical support. Press 1 for sales...",
"voice": "nova",
"language": "en",
"model": "tts-1-hd",
"file_path": "/var/ivr-audio/abc123.mp3",
"file_size_bytes": 24560,
"access_count": 127,
"last_accessed_at": "2025-10-14T15:23:10Z"
},
{
"text_hash": "def456...",
"text": "Sales department. Press 1 for phone sales...",
"voice": "nova",
"language": "en",
"model": "tts-1-hd",
"file_path": "/var/ivr-audio/def456.mp3",
"file_size_bytes": 18920,
"access_count": 43,
"last_accessed_at": "2025-10-14T15:18:45Z"
}
],
"total": 2,
"total_size_bytes": 43480
}

ADRs (Architectural Decision Records)

ADR: ESL Socket vs mod_xml_curl

Status: Accepted

Context: We need a mechanism for FreeSWITCH to execute IVR logic (menu navigation, DTMF handling, action routing). Two primary approaches exist:

  1. mod_xml_curl: FreeSWITCH fetches dialplan XML via HTTP on each call
  2. ESL Outbound Socket: FreeSWITCH connects to external socket server for call control

Decision: Use ESL Outbound Socket

Rationale:

| Factor | mod_xml_curl | ESL Outbound Socket | Winner |
| --- | --- | --- | --- |
| Control | Limited to dialplan apps | Full call control via commands | ✅ ESL |
| State Management | Stateless (HTTP request per step) | Stateful (persistent connection) | ✅ ESL |
| Error Handling | HTTP timeout/retry logic | Connection-based (immediate detection) | ✅ ESL |
| Complexity | Generate XML dialplan strings | Simple command messages | ✅ ESL |
| Performance | HTTP overhead per action | Single TCP connection | ✅ ESL |
| Native Apps | Can use all FreeSWITCH apps | Can use all FreeSWITCH apps | Tie |
| Debugging | HTTP logs + XML parsing | Clear command/response logs | ✅ ESL |

Key Advantage: Native FreeSWITCH Apps

ESL allows us to use play_and_get_digits, which handles:

  • DTMF collection
  • Automatic retries on invalid input
  • Timeout handling
  • Invalid prompt playback

Without play_and_get_digits (mod_xml_curl approach):

<!-- Complex state machine in dialplan XML -->
<extension name="menu_retry_1">
<condition field="${ivr_digit}" expression="^$">
<action application="playback" data="invalid.mp3"/>
<action application="set" data="retry_count=1"/>
<action application="transfer" data="menu_prompt"/>
</condition>
</extension>

<!-- Repeat for retry 2, 3... -->

With play_and_get_digits (ESL approach):

# One line - FreeSWITCH handles all retries!
await esl.execute_app(
"play_and_get_digits",
"1 1 3 5000 # prompt.mp3 invalid.mp3 ivr_digit \\d+"
)
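The argument string in that call follows the play_and_get_digits order: min digits, max digits, tries, timeout in milliseconds, terminators, prompt file, invalid-input file, variable name, and digit regex. A small builder makes the mapping from a menu row explicit. This is a sketch; feeding timeout_seconds and max_retries from ivr_menus into the timeout and tries positions is an assumption about how the handler uses those columns, and `build_pagd_args` is a hypothetical name.

```python
def build_pagd_args(
    prompt_file: str,
    invalid_file: str,
    timeout_seconds: int = 5,
    max_retries: int = 3,
    min_digits: int = 1,
    max_digits: int = 1,
    terminators: str = "#",
    var_name: str = "ivr_digit",
    regex: str = r"\d+",
) -> str:
    """Assemble the space-separated argument string for play_and_get_digits."""
    timeout_ms = timeout_seconds * 1000  # FreeSWITCH expects milliseconds
    parts = [
        str(min_digits),
        str(max_digits),
        str(max_retries),
        str(timeout_ms),
        terminators,
        prompt_file,
        invalid_file,
        var_name,
        regex,
    ]
    return " ".join(parts)


args = build_pagd_args("prompt.mp3", "invalid.mp3")
print(args)
```

With the defaults, the builder reproduces the argument string used in the example above.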

Consequences:

  • Positive: Simpler code, better error handling, native app usage
  • Negative: Requires ESL socket server (added component)
  • Mitigation: Socket server runs in same pod as FreeSWITCH (no network overhead)

ADR: OpenAI TTS vs Alternatives

Status: Accepted

Context: IVR system requires high-quality text-to-speech for menu prompts. Options evaluated:

  1. OpenAI TTS (cloud API)
  2. Google Cloud TTS (cloud API)
  3. AWS Polly (cloud API)
  4. Festival (open-source, on-premise)
  5. MaryTTS (open-source, on-premise)
  6. Coqui TTS (open-source, on-premise)

Decision: Use OpenAI TTS with SHA256 caching

Comparison:

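| Solution | Quality | Cost (1M chars) | Latency | Voices | Languages | Deployment |
| --- | --- | --- | --- | --- | --- | --- |
| OpenAI TTS | ⭐⭐⭐⭐⭐ | $15 | 500-2000ms | 6 | 50+ | Cloud |
| Google Cloud TTS | ⭐⭐⭐⭐⭐ | $16 | 400-1500ms | 400+ | 40+ | Cloud |
| AWS Polly | ⭐⭐⭐⭐ | $16 | 500-2000ms | 60+ | 30+ | Cloud |
| Festival | ⭐⭐ | Free | 50-200ms | 3 | 15 | On-premise |
| MaryTTS | ⭐⭐⭐ | Free | 200-500ms | 10 | 6 | On-premise |
| Coqui TTS | ⭐⭐⭐⭐ | Free | 500-3000ms | Custom | 50+ | On-premise |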

Why OpenAI TTS?

  1. Quality: Neural voices nearly indistinguishable from human speech
  2. Simplicity: Simple REST API, no infrastructure to manage
  3. Cost: With caching, cost approaches zero after initial generation
  4. Developer Experience: Easy to test, debug, and iterate

Caching Strategy: SHA256-Based Deduplication

cache_key = SHA256(text + "|" + voice + "|" + language + "|" + model)

Cache Hit Rate Analysis:

  • IVR with 10 unique prompts
  • Each prompt played 100 times/day
  • Cache hit rate: 99% (990/1000 requests served from cache)
  • API calls: 10 (first generation only)
  • Cost: $0.15 (vs $15 without cache)

Why Not On-Premise TTS?

| Factor | On-Premise (Coqui) | OpenAI TTS | Winner |
| --- | --- | --- | --- |
| Setup Time | 2-3 days (model training) | 5 minutes | ✅ OpenAI |
| Infrastructure | GPU pod required | None | ✅ OpenAI |
| Maintenance | Model updates, debugging | None | ✅ OpenAI |
| Quality | Good (but requires tuning) | Excellent (out-of-box) | ✅ OpenAI |
| Cost @ 1M chars | ~$50/mo (GPU pod) | $15 (API) | ✅ OpenAI |
| Cost @ 100M chars | ~$50/mo (same) | $1500 (API) | ✅ On-Premise |

Break-even Point: ~3M characters/month

Decision: Start with OpenAI TTS. Revisit if usage exceeds 3M chars/month.
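The break-even figure follows directly from the two cost rows in the table. A quick check, using the approximate $50/month GPU pod cost and the $15 per 1M characters API price quoted above (both are the document's estimates, not measured values):

```python
GPU_POD_COST_PER_MONTH = 50.0       # USD, approximate on-premise cost from the table
API_COST_PER_MILLION_CHARS = 15.0   # USD, OpenAI TTS price from the table


def monthly_api_cost(million_chars: float) -> float:
    """API spend in USD for a month with `million_chars` million uncached characters."""
    return million_chars * API_COST_PER_MILLION_CHARS


# The API is cheaper until its monthly cost reaches the fixed GPU pod cost.
break_even_million_chars = GPU_POD_COST_PER_MONTH / API_COST_PER_MILLION_CHARS

print(f"Break-even: {break_even_million_chars:.2f}M characters/month")
print(f"API cost at 1M chars:   ${monthly_api_cost(1):.0f}")
print(f"API cost at 100M chars: ${monthly_api_cost(100):.0f}")
```

Only characters that miss the cache count toward this threshold, so an IVR with a stable prompt set may stay well below it even at high call volume.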

Consequences:

  • Positive: High quality, low maintenance, fast development
  • Negative: External API dependency, recurring cost at scale
  • Mitigation: SHA256 caching reduces API calls by 99%

Troubleshooting

IVR Pod Not Starting

Symptoms:

  • Pod stuck in Pending or CrashLoopBackOff
  • IVR status shows pending or error

Diagnosis:

# Check pod status
kubectl get pods -n client-demo -l ivr-name=customer-support

# View pod events
kubectl describe pod -n client-demo -l ivr-name=customer-support

# Check pod logs
kubectl logs -n client-demo -l ivr-name=customer-support -c freeswitch-ivr

Common Issues:

  1. Missing OpenAI API Key

    Error: OpenAI API key not configured

    Fix: Update IVR with valid OpenAI API key

  2. Database Connection Failed

    Error: could not connect to server: Connection refused

    Fix: Check DB_DSN environment variable and PostgreSQL accessibility

  3. Image Pull Error

    Failed to pull image "registry.com/freeswitch-ivr:latest"

    Fix: Ensure image exists and ghcr-secret is configured


TTS Generation Failing

Symptoms:

  • Caller hears silence instead of prompts
  • Logs show "OpenAI TTS API error"

Diagnosis:

# Check ESL socket handler logs
kubectl logs -n client-demo -l ivr-name=customer-support -c freeswitch-ivr | grep -i tts

# Test OpenAI API key manually
curl -X POST http://api:8000/v1/ivrs/customer-support/tts/preview \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d '{"text": "Test prompt", "voice": "alloy"}'

Common Issues:

  1. Invalid API Key

    OpenAI TTS API error 401: Incorrect API key provided

    Fix: Update IVR with valid OpenAI API key

  2. Rate Limit Exceeded

    OpenAI TTS API error 429: Rate limit exceeded

    Fix: Check OpenAI usage limits or upgrade plan

  3. Text Too Long

    ValueError: Text too long: 5000 chars (max 4096)

    Fix: Split long prompts into multiple menus
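When a prompt exceeds the 4096-character limit quoted in the error above, it helps to see where it could be divided before restructuring the menus. The following sketch splits text at sentence boundaries so that each piece fits the limit; `split_prompt` is a hypothetical helper, not part of the IVR API, and it treats ".", "!" and "?" followed by whitespace as sentence ends.

```python
import re
from typing import List

MAX_TTS_CHARS = 4096  # limit quoted in the error message above


def split_prompt(text: str, limit: int = MAX_TTS_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if len(sentence) > limit:
            raise ValueError("A single sentence exceeds the limit; shorten it manually")
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate          # sentence still fits in this chunk
        else:
            chunks.append(current)       # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Small limit used only to make the behaviour visible.
pieces = split_prompt("One. Two. Three.", limit=10)
print(pieces)
```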


Menu Navigation Not Working

Symptoms:

  • DTMF digits not recognized
  • Call hangs up after prompt

Diagnosis:

# Check ESL socket handler logs
kubectl logs -n client-demo -l ivr-name=customer-support | grep -i "digit\|menu"

# Check FreeSWITCH logs
kubectl logs -n client-demo -l ivr-name=customer-support | grep -i "play_and_get_digits"

Common Issues:

  1. Menu Options Not Configured

    Error: Menu not found: main_menu

    Fix: Create menu and options via API

  2. Invalid Digit Mapping

    Warning: Invalid digit: 5

    Fix: Ensure menu option exists for pressed digit

  3. Timeout Too Short

    Info: No digit collected, hanging up

    Fix: Increase timeout_seconds in menu config


HTTP API Call Failing

Symptoms:

  • Caller hears "service unavailable" message
  • Logs show HTTP error

Diagnosis:

# Check ESL socket handler logs
kubectl logs -n client-demo -l ivr-name=customer-support | grep -i "http_api_call"

Common Issues:

  1. Invalid URL

    Error: API call failed: 404 - Not Found

    Fix: Verify action_target URL is correct

  2. Missing Headers

    Error: API call failed: 401 - Unauthorized

    Fix: Add Authorization header to action_config

  3. Invalid JSONPath

    Warning: Could not extract JSON path: current.temp_f

    Fix: Verify response_path matches API response structure


Performance Considerations

TTS Cache Sizing

Estimation:

  • Average prompt: 50 characters → ~5KB MP3
  • 100 unique prompts → 500KB total
  • 1000 unique prompts → 5MB total

Recommendations:

  • Small IVRs (< 20 prompts): emptyDir volume (no persistence)
  • Large IVRs (> 100 prompts): PersistentVolumeClaim (cache survives pod restarts)

Cache Cleanup:

# Clear cache older than 30 days
curl -X DELETE "http://api:8000/v1/ivrs/customer-support/tts/cache?older_than_days=30" \
-H "X-API-Key: your-key"

Pod Resource Tuning

Default Resources:

  • Memory: 512Mi request, 1Gi limit
  • CPU: 250m request, 500m limit

Increase for High-Traffic IVRs:

resources:
  requests:
    memory: 1Gi
    cpu: 500m
  limits:
    memory: 2Gi
    cpu: 1000m

Monitor Usage:

kubectl top pod -n client-demo -l ivr-name=customer-support

Security Best Practices

OpenAI API Key Storage

Do:

  • ✅ Store API keys encrypted in database
  • ✅ Use separate API keys per IVR (limits the blast radius of a leaked key)
  • ✅ Rotate API keys regularly (quarterly)
  • ✅ Monitor API usage via OpenAI dashboard

Don't:

  • ❌ Hard-code API keys in code
  • ❌ Share API keys across all IVRs
  • ❌ Expose API keys in logs

HTTP API Call Security

Do:

  • ✅ Use HTTPS endpoints only
  • ✅ Validate SSL certificates
  • ✅ Set request timeout (default: 10s)
  • ✅ Sanitize user input before API calls

Don't:

  • ❌ Call untrusted HTTP endpoints
  • ❌ Expose sensitive data in TTS prompts
  • ❌ Trust API responses without validation
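Some of these rules can be enforced when a menu option is saved rather than left to convention. The sketch below rejects non-HTTPS targets and hosts outside an allowlist; the allowlist itself is an assumption (this document does not describe one), and `validate_action_target` is a hypothetical helper.

```python
from typing import Set
from urllib.parse import urlparse


def validate_action_target(url: str, allowed_hosts: Set[str]) -> str:
    """Return the URL unchanged if it is HTTPS and points at an allowed host."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"Only HTTPS endpoints are allowed: {url}")
    if parsed.hostname not in allowed_hosts:
        raise ValueError(f"Host is not on the allowlist: {parsed.hostname}")
    return url


# Hypothetical allowlist for an IVR that only calls the weather API.
ALLOWED_HOSTS = {"api.weather.gov"}
target = validate_action_target(
    "https://api.weather.gov/gridpoints/TOP/31,80/forecast", ALLOWED_HOSTS
)
```

Certificate validation and the request timeout still need to be configured on the HTTP client that performs the call.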


Summary

The IVR system provides a powerful, flexible framework for automated call handling with:

  • 19 REST endpoints for complete IVR management
  • One-pod-per-IVR architecture for isolation and scalability
  • OpenAI TTS with SHA256-based caching (99% cost reduction)
  • ESL socket handler for native FreeSWITCH app usage
  • 7 action types (transfer, sub-menu, HTTP API, hangup, etc.)
  • Multi-level menus with recursive navigation
  • PostgreSQL backend for configuration and caching
  • Kubernetes native with health checks and resource limits

The system balances developer experience (simple API, high-quality TTS) with operational efficiency (caching, resource optimization) to deliver production-ready IVR capabilities.