IVR System
The Ominis Cluster Manager includes an IVR (Interactive Voice Response) system that enables automated call handling with menu-based navigation, text-to-speech prompts, and flexible action routing.
Architecture Overview
The IVR system follows a one-pod-per-IVR architecture, mirroring the queue management pattern. Each IVR instance runs as a dedicated Kubernetes pod containing:
- FreeSWITCH - SIP/telephony engine
- ESL Socket Handler - Python-based call control logic
- OpenAI TTS Integration - High-quality speech generation with caching
- PostgreSQL Backend - Menu configuration and TTS cache storage
Key Design Decisions
One Pod Per IVR
Each IVR instance runs in its own pod to ensure:
- Isolation: One IVR's traffic doesn't affect others
- Scalability: Independent resource allocation per IVR
- Configuration: Each IVR has unique OpenAI API keys, languages, voices
- Lifecycle: IVRs can be created/destroyed independently
ESL Socket vs mod_xml_curl
Decision: Use outbound ESL socket for IVR call control
Rationale:
- Native FreeSWITCH Apps: Leverage play_and_get_digits for menu navigation (automatic retries, timeout handling)
- Simplified Logic: No complex event state machines, just orchestration
- Better Control: Direct call control without HTTP round-trips
- Error Handling: Connection-based error detection (vs polling)
See ADR: ESL Socket vs mod_xml_curl for full analysis.
OpenAI TTS with SHA256 Caching
Decision: Use OpenAI TTS with content-addressed caching
Rationale:
- Quality: Superior voice quality vs alternatives (Festival, MaryTTS, Coqui)
- Cost Efficiency: SHA256-based deduplication reduces API calls
- Performance: Cached audio served instantly from local filesystem
- Flexibility: 6 voices, 50+ languages, multiple models
See ADR: OpenAI TTS vs Alternatives for full analysis.
Database Schema
The IVR system uses four PostgreSQL tables:
Tables
ivrs
Stores IVR instance configuration.
CREATE TABLE ivrs (
id SERIAL PRIMARY KEY,
name VARCHAR(64) UNIQUE NOT NULL,
description TEXT,
openai_api_key TEXT NOT NULL,
default_language VARCHAR(10) DEFAULT 'en',
default_voice VARCHAR(50) DEFAULT 'alloy',
tts_model VARCHAR(50) DEFAULT 'tts-1',
status VARCHAR(20) DEFAULT 'pending',
pod_name VARCHAR(255),
service_name VARCHAR(255),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
ivr_menus
Stores menu definitions (prompts, timeouts, retries).
CREATE TABLE ivr_menus (
id SERIAL PRIMARY KEY,
ivr_name VARCHAR(64) REFERENCES ivrs(name) ON DELETE CASCADE,
menu_id VARCHAR(100) NOT NULL,
prompt_text TEXT NOT NULL,
tts_voice VARCHAR(50),
timeout_seconds INTEGER DEFAULT 5,
max_retries INTEGER DEFAULT 3,
invalid_prompt TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(ivr_name, menu_id)
);
ivr_menu_options
Stores menu option actions (digit → action mapping).
CREATE TABLE ivr_menu_options (
id SERIAL PRIMARY KEY,
menu_id INTEGER REFERENCES ivr_menus(id) ON DELETE CASCADE,
digit VARCHAR(10) NOT NULL,
action_type VARCHAR(50) NOT NULL,
action_target TEXT NOT NULL,
action_config JSONB,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(menu_id, digit)
);
ivr_tts_cache
Stores TTS cache metadata (SHA256-indexed).
CREATE TABLE ivr_tts_cache (
id SERIAL PRIMARY KEY,
text_hash VARCHAR(64) UNIQUE NOT NULL,
text TEXT NOT NULL,
language VARCHAR(10) NOT NULL,
voice VARCHAR(50) NOT NULL,
model VARCHAR(50) NOT NULL,
file_path TEXT NOT NULL,
file_size_bytes INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
access_count INTEGER DEFAULT 0
);
CREATE INDEX idx_ivr_tts_cache_hash ON ivr_tts_cache(text_hash);
API Endpoints (18 Total)
IVR Management (7 endpoints)
Create IVR
POST /v1/ivrs
Creates a new IVR instance with dedicated pod and SIP extension.
Request Body:
{
"name": "customer-support",
"description": "Customer support IVR with menu routing",
"openai_api_key": "sk-...",
"default_language": "en",
"default_voice": "alloy",
"tts_model": "tts-1"
}
Response (201 Created):
{
"ivr": {
"name": "customer-support",
"description": "Customer support IVR with menu routing",
"openai_api_key": "sk-...",
"default_language": "en",
"default_voice": "alloy",
"tts_model": "tts-1",
"status": "pending",
"pod_name": "freeswitch-ivr-customer-support-abc123",
"service_name": "freeswitch-ivr-customer-support-svc",
"created_at": "2025-10-14T12:00:00Z",
"updated_at": "2025-10-14T12:00:00Z"
}
}
Process:
- Create SIP extension (ivr-{name}) for authentication
- Insert IVR record in database
- Provision Kubernetes pod with FreeSWITCH + ESL handler
- Register with SIP registrar for inbound calls
Pod Resources:
- Memory: 512Mi request, 1Gi limit
- CPU: 250m request, 500m limit
- Volume: emptyDir for TTS cache (/var/ivr-audio)
List IVRs
GET /v1/ivrs
Response (200 OK):
{
"ivrs": [
{
"name": "customer-support",
"status": "running",
"description": "Customer support IVR",
"created_at": "2025-10-14T12:00:00Z"
}
],
"total": 1
}
Get IVR Details
GET /v1/ivrs/{ivr_name}
Response (200 OK):
{
"ivr": {
"name": "customer-support",
"description": "Customer support IVR",
"default_language": "en",
"default_voice": "alloy",
"tts_model": "tts-1",
"status": "running",
"pod_name": "freeswitch-ivr-customer-support-abc123",
"created_at": "2025-10-14T12:00:00Z"
}
}
Update IVR
PUT /v1/ivrs/{ivr_name}
Update IVR configuration (requires pod restart to apply changes).
Request Body:
{
"description": "Updated description",
"default_voice": "nova",
"openai_api_key": "sk-new-key"
}
Delete IVR
DELETE /v1/ivrs/{ivr_name}
Deletes IVR instance, including pod, service, and database records.
Response: 204 No Content
Process:
- Delete Kubernetes deployment and service
- Delete database records (cascades to menus, options, cache)
- Clean up SIP extension
Restart IVR
POST /v1/ivrs/{ivr_name}/restart
Triggers rolling restart of IVR pod (useful after config changes).
Response (200 OK):
{
"message": "IVR customer-support restart initiated"
}
Get IVR Status
GET /v1/ivrs/{ivr_name}/status
Response (200 OK):
{
"name": "customer-support",
"status": "running",
"active_calls": 3,
"total_calls": 247,
"pod_status": "Running",
"health_status": "healthy"
}
Menu Management (5 endpoints)
Create Menu
POST /v1/ivrs/{ivr_name}/menus
Request Body:
{
"menu_id": "main_menu",
"prompt_text": "Thank you for calling. Press 1 for sales, press 2 for support, or press 3 for weather.",
"timeout_seconds": 5,
"max_retries": 3,
"invalid_prompt": "Invalid selection. Please try again."
}
Response (201 Created):
{
"menu": {
"id": 1,
"ivr_name": "customer-support",
"menu_id": "main_menu",
"prompt_text": "Thank you for calling...",
"timeout_seconds": 5,
"max_retries": 3,
"created_at": "2025-10-14T12:00:00Z"
}
}
List Menus
GET /v1/ivrs/{ivr_name}/menus
Response (200 OK):
{
"menus": [
{
"id": 1,
"menu_id": "main_menu",
"prompt_text": "Thank you for calling...",
"timeout_seconds": 5
}
],
"total": 1
}
Get Menu Details
GET /v1/ivrs/{ivr_name}/menus/{menu_id}
Response (200 OK):
{
"menu": {
"id": 1,
"menu_id": "main_menu",
"prompt_text": "Thank you for calling...",
"timeout_seconds": 5
},
"options": [
{
"id": 1,
"digit": "1",
"action_type": "sub_menu",
"action_target": "sales_menu",
"description": "Navigate to sales menu"
}
]
}
Update Menu
PUT /v1/ivrs/{ivr_name}/menus/{menu_id}
Request Body:
{
"prompt_text": "Updated prompt text",
"timeout_seconds": 10
}
Delete Menu
DELETE /v1/ivrs/{ivr_name}/menus/{menu_id}
Deletes menu and all associated options.
Response: 204 No Content
Menu Options (3 endpoints)
Add Menu Option
POST /v1/ivrs/{ivr_name}/menus/{menu_id}/options
Request Body (Transfer to Queue):
{
"digit": "1",
"action_type": "transfer_queue",
"action_target": "queue-sales",
"description": "Transfer to sales queue"
}
Request Body (Sub-Menu):
{
"digit": "2",
"action_type": "sub_menu",
"action_target": "support_menu",
"description": "Navigate to support menu"
}
Request Body (HTTP API Call):
{
"digit": "3",
"action_type": "http_api_call",
"action_target": "https://api.weather.com/current",
"action_config": {
"method": "GET",
"headers": {
"Authorization": "Bearer sk-weather-123"
},
"response_path": "current.temp_f",
"tts_template": "The current temperature is {value} degrees Fahrenheit"
},
"description": "Get weather information"
}
Update Menu Option
PUT /v1/ivrs/{ivr_name}/menus/{menu_id}/options/{digit}
Delete Menu Option
DELETE /v1/ivrs/{ivr_name}/menus/{menu_id}/options/{digit}
Response: 204 No Content
TTS & Visualization (3 endpoints)
Generate TTS Preview
POST /v1/ivrs/{ivr_name}/tts/preview
Generate TTS audio for testing without saving to production cache.
Request Body:
{
"text": "This is a test prompt for preview",
"voice": "nova",
"model": "tts-1-hd"
}
Response (200 OK):
{
"audio_file": "/var/ivr-audio/abc123def456.mp3",
"text": "This is a test prompt for preview"
}
View TTS Cache
GET /v1/ivrs/{ivr_name}/tts/cache
Response (200 OK):
{
"cache_entries": [
{
"id": 1,
"text_hash": "abc123def456...",
"text": "Thank you for calling",
"language": "en",
"voice": "alloy",
"model": "tts-1",
"file_path": "/var/ivr-audio/abc123def456.mp3",
"file_size_bytes": 15234,
"access_count": 47,
"last_accessed_at": "2025-10-14T12:00:00Z"
}
],
"total": 1,
"total_size_bytes": 15234
}
Clear TTS Cache
DELETE /v1/ivrs/{ivr_name}/tts/cache
Removes cached TTS audio from database and filesystem.
Response: 204 No Content
Get IVR Flow Visualization
GET /v1/ivrs/{ivr_name}/flow
Returns nested menu structure for visualization tools.
Response (200 OK):
{
"ivr_name": "customer-support",
"root_menu": {
"menu_id": "main_menu",
"prompt_text": "Press 1 for sales, 2 for support, 3 for weather",
"options": [
{"digit": "1", "action": "sub_menu", "target": "sales_menu"},
{"digit": "2", "action": "transfer_queue", "target": "queue-support"},
{"digit": "3", "action": "http_api_call", "target": "https://api.weather.com/current"}
],
"sub_menus": [
{
"menu_id": "sales_menu",
"prompt_text": "Press 1 for phone sales, 2 for computer sales",
"options": [
{"digit": "1", "action": "transfer_queue", "target": "queue-phone"},
{"digit": "2", "action": "transfer_queue", "target": "queue-computer"}
],
"sub_menus": []
}
]
}
}
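The nested structure above can be derived from flat menu and option records. The following is a minimal sketch, not the service's actual implementation: the `build_flow` function and its input shape are hypothetical, and it assumes a menu is expanded only once so that a "return to main menu" option cannot recurse forever.

```python
from typing import Any


def build_flow(menus: dict[str, dict[str, Any]], root_id: str) -> dict[str, Any]:
    """Nest menus under the root by following sub_menu options.

    `menus` maps menu_id -> {"prompt_text": str, "options": [{digit, action, target}, ...]}.
    A menu that has already been expanded is not expanded again, which
    keeps cyclic references (e.g. "return to main menu") from looping.
    """
    visited: set[str] = set()

    def expand(menu_id: str) -> dict[str, Any]:
        visited.add(menu_id)
        menu = menus[menu_id]
        node: dict[str, Any] = {
            "menu_id": menu_id,
            "prompt_text": menu["prompt_text"],
            "options": list(menu["options"]),
            "sub_menus": [],
        }
        for option in menu["options"]:
            target = option["target"]
            if option["action"] == "sub_menu" and target in menus and target not in visited:
                node["sub_menus"].append(expand(target))
        return node

    return expand(root_id)
```

Calling `build_flow(menus, "main_menu")` returns a dictionary shaped like the `root_menu` object in the response above.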
IVR Call Flow
TTS Caching Flow
Cache Key Generation
import hashlib
def generate_cache_key(text: str, voice: str, language: str, model: str) -> str:
    """Generate SHA256 cache key from TTS parameters"""
    combined = f"{text}|{voice}|{language}|{model}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()
# Example
key = generate_cache_key(
    text="Thank you for calling",
    voice="alloy",
    language="en",
    model="tts-1"
)
# Result: "abc123def456..." (64 hex characters)
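To show how such a key might drive a lookup-then-generate flow, here is a minimal sketch. It assumes audio files are named after the hash (as the `/var/ivr-audio/<hash>.mp3` paths elsewhere in this document suggest) and takes the speech synthesizer as a callback; `get_or_create_audio` and the `synthesize` parameter are hypothetical names, and the database bookkeeping (`access_count`, `last_accessed_at`) is left out.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice: str, language: str, model: str) -> str:
    """Same key scheme as generate_cache_key above."""
    combined = f"{text}|{voice}|{language}|{model}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


def get_or_create_audio(
    cache_dir: Path,
    text: str,
    voice: str,
    language: str,
    model: str,
    synthesize: Callable[[str, str, str], bytes],
) -> tuple[Path, bool]:
    """Return (audio_path, cache_hit); call `synthesize` only on a miss."""
    path = cache_dir / f"{cache_key(text, voice, language, model)}.mp3"
    if path.exists():
        return path, True  # cache hit: no API call needed

    audio = synthesize(text, voice, model)  # e.g. a wrapper around the TTS API
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(audio)
    tmp.replace(path)  # rename into place so readers never see a partial file
    return path, False
```

Passing the synthesizer as a callback keeps the sketch testable without network access; in a real handler it would wrap the TTS client configured for the IVR.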
Cache Benefits
| Metric | Without Cache | With Cache |
|---|---|---|
| API Calls | Every prompt play | First time only |
| Latency | 500-2000ms | <5ms (filesystem) |
| Cost | $0.015 per 1K chars | $0.015 first time, $0 after |
| Bandwidth | N/A | ~15KB per cached prompt |
Real-world Savings (assuming roughly 1,000 characters per prompt at $0.015 per 1K characters):
- IVR with 10 prompts, each played 1000 times/day
- Without cache: 10,000 API calls/day = $150/day
- With cache: 10 API calls once = $0.15 total
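The arithmetic behind these figures can be reproduced with a few lines. This sketch uses the $0.015 per 1K characters price from the table and assumes roughly 1,000 characters per prompt, which is the prompt length the $150/day figure implies; shorter prompts scale both costs down proportionally.

```python
PRICE_PER_1K_CHARS = 0.015  # USD, from the table above


def tts_cost(api_calls: int, chars_per_prompt: int) -> float:
    """Cost in USD of `api_calls` TTS requests of `chars_per_prompt` characters each."""
    return api_calls * chars_per_prompt / 1000 * PRICE_PER_1K_CHARS


unique_prompts = 10
plays_per_prompt_per_day = 1000
chars_per_prompt = 1000  # assumption, not stated in the original figures

# Without a cache, every play is an API call.
daily_without_cache = tts_cost(unique_prompts * plays_per_prompt_per_day, chars_per_prompt)
# With a cache, each unique prompt is generated once.
one_time_with_cache = tts_cost(unique_prompts, chars_per_prompt)

print(f"without cache: ${daily_without_cache:.2f}/day")
print(f"with cache:    ${one_time_with_cache:.2f} total")
```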
ESL Socket Communication
Action Types
The IVR system supports 7 action types for menu options:
1. transfer_queue
Transfers call to a callcenter queue.
Example:
{
"digit": "1",
"action_type": "transfer_queue",
"action_target": "queue-sales",
"description": "Transfer to sales queue"
}
FreeSWITCH Command:
transfer queue-sales XML default
2. transfer_extension
Transfers call to a SIP extension.
Example:
{
"digit": "2",
"action_type": "transfer_extension",
"action_target": "1001",
"description": "Transfer to extension 1001"
}
FreeSWITCH Command:
transfer 1001 XML default
3. sub_menu
Navigates to a sub-menu (recursive).
Example:
{
"digit": "3",
"action_type": "sub_menu",
"action_target": "sales_menu",
"description": "Navigate to sales menu"
}
Process:
- Load sales_menu from database
- Generate TTS for sub-menu prompt
- Execute play_and_get_digits for sub-menu
- Process sub-menu option selection
4. http_api_call
Makes HTTP request to external API and speaks response.
Example:
{
"digit": "4",
"action_type": "http_api_call",
"action_target": "https://api.weather.com/v1/current?city=Toronto",
"action_config": {
"method": "GET",
"headers": {
"Authorization": "Bearer sk-weather-api-key"
},
"response_path": "current.temp_f",
"tts_template": "The current temperature in Toronto is {value} degrees Fahrenheit"
},
"description": "Get weather for Toronto"
}
Process:
- Make HTTP request (GET/POST) with headers
- Parse JSON response
- Extract value using JSONPath (current.temp_f)
- Format message using template ({value} placeholder)
- Generate TTS for formatted message
- Play TTS audio to caller
JSONPath Examples:
- current.temp_f → response["current"]["temp_f"]
- results[0].name → response["results"][0]["name"]
- message → response["message"]
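The path syntax shown here is a simplified dotted form with optional list indexes. A minimal sketch of how such paths could be resolved and fed into `tts_template` follows; `extract_path` and `render_message` are hypothetical helper names, and the sketch supports only the forms listed above (keys and `[n]` indexes), not full JSONPath.

```python
import re
from typing import Any

# A token is either a key (no dots or brackets) or a numeric index in brackets.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def extract_path(data: Any, path: str) -> Any:
    """Resolve a dotted path such as 'results[0].name' against parsed JSON."""
    current = data
    for key, index in _TOKEN.findall(path):
        current = current[int(index)] if index else current[key]
    return current


def render_message(response: dict, response_path: str, tts_template: str) -> str:
    """Fill the {value} placeholder with the extracted value."""
    return tts_template.format(value=extract_path(response, response_path))
```

A missing key or index raises `KeyError` or `IndexError`, which a handler could catch to log the "Could not extract JSON path" warning described under Troubleshooting.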
5. hangup
Hangs up the call.
Example:
{
"digit": "9",
"action_type": "hangup",
"action_target": "NORMAL_CLEARING",
"description": "Hang up call"
}
FreeSWITCH Command:
hangup NORMAL_CLEARING
6. play_audio
Plays a pre-recorded audio file.
Example:
{
"digit": "5",
"action_type": "play_audio",
"action_target": "/var/audio/custom-greeting.wav",
"description": "Play custom greeting"
}
FreeSWITCH Command:
playback /var/audio/custom-greeting.wav
7. voicemail
Sends call to voicemail.
Example:
{
"digit": "0",
"action_type": "voicemail",
"action_target": "1001",
"description": "Send to voicemail for extension 1001"
}
FreeSWITCH Command:
voicemail default ${domain} 1001
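The action types that map directly to a single FreeSWITCH command can be expressed as a small dispatch function. This is an illustrative sketch only: `build_command` is a hypothetical helper, and `sub_menu` and `http_api_call` are excluded because, as described above, they involve multi-step handling rather than one command.

```python
def build_command(action_type: str, action_target: str) -> tuple[str, str]:
    """Map a menu option to a (FreeSWITCH application, arguments) pair."""
    if action_type in ("transfer_queue", "transfer_extension"):
        return "transfer", f"{action_target} XML default"
    if action_type == "hangup":
        return "hangup", action_target
    if action_type == "play_audio":
        return "playback", action_target
    if action_type == "voicemail":
        # ${domain} is left for FreeSWITCH to expand as a channel variable.
        return "voicemail", f"default ${{domain}} {action_target}"
    raise ValueError(f"{action_type!r} is not a single-command action")
```

For example, `build_command("transfer_queue", "queue-sales")` yields `("transfer", "queue-sales XML default")`, matching the command shown for action type 1.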
Complete Example: Multi-Level IVR
Let's build a complete IVR with multiple menus, HTTP API calls, and queue transfers.
Step 1: Create IVR
curl -X POST http://api:8000/v1/ivrs \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"name": "support-hotline",
"description": "Technical support hotline with weather and queue routing",
"openai_api_key": "sk-your-openai-key",
"default_language": "en",
"default_voice": "nova",
"tts_model": "tts-1-hd"
}'
Step 2: Create Main Menu
curl -X POST http://api:8000/v1/ivrs/support-hotline/menus \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"menu_id": "main_menu",
"prompt_text": "Welcome to technical support. Press 1 for sales, press 2 for technical support, press 3 for weather information, or press 0 for operator.",
"timeout_seconds": 8,
"max_retries": 3,
"invalid_prompt": "Invalid selection. Please press a number from 0 to 3."
}'
Step 3: Add Main Menu Options
Option 1: Sales Sub-Menu
curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "1",
"action_type": "sub_menu",
"action_target": "sales_menu",
"description": "Navigate to sales menu"
}'
Option 2: Technical Support Queue
curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "2",
"action_type": "transfer_queue",
"action_target": "queue-tech-support",
"description": "Transfer to technical support queue"
}'
Option 3: Weather API Call
curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "3",
"action_type": "http_api_call",
"action_target": "https://api.weather.gov/gridpoints/TOP/31,80/forecast",
"action_config": {
"method": "GET",
"headers": {
"User-Agent": "Ominis-IVR/1.0"
},
"response_path": "properties.periods[0].detailedForecast",
"tts_template": "The weather forecast is: {value}"
},
"description": "Get weather forecast"
}'
Option 0: Operator Extension
curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/main_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "0",
"action_type": "transfer_extension",
"action_target": "1000",
"description": "Transfer to operator"
}'
Step 4: Create Sales Sub-Menu
curl -X POST http://api:8000/v1/ivrs/support-hotline/menus \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"menu_id": "sales_menu",
"prompt_text": "Sales department. Press 1 for phone sales, press 2 for computer sales, or press 9 to return to main menu.",
"timeout_seconds": 5,
"max_retries": 3
}'
Step 5: Add Sales Sub-Menu Options
Option 1: Phone Sales Queue
curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/sales_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "1",
"action_type": "transfer_queue",
"action_target": "queue-phone-sales",
"description": "Transfer to phone sales queue"
}'
Option 2: Computer Sales Queue
curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/sales_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "2",
"action_type": "transfer_queue",
"action_target": "queue-computer-sales",
"description": "Transfer to computer sales queue"
}'
Option 9: Return to Main Menu
curl -X POST http://api:8000/v1/ivrs/support-hotline/menus/sales_menu/options \
-H "X-API-Key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"digit": "9",
"action_type": "sub_menu",
"action_target": "main_menu",
"description": "Return to main menu"
}'
Step 6: Visualize IVR Flow
curl http://api:8000/v1/ivrs/support-hotline/flow \
-H "X-API-Key: your-api-key"
Response:
{
"ivr_name": "support-hotline",
"root_menu": {
"menu_id": "main_menu",
"prompt_text": "Welcome to technical support...",
"options": [
{"digit": "1", "action": "sub_menu", "target": "sales_menu"},
{"digit": "2", "action": "transfer_queue", "target": "queue-tech-support"},
{"digit": "3", "action": "http_api_call", "target": "https://api.weather.gov/..."},
{"digit": "0", "action": "transfer_extension", "target": "1000"}
],
"sub_menus": [
{
"menu_id": "sales_menu",
"prompt_text": "Sales department...",
"options": [
{"digit": "1", "action": "transfer_queue", "target": "queue-phone-sales"},
{"digit": "2", "action": "transfer_queue", "target": "queue-computer-sales"},
{"digit": "9", "action": "sub_menu", "target": "main_menu"}
],
"sub_menus": []
}
]
}
}
Step 7: Monitor TTS Cache
curl http://api:8000/v1/ivrs/support-hotline/tts/cache \
-H "X-API-Key: your-api-key"
Response:
{
"cache_entries": [
{
"text_hash": "abc123...",
"text": "Welcome to technical support. Press 1 for sales...",
"voice": "nova",
"language": "en",
"model": "tts-1-hd",
"file_path": "/var/ivr-audio/abc123.mp3",
"file_size_bytes": 24560,
"access_count": 127,
"last_accessed_at": "2025-10-14T15:23:10Z"
},
{
"text_hash": "def456...",
"text": "Sales department. Press 1 for phone sales...",
"voice": "nova",
"language": "en",
"model": "tts-1-hd",
"file_path": "/var/ivr-audio/def456.mp3",
"file_size_bytes": 18920,
"access_count": 43,
"last_accessed_at": "2025-10-14T15:18:45Z"
}
],
"total": 2,
"total_size_bytes": 43480
}
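The same steps can be scripted instead of typed as individual curl commands. The sketch below builds the POST requests for the sales sub-menu options with the standard library; the base URL and `X-API-Key` header come from the curl examples, while `build_request` is a hypothetical helper and the requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://api:8000"  # host used in the curl examples above


def build_request(path: str, payload: dict, api_key: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the API key header."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


SALES_OPTIONS = [
    {"digit": "1", "action_type": "transfer_queue", "action_target": "queue-phone-sales"},
    {"digit": "2", "action_type": "transfer_queue", "action_target": "queue-computer-sales"},
    {"digit": "9", "action_type": "sub_menu", "action_target": "main_menu"},
]

requests_to_send = [
    build_request("/v1/ivrs/support-hotline/menus/sales_menu/options", option, "your-api-key")
    for option in SALES_OPTIONS
]
# To actually send them:
# for request in requests_to_send:
#     urllib.request.urlopen(request, timeout=10)
```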
ADRs (Architectural Decision Records)
ADR: ESL Socket vs mod_xml_curl
Status: Accepted
Context: We need a mechanism for FreeSWITCH to execute IVR logic (menu navigation, DTMF handling, action routing). Two primary approaches exist:
- mod_xml_curl: FreeSWITCH fetches dialplan XML via HTTP on each call
- ESL Outbound Socket: FreeSWITCH connects to external socket server for call control
Decision: Use ESL Outbound Socket
Rationale:
| Factor | mod_xml_curl | ESL Outbound Socket | Winner |
|---|---|---|---|
| Control | Limited to dialplan apps | Full call control via commands | ✅ ESL |
| State Management | Stateless (HTTP request per step) | Stateful (persistent connection) | ✅ ESL |
| Error Handling | HTTP timeout/retry logic | Connection-based (immediate detection) | ✅ ESL |
| Complexity | Generate XML dialplan strings | Simple command messages | ✅ ESL |
| Performance | HTTP overhead per action | Single TCP connection | ✅ ESL |
| Native Apps | Can use all FreeSWITCH apps | Can use all FreeSWITCH apps | Tie |
| Debugging | HTTP logs + XML parsing | Clear command/response logs | ✅ ESL |
Key Advantage: Native FreeSWITCH Apps
ESL allows us to use play_and_get_digits, which handles:
- DTMF collection
- Automatic retries on invalid input
- Timeout handling
- Invalid prompt playback
Without play_and_get_digits (mod_xml_curl approach):
<!-- Complex state machine in dialplan XML -->
<extension name="menu_retry_1">
<condition field="${ivr_digit}" expression="^$">
<action application="playback" data="invalid.mp3"/>
<action application="set" data="retry_count=1"/>
<action application="transfer" data="menu_prompt"/>
</condition>
</extension>
<!-- Repeat for retry 2, 3... -->
With play_and_get_digits (ESL approach):
# One line - FreeSWITCH handles all retries!
await esl.execute_app(
"play_and_get_digits",
"1 1 3 5000 # prompt.mp3 invalid.mp3 ivr_digit \\d+"
)
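The argument string passed to play_and_get_digits follows the order min digits, max digits, tries, timeout in milliseconds, terminators, prompt file, invalid-input file, variable name, and digit regex. A small helper could build it from a menu's settings; this is a sketch, with `build_pagd_args` as a hypothetical name and the mapping of `max_retries` and `timeout_seconds` onto the tries and timeout slots taken as an assumption.

```python
def build_pagd_args(
    prompt_file: str,
    invalid_file: str,
    *,
    max_retries: int = 3,
    timeout_seconds: int = 5,
    min_digits: int = 1,
    max_digits: int = 1,
    terminators: str = "#",
    var_name: str = "ivr_digit",
    digit_regex: str = r"\d+",
) -> str:
    """Build the space-separated argument string for play_and_get_digits."""
    timeout_ms = timeout_seconds * 1000  # FreeSWITCH expects milliseconds
    parts = (
        min_digits, max_digits, max_retries, timeout_ms,
        terminators, prompt_file, invalid_file, var_name, digit_regex,
    )
    return " ".join(str(part) for part in parts)
```

With the defaults, `build_pagd_args("prompt.mp3", "invalid.mp3")` reproduces the argument string used in the ESL example above.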
Consequences:
- Positive: Simpler code, better error handling, native app usage
- Negative: Requires ESL socket server (added component)
- Mitigation: Socket server runs in same pod as FreeSWITCH (no network overhead)
ADR: OpenAI TTS vs Alternatives
Status: Accepted
Context: IVR system requires high-quality text-to-speech for menu prompts. Options evaluated:
- OpenAI TTS (cloud API)
- Google Cloud TTS (cloud API)
- AWS Polly (cloud API)
- Festival (open-source, on-premise)
- MaryTTS (open-source, on-premise)
- Coqui TTS (open-source, on-premise)
Decision: Use OpenAI TTS with SHA256 caching
Comparison:
| Solution | Quality | Cost (1M chars) | Latency | Voices | Languages | Deployment |
|---|---|---|---|---|---|---|
| OpenAI TTS | ⭐⭐⭐⭐⭐ | $15 | 500-2000ms | 6 | 50+ | Cloud |
| Google Cloud TTS | ⭐⭐⭐⭐⭐ | $16 | 400-1500ms | 400+ | 40+ | Cloud |
| AWS Polly | ⭐⭐⭐⭐ | $16 | 500-2000ms | 60+ | 30+ | Cloud |
| Festival | ⭐⭐ | Free | 50-200ms | 3 | 15 | On-premise |
| MaryTTS | ⭐⭐⭐ | Free | 200-500ms | 10 | 6 | On-premise |
| Coqui TTS | ⭐⭐⭐⭐ | Free | 500-3000ms | Custom | 50+ | On-premise |
Why OpenAI TTS?
- Quality: Neural voices nearly indistinguishable from human speech
- Simplicity: Simple REST API, no infrastructure to manage
- Cost: With caching, cost approaches zero after initial generation
- Developer Experience: Easy to test, debug, and iterate
Caching Strategy: SHA256-Based Deduplication
cache_key = SHA256(text + "|" + voice + "|" + language + "|" + model)
Cache Hit Rate Analysis:
- IVR with 10 unique prompts
- Each prompt played 100 times/day
- Cache hit rate: 99% (990/1000 requests served from cache)
- API calls: 10 (first generation only)
- Cost: $0.15 once (vs $15/day without cache), assuming roughly 1,000 characters per prompt
Why Not On-Premise TTS?
| Factor | On-Premise (Coqui) | OpenAI TTS | Winner |
|---|---|---|---|
| Setup Time | 2-3 days (model training) | 5 minutes | ✅ OpenAI |
| Infrastructure | GPU pod required | None | ✅ OpenAI |
| Maintenance | Model updates, debugging | None | ✅ OpenAI |
| Quality | Good (but requires tuning) | Excellent (out-of-box) | ✅ OpenAI |
| Cost @ 1M chars | ~$50/mo (GPU pod) | $15 (API) | ✅ OpenAI |
| Cost @ 100M chars | ~$50/mo (same) | $1500 (API) | ✅ On-Premise |
Break-even Point: ~3M characters/month
Decision: Start with OpenAI TTS. Revisit if usage exceeds 3M chars/month.
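The break-even figure follows from dividing the fixed on-premise cost by the per-character API price. A quick sketch, using the table's approximate numbers ($50/month for a GPU pod, $15 per 1M characters) and a hypothetical function name:

```python
def break_even_chars_per_month(fixed_monthly_cost_usd: float,
                               api_price_per_million_usd: float) -> float:
    """Monthly character volume at which API spend equals the fixed cost."""
    return fixed_monthly_cost_usd / api_price_per_million_usd * 1_000_000


break_even = break_even_chars_per_month(50, 15)
print(f"{break_even:,.0f} characters/month")
```

This gives a little over 3.3 million characters, which the document rounds to ~3M. Caching lowers the billable volume, so the effective break-even in played prompts is considerably higher.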
Consequences:
- Positive: High quality, low maintenance, fast development
- Negative: External API dependency, recurring cost at scale
- Mitigation: SHA256 caching reduces API calls by 99%
Troubleshooting
IVR Pod Not Starting
Symptoms:
- Pod stuck in Pending or CrashLoopBackOff
- IVR status shows pending or error
Diagnosis:
# Check pod status
kubectl get pods -n client-demo -l ivr-name=customer-support
# View pod events
kubectl describe pod -n client-demo -l ivr-name=customer-support
# Check pod logs
kubectl logs -n client-demo -l ivr-name=customer-support -c freeswitch-ivr
Common Issues:
- Missing OpenAI API Key
Error: OpenAI API key not configured
Fix: Update IVR with valid OpenAI API key
- Database Connection Failed
Error: could not connect to server: Connection refused
Fix: Check DB_DSN environment variable and PostgreSQL accessibility
- Image Pull Error
Failed to pull image "registry.com/freeswitch-ivr:latest"
Fix: Ensure image exists and ghcr-secret is configured
TTS Generation Failing
Symptoms:
- Caller hears silence instead of prompts
- Logs show "OpenAI TTS API error"
Diagnosis:
# Check ESL socket handler logs
kubectl logs -n client-demo -l ivr-name=customer-support -c freeswitch-ivr | grep -i tts
# Test OpenAI API key manually
curl -X POST http://api:8000/v1/ivrs/customer-support/tts/preview \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d '{"text": "Test prompt", "voice": "alloy"}'
Common Issues:
- Invalid API Key
OpenAI TTS API error 401: Incorrect API key provided
Fix: Update IVR with valid OpenAI API key
- Rate Limit Exceeded
OpenAI TTS API error 429: Rate limit exceeded
Fix: Check OpenAI usage limits or upgrade plan
- Text Too Long
ValueError: Text too long: 5000 chars (max 4096)
Fix: Split long prompts into multiple menus
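One way to stay under the limit is to split long text at sentence boundaries before it reaches the TTS call. The following is a rough sketch: `split_prompt` is a hypothetical helper, the 4096-character limit is taken from the error message above, and the resulting chunks could be assigned to separate menus or played back to back.

```python
import re

MAX_TTS_CHARS = 4096  # limit quoted in the error message above


def split_prompt(text: str, max_chars: int = MAX_TTS_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself too long.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```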
Menu Navigation Not Working
Symptoms:
- DTMF digits not recognized
- Call hangs up after prompt
Diagnosis:
# Check ESL socket handler logs
kubectl logs -n client-demo -l ivr-name=customer-support | grep -i "digit\|menu"
# Check FreeSWITCH logs
kubectl logs -n client-demo -l ivr-name=customer-support | grep -i "play_and_get_digits"
Common Issues:
- Menu Options Not Configured
Error: Menu not found: main_menu
Fix: Create menu and options via API
- Invalid Digit Mapping
Warning: Invalid digit: 5
Fix: Ensure menu option exists for pressed digit
- Timeout Too Short
Info: No digit collected, hanging up
Fix: Increase timeout_seconds in menu config
HTTP API Call Failing
Symptoms:
- Caller hears "service unavailable" message
- Logs show HTTP error
Diagnosis:
# Check ESL socket handler logs
kubectl logs -n client-demo -l ivr-name=customer-support | grep -i "http_api_call"
Common Issues:
- Invalid URL
Error: API call failed: 404 - Not Found
Fix: Verify action_target URL is correct
- Missing Headers
Error: API call failed: 401 - Unauthorized
Fix: Add Authorization header to action_config
- Invalid JSONPath
Warning: Could not extract JSON path: current.temp_f
Fix: Verify response_path matches API response structure
Performance Considerations
TTS Cache Sizing
Estimation:
- Average prompt: 50 characters → ~5KB MP3
- 100 unique prompts → 500KB total
- 1000 unique prompts → 5MB total
Recommendations:
- Small IVRs (< 20 prompts): emptyDir volume (no persistence)
- Large IVRs (> 100 prompts): PersistentVolumeClaim (shared across restarts)
Cache Cleanup:
# Clear cache older than 30 days
curl -X DELETE "http://api:8000/v1/ivrs/customer-support/tts/cache?older_than_days=30" \
-H "X-API-Key: your-key"
Pod Resource Tuning
Default Resources:
- Memory: 512Mi request, 1Gi limit
- CPU: 250m request, 500m limit
Increase for High-Traffic IVRs:
resources:
  requests:
    memory: 1Gi
    cpu: 500m
  limits:
    memory: 2Gi
    cpu: 1000m
Monitor Usage:
kubectl top pod -n client-demo -l ivr-name=customer-support
Security Best Practices
OpenAI API Key Storage
Do:
- ✅ Store API keys encrypted in database
- ✅ Use separate API keys per IVR (blast radius)
- ✅ Rotate API keys regularly (quarterly)
- ✅ Monitor API usage via OpenAI dashboard
Don't:
- ❌ Hard-code API keys in code
- ❌ Share API keys across all IVRs
- ❌ Expose API keys in logs
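To keep keys out of logs, a handler can mask them before formatting any log message. A minimal sketch, with `mask_api_key` as a hypothetical helper that keeps only the last few characters for identification:

```python
def mask_api_key(key: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Logging `mask_api_key(ivr_key)` instead of the raw value still lets operators tell keys apart during rotation without revealing them.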
HTTP API Call Security
Do:
- ✅ Use HTTPS endpoints only
- ✅ Validate SSL certificates
- ✅ Set request timeout (default: 10s)
- ✅ Sanitize user input before API calls
Don't:
- ❌ Call untrusted HTTP endpoints
- ❌ Expose sensitive data in TTS prompts
- ❌ Trust API responses without validation
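Several of these rules can be enforced before an `http_api_call` action is ever executed. The sketch below checks that a target uses HTTPS and belongs to an allow-list of hosts; the allow-list itself is an assumption (the document does not describe one), and `is_allowed_target` is a hypothetical name.

```python
from urllib.parse import urlparse

DEFAULT_TIMEOUT_SECONDS = 10  # default request timeout listed above


def is_allowed_target(url: str, allowed_hosts: set[str]) -> bool:
    """Accept only HTTPS URLs whose host is explicitly trusted."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts
```

A check like this could run when a menu option is created, so that an untrusted `action_target` is rejected at configuration time rather than during a live call.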
Related Documentation
- Queue Management - One-pod-per-queue pattern (IVR mirrors this)
- Extension Management - SIP extension creation (IVRs get extensions)
- Telephony Call Control - XML-RPC commands for call control
- Database Schema - Full schema including IVR tables
Summary
The IVR system provides a powerful, flexible framework for automated call handling with:
- ✅ 18 REST endpoints for complete IVR management
- ✅ One-pod-per-IVR architecture for isolation and scalability
- ✅ OpenAI TTS with SHA256-based caching (99% cost reduction)
- ✅ ESL socket handler for native FreeSWITCH app usage
- ✅ 7 action types (transfer, sub-menu, HTTP API, hangup, etc.)
- ✅ Multi-level menus with recursive navigation
- ✅ PostgreSQL backend for configuration and caching
- ✅ Kubernetes native with health checks and resource limits
The system balances developer experience (simple API, high-quality TTS) with operational efficiency (caching, resource optimization) to deliver production-ready IVR capabilities.