A phone-based Voice AI Assistant built with Twilio/Asterisk + Node.js behind Nginx, running alongside your PHP apps (Laravel/Symfony/WordPress/Drupal), spins up fast. For SMB and mid-market teams, though, the default security profile is risky: prompts/flows, call recordings and transcripts, and orchestration chains often live with third parties, which increases insider/leak exposure and creates vendor lock-in. Below is a safer direction, and why PostgreSQL is outpacing MySQL in this context.
🧩 Architecture in 30 seconds
AWS (EC2/ALB) → Nginx (reverse-proxy) → Node.js (telephony + AI routing) + PHP backends (Laravel/Symfony/WordPress/Drupal). LLMs commonly via OpenAI / Azure OpenAI / Claude / AWS Bedrock. Primary DB: PostgreSQL (ideally PostGIS + pgvector); MySQL is fine for classic CRUD, but lags for geo/time windows and retrieval workloads.
⚠️ Where the security risks show up
• Prompts and orchestration chains (your know-how) and verbose logging often end up outside your perimeter → higher risk of insider misuse and scenario cloning.
• Calls and PII flow through external STT/TTS/LLM providers; retention and deletion controls aren’t truly yours.
• Vendor lock-in: changing LLM/telephony later can break chains, adapters, and tooling.
🧭 AI is rewriting the playbook
The last 30 years of the web favored content, blogging, and e-commerce. With AI, geolocation + schedules for people/companies/resources take the front seat, and real-time orchestration becomes the default. That shift rewards PostgreSQL: PostGIS for coordinates/routes, pgvector for RAG/semantic retrieval. Even in PHP backends, this improves call/dispatch routing, time-window logic, and decision speed.
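To make the pgvector point concrete, here is a minimal retrieval sketch with node-postgres. The table name `call_transcripts`, its `embedding` column, and the helper names are illustrative assumptions, not something defined elsewhere in this article; only the pgvector literal format and the `<=>` cosine-distance operator come from the extension itself.

```javascript
// Sketch: semantic retrieval over call transcripts with pgvector.
// Assumes a hypothetical table call_transcripts(id, text, embedding vector(...))
// queried through the node-postgres (`pg`) client.

// pgvector accepts vectors as a string literal like '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding) {
  return `[${embedding.join(',')}]`;
}

// Nearest-neighbour query by cosine distance (`<=>` operator from pgvector).
function buildSimilarityQuery() {
  return {
    text: `SELECT id, text, embedding <=> $1 AS distance
           FROM call_transcripts
           ORDER BY distance
           LIMIT $2`,
  };
}

// Usage (assuming `pool` is a configured pg.Pool and `queryEmbedding` an array):
// const { rows } = await pool.query(buildSimilarityQuery().text,
//   [toVectorLiteral(queryEmbedding), 5]);
```

Keeping the index in PostgreSQL means the vectors never have to leave your perimeter: only the raw query text (or nothing at all, with a self-hosted embedder) goes to an external model.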
🏢 Why SMB and mid-market lean on on-prem
Many teams still trust local racks/Windows-style setups because the data path is short and auditable. When prompts/data stream to outside LLMs (OpenAI, Gemini, Grok, etc.), orgs inevitably consider a self-hosted / open-source route under Apache-2.0 and similar licenses—keeping prompts, vector indexes, and logs inside the perimeter. Configs and secrets live on the company’s own servers with clear RBAC and audit.
🛡️ A safer path (sketch)
• Self-hosted LLM behind the firewall; RAG on PostgreSQL + pgvector.
• On-prem telephony: Asterisk/FreeSWITCH; record → transcribe → store locally (PostgreSQL).
• Nginx as a shield: TLS/mTLS, rate-limits/WAF, redact PII before anything leaves.
• Strict RBAC, minimal LLM logging, auditable access.
• Abstracted adapters for LLM/telephony from day one to avoid hard lock-in.
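The adapter point in the last bullet can be sketched in a few lines. The interface below is a hypothetical shape, not a published API; the idea is simply that only adapter classes know vendor details, so swapping OpenAI for a self-hosted model touches one file.

```javascript
// Sketch: a minimal provider adapter so the LLM (or STT/TTS) vendor can be
// swapped without touching call-handling code. All names here are illustrative.

class LlmAdapter {
  async complete(prompt) { throw new Error('not implemented'); }
}

// One adapter per provider; only these classes contain vendor-specific code.
class SelfHostedAdapter extends LlmAdapter {
  constructor(endpoint, fetchImpl = fetch) {
    super();
    this.endpoint = endpoint;   // e.g. an in-perimeter inference server
    this.fetch = fetchImpl;
  }
  async complete(prompt) {
    const res = await this.fetch(this.endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
    });
    return (await res.json()).text;
  }
}

// The orchestrator depends only on the interface, never on a vendor SDK:
async function answerCaller(adapter, transcript) {
  return adapter.complete(transcript);
}
```

The same shape works for STT and TTS: one base class per capability, one subclass per engine, and the telephony loop stays provider-agnostic.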
🚧 If you must stay cloud-first for now
• Keep prompts/chains and vector indexes inside your perimeter; send only minimal context outward.
• Redact PII in Node.js middleware and at Nginx.
• Plan your exit: interchangeable adapters for LLM/STT/TTS so a provider swap doesn’t break production.
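A redaction layer like the one mentioned above can start very small. The patterns below (emails, loosely phone-shaped digit runs) and the Fastify hook wiring are illustrative assumptions; production redaction needs locale-aware and format-aware rules.

```javascript
// Sketch: redact obvious PII in Node.js before any text leaves the perimeter.
// Patterns are deliberately simple and illustrative, not exhaustive.

const PII_PATTERNS = [
  { re: /[\w.+-]+@[\w-]+\.[\w.]+/g, tag: '[EMAIL]' },
  { re: /\+?\d[\d\s().-]{8,}\d/g, tag: '[PHONE]' },
];

function redactPii(text) {
  return PII_PATTERNS.reduce((out, { re, tag }) => out.replace(re, tag), text);
}

// Hypothetical Fastify preHandler hook: scrub request bodies before any
// handler can forward their text to an external LLM/STT/TTS provider.
function piiRedactionHook(request, reply, done) {
  if (request.body && typeof request.body.text === 'string') {
    request.body.text = redactPii(request.body.text);
  }
  done();
}
```

The same `redactPii` function can be reused on transcripts before they are logged, which covers the "minimal LLM logging" bullet from the previous section as well.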
🔎 What a phone Voice AI Assistant actually is (with real-time)
Think of it as an LLM core (ChatGPT variants like GPT-4o/4.1), an orchestrator (LangChain, LangGraph), speech-to-text (STT) engines (Whisper, Vosk), and text-to-speech (TTS) engines (Coqui TTS, Piper) working together in real-time to handle the call and drive actions across your systems.
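The components above wire together as a simple async pipeline per conversational turn. The stubs below stand in for real engines (Whisper, LangChain, Piper, etc.); the wiring, not the engines, is the point, and every name is illustrative.

```javascript
// Sketch: one conversational turn as STT → orchestrator/LLM → TTS.
// `stt`, `llm`, `tts`, and `actions` are injected stand-ins for real engines.

async function handleTurn(audioChunk, { stt, llm, tts, actions }) {
  const transcript = await stt(audioChunk);          // speech → text
  const decision = await llm(transcript);            // text → reply + optional action
  if (decision.action && actions[decision.action]) {
    await actions[decision.action](decision.args);   // drive backend systems (CRM, booking, ...)
  }
  return tts(decision.reply);                        // text → speech back to the caller
}
```

In a real deployment this loop runs continuously over streamed audio frames rather than discrete chunks, which is exactly what the Realtime-API skeleton in the appendix handles.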
📄 Appendix — drop in your Node.js skeleton (no secrets here)
import Fastify from 'fastify';
import WebSocket from 'ws';
import dotenv from 'dotenv';
import fastifyFormBody from '@fastify/formbody';
import fastifyWs from '@fastify/websocket';
// Load environment variables from .env file
dotenv.config({ path: '/var/www/twilio.solarneutrino.com/.env' });
// Retrieve the OpenAI API key from environment variables.
const { OPENAI_API_KEY } = process.env;
if (!OPENAI_API_KEY) {
console.error('Missing OpenAI API key. Please set it in the .env file.');
process.exit(1);
}
// Initialize Fastify
const fastify = Fastify();
fastify.register(fastifyFormBody);
fastify.register(fastifyWs);
// Constants
const SYSTEM_MESSAGE = 'Your name is Sophia. You are female. Your company, Solar Neutrino, provides marketing, SEO, and web development services. Answer any question in any language the caller uses. Try to book an appointment for your services. Use professional sales scripts to make offers.';
const VOICE = 'sage'; //https://platform.openai.com/docs/guides/text-to-speech
const PORT = process.env.PORT || 5050; // Allow dynamic port assignment
// List of Event Types to log to the console. See the OpenAI Realtime API Documentation: https://platform.openai.com/docs/api-reference/realtime
const LOG_EVENT_TYPES = [
'error',
'response.content.done',
'rate_limits.updated',
'response.done',
'input_audio_buffer.committed',
'input_audio_buffer.speech_stopped',
'input_audio_buffer.speech_started',
'session.created'
];
// Show AI response elapsed timing calculations
const SHOW_TIMING_MATH = false;
// Root Route
fastify.get('/', async (request, reply) => {
reply.send({ message: 'Twilio Media Stream Server is running!' });
});
// Route for Twilio to handle incoming calls
fastify.all('/incoming-call', async (request, reply) => {
// TwiML telling Twilio to open a bidirectional Media Stream back to this server.
const twimlResponse = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Connect>
<Stream url="wss://${request.headers.host}/media-stream" />
</Connect>
</Response>`;
reply.type('text/xml').send(twimlResponse);
});
// WebSocket route for media-stream
fastify.register(async (fastify) => {
fastify.get('/media-stream', { websocket: true }, (connection, req) => {
console.log('Client connected');
// Connection-specific state
let streamSid = null;
let latestMediaTimestamp = 0;
let lastAssistantItem = null;
let markQueue = [];
let responseStartTimestampTwilio = null;
const openAiWs = new WebSocket('wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01', {
headers: {
Authorization: `Bearer ${OPENAI_API_KEY}`,
"OpenAI-Beta": "realtime=v1"
}
});
// Control initial session with OpenAI
const initializeSession = () => {
const sessionUpdate = {
type: 'session.update',
session: {
turn_detection: { type: 'server_vad' },
input_audio_format: 'g711_ulaw',
output_audio_format: 'g711_ulaw',
voice: VOICE,
instructions: SYSTEM_MESSAGE,
modalities: ["text", "audio"],
temperature: 0.8,
}
};
console.log('Sending session update:', JSON.stringify(sessionUpdate));
openAiWs.send(JSON.stringify(sessionUpdate));
// Uncomment the following line to have AI speak first:
// sendInitialConversationItem();
};
// Send initial conversation item if AI talks first
const sendInitialConversationItem = () => {
const initialConversationItem = {
type: 'conversation.item.create',
item: {
type: 'message',
role: 'user',
content: [
{
type: 'input_text',
text: 'Greet the user with "Hello there! I am an AI voice assistant powered by Twilio and the OpenAI Realtime API. You can ask me for facts, jokes, or anything you can imagine. How can I help you?"'
}
] }
};
if (SHOW_TIMING_MATH) console.log('Sending initial conversation item:', JSON.stringify(initialConversationItem));
openAiWs.send(JSON.stringify(initialConversationItem));
openAiWs.send(JSON.stringify({ type: 'response.create' }));
};
// Handle interruption when the caller's speech starts
const handleSpeechStartedEvent = () => {
if (markQueue.length > 0 && responseStartTimestampTwilio != null) {
const elapsedTime = latestMediaTimestamp - responseStartTimestampTwilio;
if (SHOW_TIMING_MATH) console.log(`Calculating elapsed time for truncation: ${latestMediaTimestamp} - ${responseStartTimestampTwilio} = ${elapsedTime}ms`);
if (lastAssistantItem) {
const truncateEvent = {
type: 'conversation.item.truncate',
item_id: lastAssistantItem,
content_index: 0,
audio_end_ms: elapsedTime
};
if (SHOW_TIMING_MATH) console.log('Sending truncation event:', JSON.stringify(truncateEvent));
openAiWs.send(JSON.stringify(truncateEvent));
}
connection.send(JSON.stringify({
event: 'clear',
streamSid: streamSid
}));
// Reset
markQueue = [];
lastAssistantItem = null;
responseStartTimestampTwilio = null;
}
};
// Send mark messages to Media Streams so we know if and when AI response playback is finished
const sendMark = (connection, streamSid) => {
if (streamSid) {
const markEvent = {
event: 'mark',
streamSid: streamSid,
mark: { name: 'responsePart' }
};
connection.send(JSON.stringify(markEvent));
markQueue.push('responsePart');
}
};
// Open event for OpenAI WebSocket
openAiWs.on('open', () => {
console.log('Connected to the OpenAI Realtime API');
setTimeout(initializeSession, 100);
});
// Listen for messages from the OpenAI WebSocket (and send to Twilio if necessary)
openAiWs.on('message', (data) => {
try {
const response = JSON.parse(data);
if (LOG_EVENT_TYPES.includes(response.type)) {
console.log(`Received event: ${response.type}`, response);
}
if (response.type === 'response.audio.delta' && response.delta) {
const audioDelta = {
event: 'media',
streamSid: streamSid,
media: { payload: response.delta }
};
connection.send(JSON.stringify(audioDelta));
// First delta from a new response starts the elapsed time counter
if (!responseStartTimestampTwilio) {
responseStartTimestampTwilio = latestMediaTimestamp;
if (SHOW_TIMING_MATH) console.log(`Setting start timestamp for new response: ${responseStartTimestampTwilio}ms`);
}
if (response.item_id) {
lastAssistantItem = response.item_id;
}
sendMark(connection, streamSid);
}
if (response.type === 'input_audio_buffer.speech_started') {
handleSpeechStartedEvent();
}
} catch (error) {
console.error('Error processing OpenAI message:', error, 'Raw message:', data);
}
});
// Handle incoming messages from Twilio
connection.on('message', (message) => {
try {
const data = JSON.parse(message);
switch (data.event) {
case 'media':
latestMediaTimestamp = data.media.timestamp;
if (SHOW_TIMING_MATH) console.log(`Received media message with timestamp: ${latestMediaTimestamp}ms`);
if (openAiWs.readyState === WebSocket.OPEN) {
const audioAppend = {
type: 'input_audio_buffer.append',
audio: data.media.payload
};
openAiWs.send(JSON.stringify(audioAppend));
}
break;
case 'start':
streamSid = data.start.streamSid;
console.log('Incoming stream has started', streamSid);
// Reset start and media timestamp on a new stream
responseStartTimestampTwilio = null;
latestMediaTimestamp = 0;
break;
case 'mark':
if (markQueue.length > 0) {
markQueue.shift();
}
break;
default:
console.log('Received non-media event:', data.event);
break;
}
} catch (error) {
console.error('Error parsing message:', error, 'Message:', message);
}
});
// Handle connection close
connection.on('close', () => {
if (openAiWs.readyState === WebSocket.OPEN) openAiWs.close();
console.log('Client disconnected.');
});
// Handle WebSocket close and errors
openAiWs.on('close', () => {
console.log('Disconnected from the OpenAI Realtime API');
});
openAiWs.on('error', (error) => {
console.error('Error in the OpenAI WebSocket:', error);
});
});
});
fastify.listen({ port: PORT }, (err) => {
if (err) {
console.error(err);
process.exit(1);
}
console.log(`Server is listening on port ${PORT}`);
});
Tags: security, Voice AI, phone AI assistant, real-time, Twilio, Asterisk, Node.js, Nginx, Laravel, Symfony, WordPress, Drupal, OpenAI API, Azure OpenAI, Anthropic Claude, AWS Bedrock, RAG, LangChain, LangGraph, PostgreSQL, PostGIS, pgvector, STT (Whisper, Vosk), TTS (Coqui TTS, Piper), call recording, transcription, on-prem, self-hosted LLM
