ElatoAI puts realtime voice AI on an ESP32 chip, supporting 100+ voice AI models with sub-2-second latency globally. It's an end-to-end solution for building AI toys, voice assistants, and IoT devices without dealing with hardware compatibility issues, audio processing complexity, or multi-provider integration. ~1.6k GitHub stars, MIT licensed.
Source: GitHub - akdeb/ElatoAI
How It Works
User speaks
↓
ESP32 captures + Opus compresses audio
↓
WebSocket → Edge function (Deno/Cloudflare)
↓
LLM API processes (OpenAI/Gemini/Grok/ElevenLabs/Hume)
↓
Audio response generated
↓
ESP32 decompresses → Speaker playback
↓
< 2 second round-trip
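The capture step can be pictured as slicing the microphone's PCM stream into fixed-duration frames before each one is handed to the Opus encoder. The sketch below is only an illustration of that framing arithmetic: the 24 kHz rate comes from the feature list, while the 20 ms frame duration and the helper names `samples_per_frame` and `split_into_frames` are assumptions made for this example, not details taken from the ElatoAI firmware.

```python
from typing import Iterator, List

SAMPLE_RATE_HZ = 24_000  # from the source: 24 kHz sampling
FRAME_MS = 20            # assumption: a commonly used Opus frame duration


def samples_per_frame(sample_rate_hz: int = SAMPLE_RATE_HZ,
                      frame_ms: int = FRAME_MS) -> int:
    """Number of PCM samples in one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


def split_into_frames(pcm: List[int], frame_len: int) -> Iterator[List[int]]:
    """Yield fixed-size frames, zero-padding the final partial frame."""
    for start in range(0, len(pcm), frame_len):
        frame = pcm[start:start + frame_len]
        if len(frame) < frame_len:
            frame = frame + [0] * (frame_len - len(frame))
        yield frame


if __name__ == "__main__":
    frame_len = samples_per_frame()
    one_second = [0] * SAMPLE_RATE_HZ  # one second of silence as stand-in input
    frames = list(split_into_frames(one_second, frame_len))
    print(frame_len, len(frames))  # prints "480 50"
```

Under these assumptions each frame holds 480 samples and one second of audio becomes 50 frames, each of which would be encoded and sent over the WebSocket as its own message.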
Three-Layer Architecture
| Layer | Technology | Role |
|---|---|---|
| Frontend | Next.js on Vercel | Create agents, manage devices |
| Edge | Deno Edge / Cloudflare Workers | WebSocket connections, LLM API calls |
| IoT Client | ESP32-S3 + PlatformIO/Arduino | Audio capture, processing, playback |
Supported Providers
Via Deno Edge:
- OpenAI Realtime API
- Google Gemini Live API
- xAI Grok Voice Agent API
- ElevenLabs Conversational AI
- Hume AI EVI-4
Via Cloudflare Workers:
- 80+ LLM models
- 10+ TTS models
- 5 STT models
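One way to read the split above is as a routing decision: the five realtime voice providers go through Deno Edge, and the broader model catalogue goes through Cloudflare Workers. The sketch below encodes that reading as a lookup. The provider keys, the runtime labels, and the `runtime_for` function are hypothetical names invented for illustration, and the fallback to Cloudflare Workers is an assumption about how such a router might behave, not documented ElatoAI behaviour.

```python
from typing import Tuple

# Provider names follow the list above; the lowercase keys and the
# runtime labels are hypothetical identifiers chosen for this sketch.
DENO_EDGE_PROVIDERS: Tuple[str, ...] = (
    "openai",
    "gemini",
    "grok",
    "elevenlabs",
    "hume",
)


def runtime_for(provider: str) -> str:
    """Return the edge runtime label that would serve the given provider."""
    key = provider.strip().lower()
    if key in DENO_EDGE_PROVIDERS:
        return "deno-edge"
    # Assumption: anything else is served from the Cloudflare Workers catalogue.
    return "cloudflare-workers"


if __name__ == "__main__":
    for name in ("OpenAI", "hume", "some-other-llm"):
        print(name, "->", runtime_for(name))
```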
Key Features
- No PSRAM required: runs on standard ESP32-S3
- Button + capacitive touch control
- WiFi management via captive portal
- OTA firmware updates: update devices remotely
- Conversation history stored in Supabase
- Custom AI agents with personalized voices and tool-calling
- Opus compression: 12 kbps at 24 kHz sampling
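To put the Opus figure in context, it helps to compare it with the uncompressed stream. The arithmetic below is a rough sketch: the 12 kbps bitrate and 24 kHz sample rate come from the list above, while 16-bit mono PCM and a 20 ms frame are assumptions made here for the calculation. Opus bitrates are targets, so real frame sizes will vary around the computed average.

```python
SAMPLE_RATE_HZ = 24_000    # from the source
OPUS_BITRATE_BPS = 12_000  # from the source: 12 kbps
BITS_PER_SAMPLE = 16       # assumption: 16-bit PCM
CHANNELS = 1               # assumption: mono microphone
FRAME_MS = 20              # assumption: 20 ms Opus frames


def raw_pcm_bitrate_bps(sample_rate_hz: int = SAMPLE_RATE_HZ,
                        bits_per_sample: int = BITS_PER_SAMPLE,
                        channels: int = CHANNELS) -> int:
    """Bitrate of the uncompressed PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(opus_bps: int = OPUS_BITRATE_BPS) -> float:
    """How many times smaller the Opus stream is than raw PCM."""
    return raw_pcm_bitrate_bps() / opus_bps


def encoded_bytes_per_frame(opus_bps: int = OPUS_BITRATE_BPS,
                            frame_ms: int = FRAME_MS) -> float:
    """Average encoded payload size of one frame, in bytes."""
    return opus_bps * frame_ms / 1000 / 8


if __name__ == "__main__":
    print(raw_pcm_bitrate_bps())      # 384000
    print(compression_ratio())        # 32.0
    print(encoded_bytes_per_frame())  # 30.0
```

Under these assumptions the raw stream would be 384 kbps, so 12 kbps Opus is roughly a 32x reduction, with an average payload of about 30 bytes per 20 ms frame.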
Performance
| Metric | Value |
|---|---|
| Round-trip latency | < 2 seconds globally |
| Continuous conversation | Tested up to 17+ minutes |
| Cold start | 3-4 seconds |
| Audio quality | Opus 12 kbps / 24 kHz |
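The figures in the table can be combined into a back-of-the-envelope estimate of how much audio data a long conversation moves. This sketch uses the 12 kbps bitrate and the 17-minute session length from the table. It counts one direction only and ignores WebSocket framing overhead and any silence handling, both of which are simplifying assumptions, and `stream_bytes` is a hypothetical helper name.

```python
OPUS_BITRATE_BPS = 12_000  # from the table: Opus at 12 kbps
SESSION_MINUTES = 17       # from the table: tested continuous conversation


def stream_bytes(bitrate_bps: int, seconds: float) -> float:
    """Bytes carried by a constant-bitrate stream over the given duration."""
    return bitrate_bps * seconds / 8


if __name__ == "__main__":
    total = stream_bytes(OPUS_BITRATE_BPS, SESSION_MINUTES * 60)
    print(total)  # 1530000.0
```

That works out to about 1.5 MB of encoded audio per direction for a 17-minute session, small enough that network transfer of the audio itself is unlikely to dominate the latency budget.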
How the LearnAI Team Could Use This
- Educational IoT projects: build voice-powered learning assistants for CS courses
- Hardware + AI integration: demonstrates full-stack IoT architecture (firmware → edge → cloud)
- Research prototype: rapid prototyping of voice-based research tools
- Teaching edge computing: concrete example of edge functions handling real-time AI workloads
Real-World Use Cases
- AI toys: children's interactive voice companions
- Voice assistants: custom home assistants without cloud vendor lock-in
- Accessibility devices: voice-controlled tools for users with motor disabilities
- Smart home: voice-activated IoT controllers with custom AI personalities