ElatoAI: Realtime Voice AI on ESP32

ElatoAI puts realtime voice AI on an ESP32 chip, supporting 100+ voice AI models with under 2 seconds of round-trip latency worldwide. It is an end-to-end solution for building AI toys, voice assistants, and IoT devices without having to deal with hardware compatibility issues, audio-processing complexity, or multi-provider integration. The repository has roughly 1.6k GitHub stars and is MIT licensed.

Source: GitHub - akdeb/ElatoAI

How It Works

User speaks
    ↓
ESP32 captures audio and compresses it with Opus
    ↓
WebSocket → Edge function (Deno/Cloudflare)
    ↓
AI provider API processes it (OpenAI/Gemini/Grok/ElevenLabs/Hume)
    ↓
Audio response generated
    ↓
ESP32 decodes the Opus audio → speaker playback
    ↓
< 2-second round trip
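
One way to check a device against the 2-second target is to timestamp each hop in the diagram. The TypeScript below is a minimal sketch, not ElatoAI's code: the stage names, the `StageMark` type, the `roundTripReport` helper, and the example timestamps are all hypothetical. It assumes all marks share one clock and defines "round trip" as the time from the end of the user's speech to the start of playback.

```typescript
// HYPOTHETICAL stage labels mirroring the pipeline diagram (not ElatoAI identifiers).
type Stage =
  | "speech_end"      // user stops speaking
  | "audio_sent"      // ESP32 has Opus-encoded and sent the audio
  | "edge_received"   // edge function received it over the WebSocket
  | "llm_first_audio" // provider returned its first audio chunk
  | "playback_start"; // ESP32 decoded it and started speaker playback

interface StageMark {
  stage: Stage;
  atMs: number; // ASSUMPTION: every mark is taken on one shared clock
}

interface RoundTripReport {
  hops: { from: Stage; to: Stage; ms: number }[];
  totalMs: number;
  withinBudget: boolean;
}

const BUDGET_MS = 2000; // the "< 2-second round trip" target

function roundTripReport(marks: StageMark[]): RoundTripReport {
  if (marks.length < 2) throw new Error("need at least two stage marks");
  // Sort chronologically so every hop duration is non-negative.
  const sorted = [...marks].sort((a, b) => a.atMs - b.atMs);
  // Pair each mark with its predecessor to get per-hop durations.
  const hops = sorted.slice(1).map((m, i) => ({
    from: sorted[i].stage,
    to: m.stage,
    ms: m.atMs - sorted[i].atMs,
  }));
  const totalMs = sorted[sorted.length - 1].atMs - sorted[0].atMs;
  return { hops, totalMs, withinBudget: totalMs < BUDGET_MS };
}

// Illustrative timestamps only, not measurements from ElatoAI.
const example = roundTripReport([
  { stage: "speech_end", atMs: 0 },
  { stage: "audio_sent", atMs: 150 },
  { stage: "edge_received", atMs: 230 },
  { stage: "llm_first_audio", atMs: 1100 },
  { stage: "playback_start", atMs: 1450 },
]);
// example.totalMs === 1450, example.withinBudget === true
```

Reporting per-hop durations rather than only the total makes it easier to see whether a slow response came from the network, the edge, or the provider.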

Three-Layer Architecture

Layer      | Technology                     | Role
-----------|--------------------------------|-------------------------------------
Frontend   | Next.js on Vercel              | Create agents, manage devices
Edge       | Deno Edge / Cloudflare Workers | WebSocket connections, LLM API calls
IoT client | ESP32-S3 + PlatformIO/Arduino  | Audio capture, processing, playback
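
The edge layer's job of terminating WebSocket connections can be pictured as classifying each incoming frame before forwarding it to a provider. This is a sketch under an explicitly hypothetical wire format (binary frames carry Opus audio, text frames carry JSON control messages); the message names and the `classifyFrame` function are illustrative and are not ElatoAI's actual protocol.

```typescript
// HYPOTHETICAL wire format, for illustration only (not ElatoAI's protocol):
// binary WebSocket frames carry Opus audio, text frames carry JSON control messages.
type ControlMessage =
  | { type: "start_session"; deviceId: string }
  | { type: "end_session" }
  | { type: "interrupt" };

type Incoming =
  | { kind: "audio"; opusFrame: Uint8Array }
  | { kind: "control"; msg: ControlMessage }
  | { kind: "invalid"; reason: string };

// Classify one WebSocket payload before the edge forwards it to a provider.
function classifyFrame(data: string | Uint8Array): Incoming {
  if (typeof data !== "string") {
    // Binary frame: treat any non-empty payload as one Opus packet.
    return data.byteLength > 0
      ? { kind: "audio", opusFrame: data }
      : { kind: "invalid", reason: "empty audio frame" };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    return { kind: "invalid", reason: "malformed JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { kind: "invalid", reason: "control message must be a JSON object" };
  }
  const fields = parsed as { type?: unknown; deviceId?: unknown };
  switch (fields.type) {
    case "start_session":
      return typeof fields.deviceId === "string"
        ? { kind: "control", msg: { type: "start_session", deviceId: fields.deviceId } }
        : { kind: "invalid", reason: "start_session needs a string deviceId" };
    case "end_session":
      return { kind: "control", msg: { type: "end_session" } };
    case "interrupt":
      return { kind: "control", msg: { type: "interrupt" } };
    default:
      return { kind: "invalid", reason: "unknown control type" };
  }
}
```

A real handler would first convert binary event data (for example an `ArrayBuffer`) into a `Uint8Array` before calling `classifyFrame`. Keeping audio in binary frames avoids base64-encoding every Opus packet.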

Supported Providers

Via Deno Edge:

  • OpenAI Realtime API
  • Google Gemini Live API
  • xAI Grok Voice Agent API
  • ElevenLabs Conversational AI
  • Hume AI EVI-4

Via Cloudflare Workers:

  • 80+ LLMs
  • 10+ text-to-speech (TTS) models
  • 5 speech-to-text (STT) models
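
The two provider lists imply two kinds of agent: a single speech-to-speech provider on Deno Edge, or separately chosen models on Cloudflare Workers. The sketch below encodes that split as a TypeScript union. The provider keys, the `AgentConfig` shape, and the helper functions are hypothetical, and the assumption that the Cloudflare path chains STT, LLM and TTS is inferred from the separate model counts rather than stated in the source.

```typescript
// Provider names come from the lists above; keys and types are HYPOTHETICAL.
type EdgeRuntime = "deno-edge" | "cloudflare-workers";

const REALTIME_PROVIDERS = {
  openai: "OpenAI Realtime API",
  gemini: "Google Gemini Live API",
  grok: "xAI Grok Voice Agent API",
  elevenlabs: "ElevenLabs Conversational AI",
  hume: "Hume AI EVI-4",
} as const;

type RealtimeProvider = keyof typeof REALTIME_PROVIDERS;

// Speech-to-speech provider served through the Deno Edge path.
interface RealtimeAgent {
  kind: "realtime";
  provider: RealtimeProvider;
}

// ASSUMPTION: the Cloudflare Workers path chains separately chosen STT, LLM and TTS models.
interface PipelineAgent {
  kind: "pipeline";
  sttModel: string;
  llmModel: string;
  ttsModel: string;
}

type AgentConfig = RealtimeAgent | PipelineAgent;

// Pick the edge runtime that would host this agent.
function runtimeFor(agent: AgentConfig): EdgeRuntime {
  return agent.kind === "realtime" ? "deno-edge" : "cloudflare-workers";
}

// Human-readable summary of an agent's audio path.
function describeAgent(agent: AgentConfig): string {
  if (agent.kind === "realtime") {
    return `${REALTIME_PROVIDERS[agent.provider]} via ${runtimeFor(agent)}`;
  }
  return `${agent.sttModel} -> ${agent.llmModel} -> ${agent.ttsModel} via ${runtimeFor(agent)}`;
}
```

For example, `describeAgent({ kind: "realtime", provider: "hume" })` yields `"Hume AI EVI-4 via deno-edge"`. A discriminated union lets the compiler reject a pipeline agent that is missing one of its three models.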

Key Features

  • No PSRAM required: runs on ESP32-S3 boards without external PSRAM
  • Button + capacitive touch control
  • WiFi management via captive portal
  • OTA firmware updates: update devices remotely
  • Conversation history stored in Supabase
  • Custom AI agents with personalized voices and tool-calling
  • Opus compression: 12 kbps at a 24 kHz sample rate
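
The arithmetic below shows what the 12 kbps / 24 kHz setting means per audio packet. It is a back-of-envelope sketch: the 20 ms frame duration and 16-bit mono PCM input are assumptions (common choices, not stated in the source), and the Opus figure is an average because packet sizes vary when the encoder runs in variable-bitrate mode.

```typescript
// Figures from the source: 24 kHz sampling, 12 kbps Opus.
// ASSUMPTIONS: 20 ms frames and 16-bit mono PCM before encoding.
interface OpusFrameStats {
  samplesPerFrame: number;
  pcmBytesPerFrame: number;
  opusBytesPerFrame: number; // average only; VBR packets vary in size
  pcmBitrateBps: number;
  compressionRatio: number;
}

function opusFrameStats(
  sampleRateHz: number,
  opusBitrateBps: number,
  frameMs: number,
  pcmBitsPerSample: number,
): OpusFrameStats {
  const samplesPerFrame = (sampleRateHz * frameMs) / 1000;
  const pcmBytesPerFrame = (samplesPerFrame * pcmBitsPerSample) / 8;
  const opusBytesPerFrame = (opusBitrateBps * frameMs) / 1000 / 8;
  const pcmBitrateBps = sampleRateHz * pcmBitsPerSample;
  return {
    samplesPerFrame,
    pcmBytesPerFrame,
    opusBytesPerFrame,
    pcmBitrateBps,
    compressionRatio: pcmBitrateBps / opusBitrateBps,
  };
}

const stats = opusFrameStats(24000, 12000, 20, 16);
// samplesPerFrame 480, pcmBytesPerFrame 960, opusBytesPerFrame 30,
// pcmBitrateBps 384000, compressionRatio 32
```

Under these assumptions each 20 ms packet averages about 30 bytes instead of 960 bytes of raw PCM, a 32:1 reduction in the audio sent over WiFi.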

Performance

Metric                  | Value
------------------------|------------------------
Round-trip latency      | < 2 seconds globally
Continuous conversation | Tested for 17+ minutes
Cold start              | 3-4 seconds
Audio quality           | Opus, 12 kbps at 24 kHz
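
The 17-minute conversation figure also gives a rough sense of data usage. This sketch computes audio payload only; it assumes the downlink uses the same 12 kbps Opus setting and ignores WebSocket, TLS and TCP/IP overhead, so real traffic will be somewhat higher. The two-direction figure is an upper bound that applies only if audio streams continuously both ways.

```typescript
// Audio payload for a continuous session at a constant Opus bitrate.
// ASSUMPTIONS: downlink also uses 12 kbps Opus; protocol overhead is ignored.
function sessionPayloadBytes(minutes: number, bitrateBps: number, directions: 1 | 2): number {
  const seconds = minutes * 60;
  return (seconds * bitrateBps * directions) / 8; // bits -> bytes
}

// The 17-minute conversation length from the Performance table.
const upOnly = sessionPayloadBytes(17, 12000, 1);   // 1,530,000 bytes (about 1.5 MB)
const bothWays = sessionPayloadBytes(17, 12000, 2); // 3,060,000 bytes (about 3.1 MB)
```

At this bitrate a long conversation stays in the low megabytes, which is modest even for constrained WiFi links.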

How the LearnAI Team Could Use This

  • Educational IoT projects: build voice-powered learning assistants for CS courses
  • Hardware + AI integration: demonstrates a full-stack IoT architecture (firmware → edge → cloud)
  • Research prototypes: rapidly prototype voice-based research tools
  • Teaching edge computing: a concrete example of edge functions handling real-time AI workloads

Real-World Use Cases

  • AI toys: interactive voice companions for children
  • Voice assistants: custom home assistants without lock-in to a single AI provider
  • Accessibility devices: voice-controlled tools for users with motor disabilities
  • Smart home: voice-activated IoT controllers with custom AI personalities