Type4Me & Voice-Input-Src: Local Voice Input Tools for macOS

Voice input tools on the market tend to be expensive ($12/month), send your data to the cloud, or offer no prompt customization. Type4Me addresses all three: it's a free, local-first macOS voice input app built on SherpaOnnx — no API key needed, no internet required, and with a unique "processing mode" that pipes speech through LLMs before output.

Source: GitHub โ€” joewongjc/type4me

How It Works

Voice Input (microphone)
       │
       ▼
┌──────────────────┐
│ SherpaOnnx Engine│  ← Local, no internet
│ (Apple Silicon   │
│  optimized)      │
└────────┬─────────┘
         │ Raw text
         ▼
┌──────────────────┐
│ Processing Mode  │  ← Optional LLM post-processing
│ (Prompt-driven)  │
└────────┬─────────┘
         │ Refined text
         ▼
    Paste / Output

Key Features

Local Speech Recognition

  • Built on SherpaOnnx (next-gen Kaldi + ONNX Runtime) — runs entirely offline
  • Fast on Apple Silicon; no API key or internet required
  • Optional cloud engines (Volcengine, Deepgram) for higher accuracy
  • Plugin architecture: interfaces for OpenAI Whisper, Google, and AWS pre-defined

Processing Mode — The Killer Feature

Speech recognition outputs raw text, which you can then route through LLM post-processing:

| Built-in Mode | What It Does |
|---|---|
| Quick Dictation | Fast transcription, minimal processing |
| Dual-Channel High Precision | Higher-accuracy transcription |
| Chinese-English | Bilingual recognition |
| Prompt Optimization | Clean up grammar, punctuation, formatting |
| Custom Prompt | Write your own processing pipeline |

You can define any processing flow with custom prompts — turn messy speech into formatted notes, translate on the fly, or extract action items from spoken paragraphs.

Command Mode

This is where it gets interesting:

  1. Select text in any app
  2. Press hotkey and speak
  3. Speech becomes an instruction, selected text becomes context
  4. LLM executes the instruction and outputs the result

Essentially, it turns speech into an LLM command line. Select a paragraph → say "summarize this in 3 bullets" → done.
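
The four steps above boil down to composing one LLM call from two inputs. A minimal Python sketch, assuming a simple instruction-plus-context message format (the real app's format is not documented here):

```python
# Command mode sketch: spoken words become the instruction,
# the selected text becomes the context for a single LLM call.

def build_command(spoken_instruction: str, selected_text: str) -> str:
    """Compose the message a command-mode LLM call might receive."""
    return (
        f"Instruction: {spoken_instruction}\n"
        f"---\n"
        f"Context:\n{selected_text}"
    )

msg = build_command("summarize this in 3 bullets",
                    "Long selected paragraph from any app...")
print(msg.splitlines()[0])  # → Instruction: summarize this in 3 bullets
```

Because the instruction is spoken rather than typed, any selectable text in any app becomes an LLM target without switching windows.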

Privacy-First Data

  • All credentials and recognition history stored locally (SQLite + JSON)
  • No telemetry, no cloud sync
  • History records support CSV export
  • MIT license
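
The storage choices above (local SQLite plus CSV export) are easy to picture. A minimal sketch with a hypothetical schema — the app's actual table layout is not published here:

```python
import csv
import io
import sqlite3

# Local-first history store: SQLite on disk (in-memory here for the demo),
# exported to CSV. Column names are an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (ts TEXT, mode TEXT, text TEXT)")
conn.execute("INSERT INTO history VALUES (?, ?, ?)",
             ("2025-01-01T09:00:00", "Quick Dictation", "hello world"))

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["ts", "mode", "text"])          # header row
writer.writerows(conn.execute("SELECT ts, mode, text FROM history"))
print(buf.getvalue())
```

Everything stays in files the user owns; there is no sync step to leak data through.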

Requirements

  • macOS 14+
  • Apple Silicon recommended (Intel supported but slower)
  • No API key needed for local mode
  • Optional: API keys for cloud engines or LLM processing

Plugin Architecture

Adding a new speech recognition service requires implementing just two protocols and registering the engine. Interfaces for OpenAI Whisper, Google Speech, and AWS Transcribe are already pre-defined — the community can contribute adapters.
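
The two-protocol design could look roughly like this Python sketch (the app itself is Swift, and these protocol and method names are invented for illustration, not Type4Me's real interfaces):

```python
from typing import Protocol

# Hypothetical two-protocol plugin design: an engine implements a config
# protocol and a recognition protocol, then registers itself by name.

class EngineConfig(Protocol):
    def validate(self) -> bool: ...

class RecognitionEngine(Protocol):
    name: str
    def transcribe(self, audio: bytes) -> str: ...

REGISTRY: dict[str, RecognitionEngine] = {}

def register(engine: RecognitionEngine) -> None:
    REGISTRY[engine.name] = engine

class WhisperAdapter:
    """Example community adapter; a real one would call the Whisper API."""
    name = "openai-whisper"
    def validate(self) -> bool:
        return True
    def transcribe(self, audio: bytes) -> str:
        return "transcribed text"

register(WhisperAdapter())
print(sorted(REGISTRY))  # → ['openai-whisper']
```

Keeping the registry keyed by name is what lets pre-defined interfaces ship before any adapter exists.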

Also Worth Knowing: Voice-Input-Src & Voice-Input-Dist

Another open-source Mac voice input project taking a different approach — voice-input-src focuses on the prompt rather than the engine, and voice-input-dist is the fully working app generated from that prompt.

*Source: GitHub — yetone/voice-input-src · GitHub — yetone/voice-input-dist · 宝玉 xp on Weibo (2026-03-29)*
| | Type4Me | Voice-Input-Src/Dist |
|---|---|---|
| Focus | Full-featured voice input app | Open-source prompt + generated app |
| Key value | Local recognition + LLM processing modes | The prompt design is the real IP — reproducible by anyone |
| Vibe coding | Supported via command mode | Core use case — one prompt → full macOS app |
| Engine | SherpaOnnx (local) + cloud options | Apple Speech Recognition (native) |
| LLM refinement | Built-in processing modes | Optional OpenAI-compatible API for mixed Chinese/English |
| License | MIT | Open source |

The author (宝玉 xp) notes: "What's open-sourced is the Prompt — the code generated from it has more value than a pile of vibe-coding output, because you can reproduce it yourself."

Voice-Input-Dist: The Generated App

The voice-input-dist repo (158 stars) is the complete macOS menu-bar app generated from a single Claude Code prompt:

claude \
  --dangerously-skip-permissions \
  --output-format=stream-json \
  --verbose \
  -p "่ฏทๅฎž็Žฐไธ€ไธช macOS menu-bar ่ฏญ้Ÿณ่พ“ๅ…ฅๆณ•ๅบ”็”จ (Swift, macOS 14+)"

The prompt specifies 7 detailed requirements:

  1. Fn key to record — press and hold Fn; speech streams into the focused text field via Apple Speech Recognition
  2. Default zh-CN — out-of-the-box Chinese recognition with a language switcher (English, Chinese, Japanese, Korean, etc.)
  3. Floating waveform animation — elegant borderless NSPanel with a 5-bar RMS-driven waveform (44×32 px) and live transcript label
  4. Clipboard injection — text inserted via Cmd+V paste simulation; auto-detects CJK vs ASCII input method
  5. LLM refinement — optional OpenAI-compatible API to improve accuracy for mixed Chinese/English text
  6. Settings UI — LLM Refinement toggle, API Base URL, API Key configuration
  7. LSUIElement mode — menu-bar icon only, no Dock icon; built with Swift Package Manager
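
Requirement 3's "5-bar RMS-driven waveform" is a small signal-processing trick: split each audio frame into five chunks and map each chunk's RMS to a bar height. A Python sketch of the math (the real app does this in Swift; the 32 here matches the panel's 32 px height):

```python
import math

# Map one frame of audio samples to 5 bar heights via per-chunk RMS.
def bar_heights(samples, bars=5, max_height=32):
    chunk = max(1, len(samples) // bars)
    heights = []
    for i in range(bars):
        window = samples[i * chunk:(i + 1) * chunk] or [0.0]
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        heights.append(min(max_height, round(rms * max_height)))
    return heights

print(bar_heights([0.0, 0.5, 1.0, 0.5, 0.0, 0.25, 0.75, 0.1, 0.9, 0.3]))
# → [11, 25, 6, 17, 21]
```

RMS rather than peak amplitude keeps the bars from jittering on single loud samples.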

# Build and run
make build    # Creates VoiceInput.app
make run      # Build + launch
make install  # Copy to /Applications

A reproducibility guarantee: clone the source repo, run make build, get an identical app. The full build process is documented in a public asciinema recording.

How LearnAI Team Could Use This

  • Use Type4Me as a practical example of local-first AI tooling combining offline speech recognition with optional LLM cleanup.
  • Demonstrate voice-driven editing workflows for documentation, note-taking, and prompt iteration.
  • Compare Type4Me with Voice-Input-Src/Dist to teach building a product vs open-sourcing a reproducible prompt.

Real-World Use Cases

  • Dictate notes, drafts, and messages on macOS without sending raw audio to cloud services.
  • Select existing text and use voice commands to summarize, rewrite, translate, or format it through an LLM.
  • Build specialized voice-input workflows for bilingual writing, meeting notes, coding prompts, and documentation.