You've written a novel with AI. Now what? Toonflow takes the next step: it's an open-source AI Agent workbench that converts novels into short dramas automatically. Text goes in, video comes out. No manual storyboarding, no frame-by-frame prompting. The AI handles character extraction, script writing, visual generation, and video synthesis in one pipeline.
*Source: GitHub (HBAI-Ltd/Toonflow-app, 6.4k stars, Apache 2.0) | Official Site*
## Why This Matters Beyond Video
If you care about AI Agent orchestration (coordinating multiple specialized AI models to complete a complex creative task), Toonflow's architecture is worth studying:
| Challenge | How Most Tools Fail | How Toonflow Solves It |
|---|---|---|
| Multi-model coordination | Manual handoff between text/image/video models | Three-layer agent system auto-orchestrates |
| Character consistency | Characters look different in every frame | Structured character profiles + Nano Banana Pro face consistency |
| Story coherence | Each scene generated in isolation | Event Graph extraction preserves narrative structure |
| Vendor lock-in | Hardcoded to one provider | Vercel AI SDK + programmable vendor system |
## The Pipeline

```
Novel Text (e.g., ζ’θ±δΉ¦ζΏ, 48,000 words)
                    │
                    ▼
┌─────────────────────────────────────────┐
│ [1] Event Extraction (LLM)              │
│     Characters → profiles (appearance,  │
│     personality, relationships)         │
│     Plot → structured event graph       │
├─────────────────────────────────────────┤
│ [2] Script Generation (ScriptAgent)     │
│     Dialogue + scene descriptions +     │
│     stage directions                    │
├─────────────────────────────────────────┤
│ [3] Storyboard (LLM + Image Gen)        │
│     Visual prompts → AI frames          │
│     Camera angles, composition, props   │
├─────────────────────────────────────────┤
│ [4] Video Synthesis (Sora / Doubao)     │
│     5-20 second clips per scene         │
├─────────────────────────────────────────┤
│ [5] Production Assembly                 │
│     Editing + refinement + export       │
└─────────────────────────────────────────┘
```
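The five stages above hand structured data from one to the next. A minimal sketch of that handoff as typed stage functions; the type and function names (`Extraction`, `Script`, `runPipeline`) are illustrative, not Toonflow's actual API, and the LLM calls are stubbed with placeholders:

```typescript
// Illustrative sketch of the pipeline as typed handoffs (stages 3-5 omitted).
interface Profile { name: string; appearance: string; personality: string; }
interface Extraction { characters: Profile[]; events: string[]; }
interface Script { scenes: { dialogue: string; description: string }[]; }

type Stage<I, O> = (input: I) => O;

// Stage 1: novel text -> character profiles + plot events (LLM call stubbed).
const extractEvents: Stage<string, Extraction> = (novel) => ({
  characters: [{ name: "protagonist", appearance: "tbd", personality: "tbd" }],
  events: novel.split(". ").slice(0, 3), // placeholder for real LLM extraction
});

// Stage 2: structured events -> script scenes (ScriptAgent call stubbed).
const writeScript: Stage<Extraction, Script> = (graph) => ({
  scenes: graph.events.map((e) => ({ dialogue: "", description: e })),
});

// Compose: text in, script out. Each stage only sees structured output,
// never raw upstream state.
function runPipeline(novel: string): Script {
  return writeScript(extractEvents(novel));
}
```

The point of the sketch is the shape, not the stubs: every stage boundary is an explicit type, which is what lets later stages (storyboard, video) be swapped or re-run independently.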
## Three-Layer Agent Architecture
This is the most interesting part from an engineering perspective:
- Decision Layer: plans the overall production (how many scenes, what style, pacing decisions). Think of it as the "director."
- Execution Layer: runs individual tasks (write this script, generate this image, synthesize this clip). Think of it as the "crew."
- Supervision Layer: quality checks. Is the character consistent? Does the dialogue match the scene? Is the pacing right? Think of it as the "editor."
Each layer can use different models. The Decision layer benefits from a strong reasoning model (Claude, GPT-4o), while the Execution layer can use faster/cheaper models for bulk generation.
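A rough sketch of what per-layer model binding looks like in practice. The types and the `produceScenes` function are hypothetical stand-ins for Toonflow's internals, with each layer's model injected as a plain function:

```typescript
// Each layer binds to its own model; swapping one is a config change.
type Model = (prompt: string) => string;

interface Layers {
  decision: Model;    // strong reasoning model (e.g. Claude, GPT-4o)
  execution: Model;   // faster/cheaper model for bulk generation
  supervision: Model; // automated quality checks before anything ships
}

function produceScenes(prompts: string[], layers: Layers): string[] {
  // Director: decide which scenes to produce at all.
  const plan = prompts.filter((p) => layers.decision(p) === "keep");
  // Crew: bulk-generate drafts for the planned scenes.
  const drafts = plan.map((p) => layers.execution(p));
  // Editor: only drafts passing supervision survive.
  return drafts.filter((d) => layers.supervision(d) === "pass");
}
```

The design choice worth copying: because each layer is just a `Model`, cost/quality trade-offs live entirely in configuration, not in the orchestration code.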
Persistent Agent Memory: Agents maintain context across sessions using local ONNX vector retrieval, similar to how WebNovel Writer uses RAG-over-chapters.
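The retrieval half of that memory is simple enough to sketch. Assuming embeddings already exist (the local ONNX embedding model is stubbed out here, and `Memory`/`recall` are illustrative names, not Toonflow's schema), nearest-memory lookup is just cosine similarity over stored vectors:

```typescript
// Minimal vector-memory retrieval: store embedded memories, recall by similarity.
interface Memory { text: string; embedding: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k stored memories closest to the query embedding.
function recall(query: number[], store: Memory[], k = 1): Memory[] {
  return [...store]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

A full-scan sort like this is plenty for a single project's memories; an index only becomes worth it at much larger scale.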
## Supported AI Providers
| Role | Options |
|---|---|
| LLM (script/characters) | OpenAI, Claude, DeepSeek V3, Qwen, Zhipu, MiniMax, xAI |
| Image Generation | Nano Banana Pro (recommended for 4K + face consistency) |
| Video Generation | Sora (OpenAI) or Doubao (ByteDance) |
The Programmable Vendor System lets you write custom vendor logic in settings; no code changes are needed to add new providers.
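A hedged sketch of what such a vendor system boils down to (the `Vendor`/`VendorRegistry` names are illustrative; Toonflow's actual vendor hooks live in its settings UI and the Vercel AI SDK provider layer):

```typescript
// Providers register behind one uniform interface, so swapping vendors
// is configuration, not a rewrite.
interface Vendor {
  name: string;
  generate(prompt: string): string;
}

class VendorRegistry {
  private vendors = new Map<string, Vendor>();

  register(v: Vendor): void {
    this.vendors.set(v.name, v);
  }

  get(name: string): Vendor {
    const v = this.vendors.get(name);
    if (!v) throw new Error(`unknown vendor: ${name}`);
    return v;
  }
}

// Usage: switching from Sora to Doubao is a one-string config change.
const registry = new VendorRegistry();
registry.register({ name: "sora", generate: (p) => `sora:${p}` });
registry.register({ name: "doubao", generate: (p) => `doubao:${p}` });
```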
## Installation

```bash
# Option 1: Docker (recommended)
git clone https://github.com/HBAI-Ltd/Toonflow-app.git
cd Toonflow-app
yarn docker:local
# → http://localhost:10588 | Login: admin / admin123

# Option 2: Desktop app
# Download from GitHub Releases (Windows/Mac/Linux)

# Option 3: Server deployment
yarn install && yarn build
pm2 start pm2.json
```
## Cost Per Episode
| Component | Cost |
|---|---|
| LLM (script + character extraction) | $0.50-2.00 |
| Image generation (20-50 frames) | $1.00-5.00 |
| Video generation (20-50 clips) | $10.00-50.00 |
| Total per episode | ~$12-57 |
## What This Teaches About AI Agent Systems
The same patterns apply to any multi-model AI pipeline:
- Separate orchestration from execution: the Decision layer doesn't generate images; it decides what to generate. Same principle as harness engineering.
- Structured intermediate representations: character profiles and event graphs are the "glue" between pipeline stages. Without them, each stage operates blind.
- Vendor abstraction is critical: the Vercel AI SDK layer means swapping from Sora to Doubao is a config change, not a rewrite. Design for model portability.
- Persistent memory enables iteration: without cross-session memory, every run starts from scratch. The ONNX vector store is lightweight but sufficient.
- Quality supervision must be automated: the Supervision layer catches inconsistencies that would otherwise require human review. This is the difference between "demo quality" and "production quality."
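To make the "structured glue" point concrete, here is one way those intermediate representations could look as explicit types, plus a tiny validation pass of the kind a supervision step can run. Field names are illustrative, not Toonflow's actual schema:

```typescript
// Structured intermediate representations between pipeline stages.
interface CharacterProfile {
  name: string;
  appearance: string;      // feeds image-generation prompts
  personality: string;     // feeds dialogue style
  relationships: Record<string, string>;
}

interface StoryEvent {
  id: string;
  summary: string;
  participants: string[];  // must reference CharacterProfile.name values
  causes: string[];        // ids of prior events -> narrative coherence
}

// Because the representation is explicit, a downstream stage can validate
// references instead of operating blind.
function danglingRefs(events: StoryEvent[], cast: CharacterProfile[]): string[] {
  const known = new Set(cast.map((c) => c.name));
  return events.flatMap((e) => e.participants.filter((p) => !known.has(p)));
}
```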
## Case Study: ζ’θ±δΉ¦ζΏ Short Drama (Completed)

We used Toonflow to convert our AI-written literary novella γζ’θ±δΉ¦ζΏγ (48,000 words, 12 chapters) into a 3-episode short drama with a 16-frame storyboard. The novel was created entirely using the webnovel-writer Claude Code skill, making this an end-to-end AI creative pipeline: idea → novel → video.
### Pipeline Results
| Step | Tool | Model | Result |
|---|---|---|---|
| Event extraction | Toonflow | Gemini 2.5 Flash | 3 chapters → characters + plot events |
| Script generation | Toonflow | GPT-4o | 3 episode scripts with scene breakdowns |
| Character design | baoyu-image-gen | Gemini 3 Pro Image | 2 character sheets from real photos |
| Storyboard art | baoyu-image-gen (batch) | Gemini 3 Pro Image | 16 frames, Chinese watercolor anime style |
| Director review | Toonflow Production Agent | GPT-4o | B+ rating, passed supervision |
### Practical Lessons

- Gemini + Vercel AI SDK tool calling is fragile: Toonflow's Script Agent failed repeatedly with Gemini due to a TypeValidationError in streaming tool call responses. We switched to OpenAI for reliable tool calling.
- Toonflow's DB has undocumented tables: the `o_scriptAssets`, `o_assetsRole2Audio`, and `memories` tables were missing and had to be created manually. The init script has a SQLite bug that silently skips table creation.
- Bypass vendor image generation: Toonflow's built-in image generation through the vendor `imageRequest` is unreliable with Google. Direct batch generation via `baoyu-image-gen` was faster and more reliable (16 frames in ~5 minutes).
- The three-layer agent system works: Decision/Execution/Supervision agents caught real issues (missing assets, pacing problems). The B+ score was earned, not inflated.
- Total cost: under $1. $0.50 for OpenAI (scripts) plus $0 on the Google free tier (event extraction + images). Far cheaper than the $12-57/episode estimate.
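The first lesson suggests an obvious mitigation: wrap the primary provider in a fallback. A minimal sketch, where the provider functions are hypothetical stand-ins rather than the actual Vercel AI SDK API:

```typescript
// Try the primary provider; fall back to a second when a structured/tool
// call fails (e.g. a validation error in a streaming response).
type ToolCaller = (prompt: string) => Promise<object>;

function withFallback(primary: ToolCaller, fallback: ToolCaller): ToolCaller {
  return async (prompt) => {
    try {
      return await primary(prompt);   // e.g. Gemini, which proved fragile
    } catch {
      return await fallback(prompt);  // e.g. OpenAI, which tool-called reliably
    }
  };
}
```

In production you would likely also log the primary failure and cap retries, but the shape (one uniform `ToolCaller` interface, provider choice as a wrapper) is the same vendor-abstraction lesson as above.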
### 3 Episodes Produced

- EP01: First meeting at the bookshop
- EP02: Return visit, deepening feelings
- EP03: Emotional awakening on Pingjiang Road
## How LearnAI Team Could Use This
- Study Toonflow as a reference architecture for multi-agent creative pipelines: decision, execution, and supervision layers.
- Use the novel-to-video workflow as a demo path for LearnAI content production.
- Turn the documented failure points into training material on tool-calling reliability and vendor abstraction.
## Real-World Use Cases
- Convert AI-written webnovel chapters into short-drama pilots or Douyin/TikTok storyboard packages.
- Prototype visual treatments for IP development before hiring a full production team.
- Teach agent orchestration using a concrete media pipeline with structured intermediate outputs.
## Links
- GitHub: HBAI-Ltd/Toonflow-app
- Official Site: toonflow.ai
- Tutorial: Bilibili 8-min quickstart
- Discord: Community
- Version: Latest release on GitHub