Writing an AI novel is only step one. The real leverage comes when a single Claude Code session takes 150 chapters of raw markdown and converts them into multi-voice audiobooks, vertical video, and published Douyin content — with you clicking nothing but “publish.” This entry documents the complete pipeline built during a live session with the 探花书房·逆袭 project.
| *Source: Hands-on case study (April 11-12, 2026) | Edge TTS | Playwright MCP* |
The Pipeline at a Glance
Chapter .md (raw text)
│
├──[1]──→ Parse: split narration vs dialogue, detect speakers
│
├──[2]──→ TTS: Edge TTS multi-voice (旁白/Qing/林晚/妈/爸)
│
├──[3]──→ Audiobook MP3: concatenate + 1.5x speed
│
├──[4]──→ Douyin Video: text frames + fade transitions (1080×1920)
│
└──[5]──→ Publish: Playwright fills form, you click "发布"
One command per step. No manual editing. No switching between apps.
Case Study: 探花书房·逆袭
| Metric | Result |
|---|---|
| Chapters written | 48 new chapters (103-150) in one session |
| Writing method | Parallel agents, 4 chapters at a time |
| Audiobook generated | 20 chapters, ~10 min each at 1.5x |
| Douyin videos generated | 20 full-chapter videos (~8-12 min each) |
| Douyin videos published | 20 chapters via semi-automated Playwright |
| Total time | ~4 hours (writing + audio + video + publishing) |
Step 1: Multi-Voice Audiobook with Edge TTS
Why Edge TTS?
| Option | Cost | Quality | Chinese Voices |
|---|---|---|---|
| Edge TTS | Free | 7/10 | 6 standard voices |
| CosyVoice (Alibaba) | Free (open-source) | 8.5/10 | Voice cloning |
| Fish Audio | Pay per use | 9/10 | Many character voices |
| 火山引擎 TTS | Free tier | 9/10 | Industry standard |
Edge TTS wins for prototyping — zero cost, zero setup, good enough quality to validate the concept before investing in premium voices.
Voice Configuration (The Key Decision)
Finding the right voice required iteration. We generated 15+ demo clips before landing on this configuration:
| Role | Edge TTS Voice | Rate | Pitch | Why This Works |
|---|---|---|---|---|
| Narrator | YunxiNeural | -5% | +0Hz | Warm, novel-reading tone |
| Qing (male lead) | YunyangNeural | -8% | -2Hz | Professional, calm — matches the character |
| 林晚 (female lead) | XiaoyiNeural | -10% | +0Hz | Clear, youthful with distance |
| Mom (50s) | XiaoxiaoNeural | -12% | -8Hz | Pitch down + slow = aged voice |
| Dad (50s) | YunjianNeural | -10% | -3Hz | Deep, tired, working-class |
Key insight: Edge TTS voices are all “young professional” by default. To age a voice, drop pitch 5-8Hz and slow rate 10-12%. To differentiate characters on the same base voice, vary both pitch AND rate — changing only one makes them sound like the same person at a different speed.
The Parsing Challenge
Chinese webnovel dialogue uses "" (curly quotes). Speaker detection uses a simple heuristic:
# Look 30 chars before the quote for character names
def detect_speaker(context_before):
for name, voice in CHAR_MAP.items():
if name in context_before[-30:]:
return voice
return 'narrator' # fallback
This works ~90% of the time for the novel’s writing style (speaker name almost always precedes dialogue). The 10% misattribution goes to narrator voice, which sounds natural enough.
Generation Pipeline
# Install
python3 -m venv /tmp/tts-env
/tmp/tts-env/bin/pip install edge-tts Pillow
# Generate chapters 1-20
/tmp/tts-env/bin/python batch_audiobook.py 1 20
Each chapter: parse → generate segments (8 parallel) → add silence gaps → concatenate → apply 1.5x speed → output MP3.
Output: ~10 minutes per chapter at 1.5x speed. 20 chapters in ~15 minutes.
Step 2: Douyin Vertical Video
Format Iteration (What We Learned)
| Version | Format | User Feedback |
|---|---|---|
| v1 | 1-min clips, static text cards, chapter split into 8-12 videos | “Too short, too many to publish” |
| v2 (final) | Full chapter (~10 min), text fade in/out, 1 video = 1 chapter | “Perfect, much less work” |
The lesson: Start simple, iterate fast. v1 took 30 minutes to build, revealed the wrong assumption (short = better for Douyin), and v2 took another 30 minutes to pivot.
Video Design
┌──────────────────────┐
│ 探花书房·逆袭 │ ← Faint watermark (opacity 20%)
│ │
│ │
│ 十月一号傍晚,苏州到 │
│ 太仓的末班大巴晃晃悠 │ ← White text, centered
│ 悠地停在了浏河路汽车 │ 44pt PingFang SC
│ 站。 │ Fades in 0.3s, out 0.3s
│ │
│ │
│ │
│ │
└──────────────────────┘
1080×1920 / near-black bg (#0F0F19)
Dialogue: slightly brighter bg (#161423)
Speaker name below dialogue text
Generation
# Generate full-chapter videos for chapters 1-20
/tmp/tts-env/bin/python douyin_fullchapter.py 1 20
Each chapter: parse → generate audio per segment → create text frame images (Pillow) → render video segments with fade filter (ffmpeg) → concatenate → apply 1.5x speed.
Output: ~9-12 MB per chapter. 20 chapters total ~255 MB.
Step 3: Semi-Automated Publishing via Playwright
Why Semi-Automated?
| Approach | Risk | Effort |
|---|---|---|
| Manual (browser) | Zero | High — fill 20 forms by hand |
| Semi-auto (Playwright fills, you publish) | Zero | Low — you just click one button |
| Full-auto (Playwright publishes) | Account ban risk | Zero |
Semi-auto is the sweet spot: Playwright handles the tedious form-filling, but you maintain control over when each video goes live.
How It Works
- First time: Playwright opens a browser, you scan QR code with Douyin app to log in
- Per chapter: Script uploads video, fills title + description + hashtags
- You: Review in the browser, click “发布”
// Per-chapter upload (runs in Playwright MCP)
async (page) => {
await page.goto('https://creator.douyin.com/creator-micro/content/upload');
await page.waitForTimeout(1500);
const fileInput = await page.locator('input[type="file"]').first();
await fileInput.setInputFiles('<VIDEO_PATH>');
await page.waitForTimeout(4000);
const titleInput = page.locator('input[placeholder*="标题"]').first();
await titleInput.fill('探花书房·逆袭|第X章 <Title>');
const editor = page.locator('.editor-kit-container').first();
await editor.click();
await page.keyboard.press('Meta+a');
await page.keyboard.type('<Hook description> #小说推荐 #有声小说 ...');
return 'ready';
}
Publishing speed: ~30 seconds per chapter (upload + fill + click). 20 chapters in ~10 minutes.
Description Strategy
Each chapter’s Douyin description follows a formula:
<One-line hook from chapter's most dramatic moment> + hashtags
Examples:
- “东北穷小子回到太仓,妈炖了排骨,爸的手指肿成紫色却不敢去医院…”
- “系统第一次过载。Qing的鼻子流出血来,视线模糊了三秒。”
- “她是谁?Qing在平江路的巷口再次遇到林晚,这次她没有笑。”
The hook should create curiosity without spoiling — make viewers start the video.
The Skill: /webnovel-publish
The entire pipeline is packaged as a Claude Code skill. In any future session:
/webnovel-publish # Triggers the full pipeline
"发抖音" # Also triggers it
"做有声书 21-50" # Generate audiobooks for chapters 21-50
"publish chapters 21-50" # Full pipeline for a chapter range
The skill contains:
- Voice configuration table
- Chapter parsing logic
batch_audiobook.py— audiobook generation scriptdouyin_fullchapter.py— video generation script- Playwright publishing workflow
- Description template
How LearnAI Team Could Use This
- Convert LearnAI tutorials or case studies into audio/video formats for mobile-first learners.
- Package repeatable publishing workflows as Claude Code skills for consistent media asset production.
- Prototype short-form course promotion clips from existing markdown lessons.
Real-World Use Cases
- Webnovel authors turning chapters into audiobooks and vertical videos.
- Educators repurposing lesson notes into narrated shorts or long-form mobile videos.
- Content teams batch-producing platform-specific posts from one canonical markdown source.
- Solo creators using semi-automated publishing while keeping manual approval before release.
What’s Next
| Enhancement | Difficulty | Impact |
|---|---|---|
| CosyVoice voice cloning | Medium | Much better character voices |
| AI-generated scene images | High | Visual interest for longer videos |
| Background music | Low | Atmosphere (need royalty-free source) |
| Auto-scheduling | Low | Timed releases instead of manual publish |
| Multi-platform | Medium | Publish to 小红书, 微信视频号 simultaneously |
Key Takeaways
- Edge TTS is underrated — free, zero-setup, 6 Chinese neural voices with pitch/rate control. Good enough for MVP audiobooks.
- Iterate format fast — our v1 (1-min clips) was wrong. We pivoted to full-chapter in 30 minutes. Ship, test, adjust.
- Semi-automation is the sweet spot — Playwright fills forms, you click publish. Zero ban risk, 95% effort reduction.
- One session, three outputs — the same chapter text becomes an MP3, a video, and a Douyin post. Write once, distribute everywhere.
- Parallel agents for writing — 4 chapters at a time with detailed outlines = 48 chapters in ~2 hours. The outline is the bottleneck, not the writing.