Every bash command Claude Code runs has a hidden token cost. git status outputs ~3,000 tokens; cargo test can output 25,000. A typical medium-project development cycle burns through 118,000 tokens on command output alone, and with a 1M context window that cost compounds silently until your daily quota vanishes.
| *Sources: Reddit ("Saying hey cost me 22% of my usage limits") | GitHub (rtk-ai/rtk) | Kilo-Org discussion (saved 10M tokens, 89%)* |
## The Problem: Token Billing Is a Black Box

A user tracking GitHub issue #16157 found a striking case: 92% of the tokens in a session came from cache reads and actual output tokens were near zero, yet the $1.50 API charge counted as $65 of usage against their quota.
### Why This Happens
The 1M context window is an amplifier. Key pain points:
| Issue | What Happens |
|---|---|
| Cache reads count | Every message includes the full conversation history; cached tokens still count toward usage |
| Silent retries | When Claude hits service instability it silently retries requests, and each retry re-reads the full context |
| Overnight sessions | One long session can eat your entire daily quota by morning |
| No transparency | The same operation might use 20% of quota today and 89% tomorrow; no warning, no predictability |
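A quick back-of-the-envelope calculation (with hypothetical but plausible numbers) shows why cache reads dominate: if every turn re-reads a large cached context, the reads alone dwarf everything else.

```shell
# Hypothetical session: each message re-reads the full cached context.
context=200000   # cached tokens re-read per message (assumed figure)
turns=50         # messages in the session (assumed figure)
echo "cache reads alone: $(( context * turns )) tokens"
```

Fifty turns over a 200k-token context is 10M tokens of cache reads, with near-zero actual output.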
## Quick Workarounds

- /compact: compress context before it gets too large; don't wake old sessions, start fresh
- /cost or /stats: monitor token consumption in real time
- Don't revive stale sessions: opening yesterday's conversation reloads the entire history
## The Solution: rtk, a CLI Proxy That Saves 80% of Tokens
rtk (Rust Token Killer) is a single Rust binary that sits between Claude Code and your shell commands. It intercepts command output, compresses it, and sends Claude only the essential information.
| Command | Before rtk | After rtk | Saved |
|---|---|---|---|
| git status | ~3,000 tokens | ~150 tokens | 95% |
| cargo test | ~25,000 tokens | ~2,500 tokens | 90% |
| ls -la | ~800 tokens | ~150 tokens | 82% |
| Total (typical cycle) | ~118,000 tokens | ~23,900 tokens | 80% |
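The per-command figures above are estimates. To gauge what a command would feed into your own context, a rough sketch using the common ~4-characters-per-token heuristic (an approximation, not Claude's actual tokenizer) works:

```shell
# Rough token estimate for a command's output,
# assuming ~4 characters per token (heuristic only).
est_tokens() {
  chars=$("$@" 2>&1 | wc -c)
  echo $(( chars / 4 ))
}

# e.g. run in a busy repo:  est_tokens git status
```

Try it on git status or your test runner in a large repository to see numbers in the range quoted above.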
### How rtk Works: 4-Step Compression

Raw command output
        │
        ▼
┌────────────────────┐
│ 1. Smart filtering │  Remove comments, whitespace, boilerplate
├────────────────────┤
│ 2. Grouping        │  Cluster similar items (e.g., files by directory)
├────────────────────┤
│ 3. Truncation      │  Keep useful info, cut the rest
├────────────────────┤
│ 4. Deduplication   │  Fold repeated log lines into one + count
└────────────────────┘
        │
        ▼
Compressed output → Claude
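The four steps can be approximated with standard Unix tools. This is only an illustrative sketch, not rtk's actual Rust implementation, which is considerably smarter about what counts as noise:

```shell
# Toy version of the pipeline: filter, group, dedupe, truncate.
compress_log() {
  grep -vE '^[[:space:]]*(#|$)' |   # 1. smart filtering: drop comments and blank lines
    sort |                          # 2. grouping: cluster identical lines together
    uniq -c |                       # 4. deduplication: fold repeats into one line + count
    sort -rn |
    head -n 50                      # 3. truncation: keep only the top 50 lines
}

# e.g.  cargo test 2>&1 | compress_log
```

Piping a noisy test log through this already collapses repeated warnings into single counted lines, which is the bulk of the savings.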
### Setup: One Command
# Install on macOS
brew install rtk
# Then initialize globally
rtk init --global
After this, git, cargo/npm test, docker, kubectl, eslint/ruff/pylint, jest/vitest/playwright, pip/pnpm, and the rest of the common dev commands are automatically intercepted. Claude sees only the compressed versions, completely transparently.
### Real Savings: Why It Matters Beyond Cost

Token savings aren't just about money:
- More context space: compressed outputs leave more room for code and history, giving the agent more context for better decisions
- Longer sessions: the same conversation stays productive longer before hitting limits
- Better agent performance: less noise in context means more focused, accurate responses
### Limitations

| Works With | Doesn't Work With |
|---|---|
| All Bash commands (git, npm, cargo, docker, kubectl, etc.) | Claude Code's built-in tools (Read, Grep, Glob), which bypass the shell |
| Custom shell commands | Commands rtk doesn't recognize, whose output it passes through raw |

Workaround for built-in tools: use shell equivalents explicitly (cat/head/tail instead of Read, rg/grep instead of Grep) when you want rtk compression, or call rtk read directly.
### Supported Platforms

Claude Code, OpenCode, Gemini CLI. Install is one command, and config lives at ~/.config/rtk/config.toml.
## The Other Side: Cut Output Tokens with claude-token-efficient

rtk cuts input tokens (command output fed to Claude). But Claude also wastes tokens on output: filler phrases, flattery, restating your question, unsolicited suggestions. claude-token-efficient is a CLAUDE.md snippet that eliminates this waste.
| *Source: ιδΈι on Weibo (2026-03-31) | η±ε―ε―-η±ηζ΄» on Weibo (2026-04)* |
### The Problem: Default Claude Output Is Bloated
By default, Claude:
| Waste Pattern | Example |
|---|---|
| Opens with filler | "Sure!", "Great question!", "Absolutely!" |
| Restates your question | "You're asking about how to…" |
| Ends with fluff | "I hope this helps! Let me know if you need anything!" |
| Uses decorative formatting | Em dashes, smart quotes, Unicode characters |
| Adds unsolicited suggestions | "You might also want to consider…" |
| Over-engineers | Adds abstractions you didn't ask for |
| Agrees with wrong statements | "You're absolutely right!" (even when you're not) |
All of this wastes tokens. None of it adds value.
### The Fix: One File in Your Project

Drop the snippet into your CLAUDE.md. Claude immediately becomes more concise: fewer tokens per response, faster output, cleaner parsing.
# Add to any project
curl -O https://raw.githubusercontent.com/drona23/claude-token-efficient/main/CLAUDE.md
# Or just add the rules to your existing CLAUDE.md
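The kind of rules such a snippet contains looks roughly like this. This is an illustrative sketch of the style, not the exact contents of the repo's file:

```markdown
# Token-efficiency rules (illustrative sketch)
- Answer directly; no greetings or filler openers.
- Do not restate the question.
- No closing pleasantries or unsolicited suggestions.
- Plain ASCII punctuation; no decorative formatting.
- Implement only what was asked; add no extra abstractions.
- If a statement is wrong, say so; do not agree to be agreeable.
```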
### When to Use vs. When Not To
| Good For | Not Good For |
|---|---|
| Automated pipelines (1000+ calls/day) | Single ad-hoc queries (CLAUDE.md loading cost dominates) |
| Structured output for agent parsing | Exploratory/architectural discussions |
| Code generation workflows | When you need Claude to push back or explain tradeoffs |
| Repetitive tasks | When you need detailed explanations |
| Teams with multi-turn automation | Complex architecture design |
## Combined Strategy: rtk + claude-token-efficient

Input tokens:  rtk compresses command output → ~80% saved
Output tokens: claude-token-efficient cuts filler → 30-50% saved
────────────────────────────────────────────────────────────────
Combined: both sides optimized → significant cost and quota savings

Use both together for automated pipelines. For interactive work, rtk alone is usually enough; you probably want Claude's explanations when you're learning.
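As a rough worked example with hypothetical session sizes (the 80% and 30-50% figures are the ones quoted above; the raw token counts are assumptions):

```shell
# Hypothetical session: 118,000 input-side tokens of command output,
# 20,000 output tokens of Claude responses.
in_raw=118000
out_raw=20000
in_rtk=$(( in_raw * 20 / 100 ))    # rtk keeps ~20% of command output
out_eff=$(( out_raw * 60 / 100 ))  # claude-token-efficient keeps ~60% (40% cut)
echo "before: $(( in_raw + out_raw )) tokens, after: $(( in_rtk + out_eff )) tokens"
```

Under these assumptions the session drops from 138,000 to roughly 35,600 tokens, about a 74% overall reduction.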
## How the LearnAI Team Could Use This
- Teach token cost literacy as part of AI engineering practice.
- Use rtk examples to show how tool output affects agent performance and budget.
- Build classroom exercises where students compare raw versus compressed command output.
## Real-World Use Cases
- Reducing token burn during test-heavy coding sessions.
- Keeping long Claude Code sessions usable on medium or large repositories.
- Teaching teams to monitor and compress noisy CLI output before it enters agent context.