Claude Code Token Costs: The Hidden Tax and How rtk Cuts It by 80%

Every bash command Claude Code runs has a hidden token cost: git status can output ~3,000 tokens, and cargo test can output 25,000. A typical development cycle on a medium-sized project burns through roughly 118,000 tokens on command output alone, and with a 1M-token context window that cost compounds silently until your daily quota vanishes.

*Sources: Reddit, "Saying hey cost me 22% of my usage limits"; GitHub, rtk-ai/rtk; Kilo-Org discussion, "Saved 10M tokens (89%)"*

The Problem: Token Billing is a Black Box

A user tracking GitHub issue #16157 reported a striking case: 92% of the tokens in a session came from cache reads and actual output tokens were near zero, yet the API charged $1.50, which counted as $65 of usage against their quota.

Why This Happens

The 1M context window is an amplifier. Key pain points:

| Issue | What Happens |
|---|---|
| Cache reads count | Every message includes the full conversation history; cached tokens still count toward usage |
| Silent retries | When Claude hits service instability, it silently retries requests, and each retry re-reads the full context |
| Overnight sessions | One long session can eat your entire daily quota by morning |
| No transparency | The same operation might use 20% of quota today and 89% tomorrow, with no warning and no predictability |

Quick Workarounds

  • /compact — compress the context before it grows too large
  • /cost or /stats — monitor token consumption in real time
  • Don't revive stale sessions — opening yesterday's conversation reloads the entire history; start fresh instead

The Solution: rtk, a CLI Proxy That Saves 80% of Tokens

rtk (Rust Token Killer) is a single Rust binary that sits between Claude Code and your shell commands. It intercepts command output, compresses it, and sends Claude only the essential information.

Before rtk:                   After rtk:
git status → 3,000 tokens     → 150 tokens (95% saved)
cargo test → 25,000 tokens    → 2,500 tokens (90% saved)
ls -la     → 800 tokens       → 150 tokens (82% saved)
─────────────────────────────────────────────────────
Total: 118,000 tokens         → 23,900 tokens (80% saved)

How rtk Works: 4-Step Compression

Raw command output
       │
       ▼
┌──────────────────────┐
│ 1. Smart Filtering   │  Remove comments, whitespace, boilerplate
├──────────────────────┤
│ 2. Grouping          │  Cluster similar items (e.g., files by directory)
├──────────────────────┤
│ 3. Truncation        │  Keep useful info, cut the rest
├──────────────────────┤
│ 4. Deduplication     │  Fold repeated log lines into one line + count
└──────────────────────┘
       │
       ▼
Compressed output → Claude
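As a rough sketch, three of the four steps above (filtering, deduplication, truncation) fit in a few lines of Python. This is an illustration of the idea only, not rtk's actual Rust code; grouping is omitted for brevity, and rtk's real per-tool heuristics are more involved.

```python
from itertools import groupby

def compress(output: str, keep: int = 50) -> str:
    # 1. Smart filtering: drop blank lines and comment-style boilerplate
    lines = [l.rstrip() for l in output.splitlines()]
    lines = [l for l in lines if l and not l.lstrip().startswith("#")]
    # 4. Deduplication: fold consecutive repeated lines into one line + count
    folded = []
    for line, group in groupby(lines):
        n = sum(1 for _ in group)
        folded.append(line if n == 1 else f"{line} (x{n})")
    # 3. Truncation: keep the first `keep` lines, note what was cut
    if len(folded) > keep:
        folded = folded[:keep] + [f"... ({len(folded) - keep} lines cut)"]
    return "\n".join(folded)

raw = "# compiler banner\nwarning: unused variable\nwarning: unused variable\n\ntest ok\n"
print(compress(raw))
# warning: unused variable (x2)
# test ok
```

Five noisy lines collapse to two informative ones; on real cargo test or git status output the ratio is far larger.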

Setup: One Command

# Install on macOS
brew install rtk

# Then initialize globally
rtk init --global

After this, git, cargo/npm test, docker, kubectl, eslint/ruff/pylint, jest/vitest/playwright, pip/pnpm, and other common dev commands are automatically intercepted. Claude sees only the compressed versions, transparently.
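Interception itself can be pictured as a thin wrapper around the shell call. Here is a toy Python sketch of that proxy idea (an assumption-laden illustration, not rtk's actual mechanism): run the command, trim its output, and hand back only the summary.

```python
import subprocess

def run_compressed(cmd: str, max_lines: int = 20) -> str:
    """Toy proxy: run a shell command and return a trimmed summary
    of its stdout instead of the full output (illustration only)."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    if len(lines) <= max_lines:
        return "\n".join(lines)
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines]) + f"\n... ({omitted} lines omitted)"

# A 100-line output collapses to 5 lines plus a marker
print(run_compressed("seq 1 100", max_lines=5))
```

The agent still learns that the command succeeded and what the leading output was, without paying for every line.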

Real Savings: Why It Matters Beyond Cost

Token savings aren’t just about money:

  • More context space — compressed outputs leave more room for code and history, giving the agent better context for decisions
  • Longer sessions — the same conversation stays productive longer before hitting limits
  • Better agent performance — less noise in context means more focused, accurate responses

Limitations

| Works With | Doesn't Work With |
|---|---|
| All Bash commands (git, npm, cargo, docker, kubectl, etc.) | Claude Code's built-in tools (Read, Grep, Glob), which bypass the shell |
| Custom shell commands | Completely unknown commands, which rtk passes through raw |

Workaround for built-in tools: when you want rtk compression, use shell equivalents explicitly (cat/head/tail instead of Read, rg/grep instead of Grep), or explicitly call rtk read.

Supported Platforms

Claude Code, OpenCode, Gemini CLI: install is one command, and config lives at ~/.config/rtk/config.toml.

The Other Side: Cut Output Tokens with claude-token-efficient

rtk cuts input tokens (command output → Claude). But Claude also wastes tokens on output: filler phrases, flattery, restating your question, unsolicited suggestions. claude-token-efficient is a CLAUDE.md snippet that eliminates this waste.

*Sources: 陆三金 on Weibo (2026-03-31); 爱可可-爱生活 on Weibo (2026-04)*

The Problem: Default Claude Output Is Bloated

By default, Claude:

| Waste Pattern | Example |
|---|---|
| Opens with filler | "Sure!", "Great question!", "Absolutely!" |
| Restates your question | "You're asking about how to..." |
| Ends with fluff | "I hope this helps! Let me know if you need anything!" |
| Uses decorative formatting | Em dashes, smart quotes, Unicode characters |
| Adds unsolicited suggestions | "You might also want to consider..." |
| Over-engineers | Adds abstractions you didn't ask for |
| Agrees with wrong statements | "You're absolutely right!" (even when you're not) |

All of this wastes tokens. None of it adds value.

The Fix: One File in Your Project

Drop the snippet into your CLAUDE.md. Claude immediately becomes more concise: fewer tokens per response, faster output, cleaner parsing.

# Add to any project
curl -O https://raw.githubusercontent.com/drona23/claude-token-efficient/main/CLAUDE.md
# Or just add the rules to your existing CLAUDE.md
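The canonical rules live in the repo's CLAUDE.md. An illustrative set, paraphrased from the waste patterns listed above rather than quoted from the actual file, looks like this:

```
## Output rules (illustrative paraphrase, not the verbatim snippet)
- No filler openers ("Sure!", "Great question!") or closing fluff ("Hope this helps!").
- Answer directly; never restate the question.
- Plain ASCII punctuation only; no decorative formatting.
- No unsolicited suggestions or extra abstractions.
- Disagree plainly when a statement is wrong; never flatter.
```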

When to Use vs When Not To

| Good For | Not Good For |
|---|---|
| Automated pipelines (1000+ calls/day) | Single ad-hoc queries (the CLAUDE.md loading cost dominates) |
| Structured output for agent parsing | Exploratory/architectural discussions |
| Code generation workflows | When you need Claude to push back or explain tradeoffs |
| Repetitive tasks | When you need detailed explanations |
| Teams with multi-turn automation | Complex architecture design |

Combined Strategy: rtk + claude-token-efficient

Input tokens:   rtk compresses command output       → 80% saved
Output tokens:  claude-token-efficient cuts filler  → 30-50% saved
─────────────────────────────────────────────────────────────
Combined:       both sides optimized                → significant cost + quota savings

Use both together for automated pipelines. For interactive work, rtk alone is usually enough; you probably want Claude's explanations when you're learning.
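How the two percentages combine depends on a session's input/output token mix. A back-of-the-envelope check, where the 90/10 split is an assumption of ours (agent sessions are typically input-heavy), not a figure from either project:

```python
# Assumed session mix: 90% input tokens, 10% output tokens (assumption)
input_share, output_share = 0.90, 0.10
input_saved = 0.80   # rtk's claimed input-token savings
output_saved = 0.40  # midpoint of claude-token-efficient's 30-50% range
total_saved = input_share * input_saved + output_share * output_saved
print(f"{total_saved:.0%}")  # → 76%
```

Because input dominates, the overall savings track rtk's figure closely; the output-side snippet mostly helps latency and parsing cleanliness.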

How LearnAI Team Could Use This

  • Teach token cost literacy as part of AI engineering practice.
  • Use rtk examples to show how tool output affects agent performance and budget.
  • Build classroom exercises where students compare raw versus compressed command output.

Real-World Use Cases

  • Reducing token burn during test-heavy coding sessions.
  • Keeping long Claude Code sessions usable on medium or large repositories.
  • Teaching teams to monitor and compress noisy CLI output before it enters agent context.