Claude Code Token Costs: The Hidden Tax and How rtk Cuts It by 80%

Every bash command Claude Code runs has a hidden token cost: git status can output ~3,000 tokens, and cargo test can output 25,000. A typical development cycle on a medium-sized project burns through roughly 118,000 tokens on command output alone, and with a 1M-token context window that cost compounds silently until your daily quota vanishes.

*Sources: Reddit, "Saying hey cost me 22% of my usage limits"; GitHub, rtk-ai/rtk; Kilo-Org discussion, "Saved 10M tokens (89%)"*

The Problem: Token Billing is a Black Box

A user tracking GitHub issue #16157 reported a striking case: 92% of the tokens in a session came from cache reads and actual output tokens were near zero, yet the API charged $1.50, which counted as $65 of usage against their quota.

Why This Happens

The 1M context window is an amplifier. Key pain points:

| Issue | What Happens |
|---|---|
| Cache reads count | Every message includes the full conversation history; cached tokens still count toward usage |
| Silent retries | When Claude hits service instability, it silently retries requests, and each retry re-reads the full context |
| Overnight sessions | One long session can eat your entire daily quota by morning |
| No transparency | The same operation might use 20% of quota today and 89% tomorrow, with no warning and no predictability |

Quick Workarounds

  • /compact — compress the context before it grows too large
  • /cost or /stats — monitor token consumption in real time
  • Don't revive stale sessions — opening yesterday's conversation reloads the entire history; start fresh instead

The Solution: rtk, a CLI Proxy That Saves 80% of Tokens

rtk (Rust Token Killer) is a single Rust binary that sits between Claude Code and your shell commands. It intercepts command output, compresses it, and sends Claude only the essential information.

Before rtk:                   After rtk:
git status → 3,000 tokens     → 150 tokens (95% saved)
cargo test → 25,000 tokens    → 2,500 tokens (90% saved)
ls -la     → 800 tokens       → 150 tokens (82% saved)
─────────────────────────────────────────────────────
Total: 118,000 tokens         → 23,900 tokens (80% saved)

How rtk Works: 4-Step Compression

Raw command output
       │
       ▼
┌──────────────────────┐
│ 1. Smart Filtering   │  Remove comments, whitespace, boilerplate
├──────────────────────┤
│ 2. Grouping          │  Cluster similar items (e.g., files by directory)
├──────────────────────┤
│ 3. Truncation        │  Keep useful info, cut the rest
├──────────────────────┤
│ 4. Deduplication     │  Fold repeated log lines into one line + count
└──────────────────────┘
       │
       ▼
Compressed output → Claude
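As a rough sketch, three of the four steps above (filtering, deduplication, truncation) fit in a few lines of Python. This is an illustration of the idea only, not rtk's actual Rust code; grouping is omitted for brevity, and rtk's real per-tool heuristics are more involved.

```python
from itertools import groupby

def compress(output: str, keep: int = 50) -> str:
    # 1. Smart filtering: drop blank lines and comment-style boilerplate
    lines = [l.rstrip() for l in output.splitlines()]
    lines = [l for l in lines if l and not l.lstrip().startswith("#")]
    # 4. Deduplication: fold consecutive repeated lines into one line + count
    folded = []
    for line, group in groupby(lines):
        n = sum(1 for _ in group)
        folded.append(line if n == 1 else f"{line} (x{n})")
    # 3. Truncation: keep the first `keep` lines, note what was cut
    if len(folded) > keep:
        folded = folded[:keep] + [f"... ({len(folded) - keep} lines cut)"]
    return "\n".join(folded)

raw = "# compiler banner\nwarning: unused variable\nwarning: unused variable\n\ntest ok\n"
print(compress(raw))
# warning: unused variable (x2)
# test ok
```

Five noisy lines collapse to two informative ones; on real cargo test or git status output the ratio is far larger.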

Setup: One Command

# Install on macOS
brew install rtk

# Then initialize globally
rtk init --global

After this, git, cargo/npm test, docker, kubectl, eslint/ruff/pylint, jest/vitest/playwright, pip/pnpm, and other common dev commands are automatically intercepted. Claude sees only the compressed versions, transparently.
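Interception itself can be pictured as a thin wrapper around the shell call. Here is a toy Python sketch of that proxy idea (an assumption-laden illustration, not rtk's actual mechanism): run the command, trim its output, and hand back only the summary.

```python
import subprocess

def run_compressed(cmd: str, max_lines: int = 20) -> str:
    """Toy proxy: run a shell command and return a trimmed summary
    of its stdout instead of the full output (illustration only)."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    if len(lines) <= max_lines:
        return "\n".join(lines)
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines]) + f"\n... ({omitted} lines omitted)"

# A 100-line output collapses to 5 lines plus a marker
print(run_compressed("seq 1 100", max_lines=5))
```

The agent still learns that the command succeeded and what the leading output was, without paying for every line.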

Real Savings: Why It Matters Beyond Cost

Token savings aren’t just about money:

  • More context space — compressed outputs leave more room for code and history, giving the agent better context for decisions
  • Longer sessions — the same conversation stays productive longer before hitting limits
  • Better agent performance — less noise in context means more focused, accurate responses

Limitations

| Works With | Doesn't Work With |
|---|---|
| All Bash commands (git, npm, cargo, docker, kubectl, etc.) | Claude Code's built-in tools (Read, Grep, Glob), which bypass the shell |
| Custom shell commands | Completely unknown commands, which rtk passes through raw |

Workaround for built-in tools: when you want rtk compression, use shell equivalents explicitly (cat/head/tail instead of Read, rg/grep instead of Grep), or explicitly call rtk read.

Supported Platforms

Claude Code, OpenCode, Gemini CLI: install is one command, and config lives at ~/.config/rtk/config.toml.

The Other Side: Cut Output Tokens with claude-token-efficient

rtk cuts input tokens (command output → Claude). But Claude also wastes tokens on output: filler phrases, flattery, restating your question, unsolicited suggestions. claude-token-efficient is a CLAUDE.md snippet that eliminates this waste.

*Sources: 陆三金 on Weibo (2026-03-31); 爱可可-爱生活 on Weibo (2026-04)*

The Problem: Default Claude Output Is Bloated

By default, Claude:

| Waste Pattern | Example |
|---|---|
| Opens with filler | "Sure!", "Great question!", "Absolutely!" |
| Restates your question | "You're asking about how to..." |
| Ends with fluff | "I hope this helps! Let me know if you need anything!" |
| Uses decorative formatting | Em dashes, smart quotes, Unicode characters |
| Adds unsolicited suggestions | "You might also want to consider..." |
| Over-engineers | Adds abstractions you didn't ask for |
| Agrees with wrong statements | "You're absolutely right!" (even when you're not) |

All of this wastes tokens. None of it adds value.

The Fix: One File in Your Project

Drop the snippet into your CLAUDE.md. Claude immediately becomes more concise: fewer tokens per response, faster output, cleaner parsing.

# Add to any project
curl -O https://raw.githubusercontent.com/drona23/claude-token-efficient/main/CLAUDE.md
# Or just add the rules to your existing CLAUDE.md
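The canonical rules live in the repo's CLAUDE.md. An illustrative set, paraphrased from the waste patterns listed above rather than quoted from the actual file, looks like this:

```
## Output rules (illustrative paraphrase, not the verbatim snippet)
- No filler openers ("Sure!", "Great question!") or closing fluff ("Hope this helps!").
- Answer directly; never restate the question.
- Plain ASCII punctuation only; no decorative formatting.
- No unsolicited suggestions or extra abstractions.
- Disagree plainly when a statement is wrong; never flatter.
```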

When to Use vs When Not To

| Good For | Not Good For |
|---|---|
| Automated pipelines (1000+ calls/day) | Single ad-hoc queries (the CLAUDE.md loading cost dominates) |
| Structured output for agent parsing | Exploratory/architectural discussions |
| Code generation workflows | When you need Claude to push back or explain tradeoffs |
| Repetitive tasks | When you need detailed explanations |
| Teams with multi-turn automation | Complex architecture design |

Combined Strategy: rtk + claude-token-efficient

Input tokens:   rtk compresses command output       → 80% saved
Output tokens:  claude-token-efficient cuts filler  → 30-50% saved
─────────────────────────────────────────────────────────────
Combined:       both sides optimized                → significant cost + quota savings

Use both together for automated pipelines. For interactive work, rtk alone is usually enough; you probably want Claude's explanations when you're learning.
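How the two percentages combine depends on a session's input/output token mix. A back-of-the-envelope check, where the 90/10 split is an assumption of ours (agent sessions are typically input-heavy), not a figure from either project:

```python
# Assumed session mix: 90% input tokens, 10% output tokens (assumption)
input_share, output_share = 0.90, 0.10
input_saved = 0.80   # rtk's claimed input-token savings
output_saved = 0.40  # midpoint of claude-token-efficient's 30-50% range
total_saved = input_share * input_saved + output_share * output_saved
print(f"{total_saved:.0%}")  # → 76%
```

Because input dominates, the overall savings track rtk's figure closely; the output-side snippet mostly helps latency and parsing cleanliness.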

How LearnAI Team Could Use This

  • Teach token cost literacy as part of AI engineering practice.
  • Use rtk examples to show how tool output affects agent performance and budget.
  • Build classroom exercises where students compare raw versus compressed command output.

Real-World Use Cases

  • Reducing token burn during test-heavy coding sessions.
  • Keeping long Claude Code sessions usable on medium or large repositories.
  • Teaching teams to monitor and compress noisy CLI output before it enters agent context.