Paper2Code: Turn ArXiv Papers into Citation-Anchored Code

Paper2Code is a Claude Code skill that transforms any ArXiv paper into a runnable, citation-anchored Python implementation. Every line of generated code traces back to the exact paper section it implements, and any detail the paper skips is explicitly flagged — never silently invented.

*Source: GitHub - PrathamLearnsToCode/paper2code

MCP Market Listing

Author’s X Post*

The Problem: Paper Reproduction is Painful

Anyone who’s tried to reproduce a research paper knows the pain: key hyperparameters are buried in appendices or omitted entirely. You spend hours “guessing” what the authors actually did. Traditional LLM code generation makes this worse by confidently filling in the gaps without telling you.

Paper2Code solves this with a core philosophy of honesty over completeness.

Three Core Mechanisms

Mechanism	What It Does	Example
Citation Anchoring	Every code line references its paper section	`# §3.2, Eq. 2 — softmax(QK^T / √d_k)`
Ambiguity Auditing	Classifies each detail as specified / partial / unspecified	`[UNSPECIFIED] Paper omits epsilon for LayerNorm`
Transparent Defaults	Uses reasonable defaults but marks them clearly	`eps=1e-6 # [UNSPECIFIED] Alternatives: 1e-5, 1e-8`

Citation Anchoring in Action

# §3.2 — "We apply layer normalization before each sub-layer"
class TransformerBlock(nn.Module):
    def forward(self, x):
        # §3.2, Eq. 2 — attention_weights = softmax(QK^T / sqrt(d_k))
        attn_out = self.attention(self.norm1(x))
        x = x + attn_out  # §3.2 — residual connection

Ambiguity Audit Labels

[SPECIFIED] — Paper defines this explicitly
[PARTIALLY_SPECIFIED] — Paper is ambiguous; quote and reasoning included
[UNSPECIFIED] — Paper omits this; code uses reasonable default with alternatives listed
[ASSUMPTION] — Inferred from context with explanation
[FROM_OFFICIAL_CODE] — Taken from authors’ reference implementation

Installation & Usage

Install as a Claude Code skill via npx:

npx skills add PrathamLearnsToCode/paper2code/skills/paper2code

Then use with a simple slash command:

# Basic — just an ArXiv URL or ID
/paper2code https://arxiv.org/abs/1706.03762
/paper2code 1706.03762

# Specify framework
/paper2code https://arxiv.org/abs/2006.11239 --framework jax

# Full mode — includes training and data pipeline
/paper2code 2106.09685 --mode full

# Educational mode — extra comments, pedagogical notebook
/paper2code https://arxiv.org/abs/2010.11929 --mode educational

Generated Project Structure

{paper_slug}/
├── README.md                  # Paper summary + quick-start
├── REPRODUCTION_NOTES.md      # Full ambiguity audit
├── requirements.txt           # Pinned dependencies
├── src/
│   ├── model.py              # Architecture  (§3.2 cited)
│   ├── loss.py               # Loss functions (Eq. refs)
│   ├── train.py              # Training loop  (§4.1 cited)
│   ├── data.py               # Dataset skeleton
│   ├── evaluate.py           # Metrics
│   └── utils.py              # Shared utilities
├── configs/
│   └── base.yaml             # All hyperparams (cited or flagged)
└── notebooks/
    └── walkthrough.ipynb     # CPU-runnable pedagogical notebook

The walkthrough.ipynb is especially useful: it maps “paper paragraph → corresponding code → shape check” in a closed loop, letting you verify each piece incrementally.

Pipeline Under the Hood

┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Paper Fetch │ ──▶ │   Parsing    │ ──▶ │  Ambiguity   │ ──▶ │    Code      │ ──▶ │ Walkthrough  │
│  (ArXiv URL) │     │  (sections,  │     │    Audit     │     │  Generation  │     │  Notebook    │
│              │     │  equations)  │     │              │     │              │     │              │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘

What Paper2Code Won’t Do

Guarantee correctness — it faithfully implements what the paper says, even if the paper is wrong
Silently invent details — unspecified choices are always flagged
Download datasets — provides skeleton data loaders only
Reimplement standard components — if the paper says “standard transformer encoder,” it imports rather than rewrites

Who Should Use This

Researchers verifying whether a paper’s claims hold up in code
Algorithm engineers reproducing SOTA methods for their own projects
Students learning how papers translate into implementations
Reviewers checking if a paper’s described method is internally consistent

How LearnAI Team Could Use This

Paper-to-code labs — have students generate implementations, then audit which details were specified versus inferred.
Research reproducibility demos — compare generated code against official repositories to teach implementation gaps.
Critical reading practice — use ambiguity labels to show where papers leave out operational details.
Course project scaffolding — help students bootstrap runnable baselines from assigned ArXiv papers.

Real-World Use Cases

Research engineers — quickly turn papers into inspectable prototype implementations.
ML teams — evaluate whether a new method is worth deeper reproduction work.
Peer reviewers — check whether a method description is complete enough to implement.
Graduate students — learn how equations, architecture descriptions, and hyperparameters map into code.

A separate academic project called PaperCoder (arXiv 2504.17192) also tackles paper-to-code generation using a multi-agent framework with planning, analysis, and generation stages. It achieves strong results on the PaperBench benchmark. While different from this Claude Code skill, both address the same fundamental reproducibility challenge.