Paper2Code is a Claude Code skill that transforms any arXiv paper into a runnable, citation-anchored Python implementation. Every line of generated code traces back to the exact paper section it implements, and any detail the paper skips is explicitly flagged, never silently invented.
*Source: GitHub - PrathamLearnsToCode/paper2code | MCP Market Listing | Author's X Post*
The Problem: Paper Reproduction is Painful
Anyone who's tried to reproduce a research paper knows the pain: key hyperparameters are buried in appendices or omitted entirely. You spend hours "guessing" what the authors actually did. Traditional LLM code generation makes this worse by confidently filling in the gaps without telling you.
Paper2Code solves this with a core philosophy of honesty over completeness.
Three Core Mechanisms
| Mechanism | What It Does | Example |
|---|---|---|
| Citation Anchoring | Every code line references its paper section | # §3.2, Eq. 2 - softmax(QK^T / √d_k) |
| Ambiguity Auditing | Classifies each detail as specified / partial / unspecified | [UNSPECIFIED] Paper omits epsilon for LayerNorm |
| Transparent Defaults | Uses reasonable defaults but marks them clearly | eps=1e-6 # [UNSPECIFIED] Alternatives: 1e-5, 1e-8 |
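To make the three mechanisms concrete, here is a minimal pure-Python sketch of what an anchored, flagged function might look like. The function name `layer_norm`, the section number in the comment, and the `eps=1e-6` default are illustrative assumptions borrowed from the table's example rows; this is not output captured from the tool.

```python
import math


def layer_norm(x, eps=1e-6):  # [UNSPECIFIED] eps not stated in paper. Alternatives: 1e-5, 1e-8
    """Normalize a 1-D list of floats to zero mean and (near) unit variance."""
    # §3.2 (hypothetical citation) - normalize over the feature dimension
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    # eps keeps the division stable when the variance is close to zero
    return [(v - mean) / math.sqrt(var + eps) for v in x]


normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

The point of the flag is that a reader who later finds the real epsilon (say, in the authors' repository) knows exactly which line to revisit.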
Citation Anchoring in Action
# §3.2 - "We apply layer normalization before each sub-layer"
class TransformerBlock(nn.Module):
    def forward(self, x):
        # §3.2, Eq. 2 - attention_weights = softmax(QK^T / sqrt(d_k))
        attn_out = self.attention(self.norm1(x))
        x = x + attn_out  # §3.2 - residual connection
Ambiguity Audit Labels
- [SPECIFIED]: Paper defines this explicitly
- [PARTIALLY_SPECIFIED]: Paper is ambiguous; quote and reasoning included
- [UNSPECIFIED]: Paper omits this; code uses a reasonable default with alternatives listed
- [ASSUMPTION]: Inferred from context with explanation
- [FROM_OFFICIAL_CODE]: Taken from the authors' reference implementation
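One way to picture the five labels is as a small data model. The sketch below is an assumption about how such an audit could be represented in Python; `AuditLabel`, `AuditEntry`, and `as_comment` are hypothetical names and are not taken from the Paper2Code source.

```python
from dataclasses import dataclass
from enum import Enum


class AuditLabel(Enum):
    SPECIFIED = "SPECIFIED"
    PARTIALLY_SPECIFIED = "PARTIALLY_SPECIFIED"
    UNSPECIFIED = "UNSPECIFIED"
    ASSUMPTION = "ASSUMPTION"
    FROM_OFFICIAL_CODE = "FROM_OFFICIAL_CODE"


@dataclass
class AuditEntry:
    name: str          # hyperparameter or design detail
    value: object      # value used in the generated code
    label: AuditLabel  # how firmly the paper pins this detail down
    note: str = ""     # quote, reasoning, or alternatives

    def as_comment(self) -> str:
        """Render the entry as an assignment with an inline audit comment."""
        suffix = f" {self.note}" if self.note else ""
        return f"{self.name}={self.value!r}  # [{self.label.value}]{suffix}"


entries = [
    AuditEntry("d_model", 512, AuditLabel.SPECIFIED),
    AuditEntry("eps", 1e-6, AuditLabel.UNSPECIFIED, "Alternatives: 1e-5, 1e-8"),
]
# Everything that is not explicitly specified deserves a second look.
needs_review = [e.name for e in entries if e.label is not AuditLabel.SPECIFIED]
print(needs_review)
```

Filtering on the label, as in the last lines, is what makes the audit actionable: the reviewable items fall out as a short list instead of being scattered through the code.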
Installation & Usage
Install as a Claude Code skill via npx:
npx skills add PrathamLearnsToCode/paper2code/skills/paper2code
Then use with a simple slash command:
# Basic: just an arXiv URL or ID
/paper2code https://arxiv.org/abs/1706.03762
/paper2code 1706.03762
# Specify framework
/paper2code https://arxiv.org/abs/2006.11239 --framework jax
# Full mode: includes training and data pipeline
/paper2code 2106.09685 --mode full
# Educational mode: extra comments, pedagogical notebook
/paper2code https://arxiv.org/abs/2010.11929 --mode educational
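Since the command accepts either a full URL or a bare ID, some normalization step has to happen first. The sketch below shows one plausible way to do that with a regular expression. `normalize_arxiv_id` is a hypothetical helper, not the skill's actual code, and it only handles new-style identifiers such as `1706.03762`.

```python
import re

# New-style arXiv identifiers: four digits, a dot, four or five digits,
# optionally followed by a version suffix such as "v2".
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def normalize_arxiv_id(ref: str) -> str:
    """Return the bare arXiv ID found in a URL or ID string (version dropped)."""
    match = _ARXIV_ID.search(ref.strip())
    if match is None:
        raise ValueError(f"no arXiv identifier found in {ref!r}")
    return match.group(1)


ids = [
    normalize_arxiv_id(ref)
    for ref in ("https://arxiv.org/abs/1706.03762", "2106.09685")
]
print(ids)
```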
Generated Project Structure
{paper_slug}/
├── README.md                # Paper summary + quick-start
├── REPRODUCTION_NOTES.md    # Full ambiguity audit
├── requirements.txt         # Pinned dependencies
├── src/
│   ├── model.py             # Architecture (§3.2 cited)
│   ├── loss.py              # Loss functions (Eq. refs)
│   ├── train.py             # Training loop (§4.1 cited)
│   ├── data.py              # Dataset skeleton
│   ├── evaluate.py          # Metrics
│   └── utils.py             # Shared utilities
├── configs/
│   └── base.yaml            # All hyperparams (cited or flagged)
└── notebooks/
    └── walkthrough.ipynb    # CPU-runnable pedagogical notebook
The walkthrough.ipynb notebook is especially useful: it maps "paper paragraph → corresponding code → shape check" in a closed loop, letting you verify each piece incrementally.
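A single cell of that loop might look roughly like the sketch below, which takes the attention formula quoted earlier, implements it in plain Python so it runs on CPU with no dependencies, and then checks the output shape. The toy inputs and helper names are assumptions for illustration, and multiplying the weights by V is the standard completion of the formula rather than something this post quotes.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = [
        [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        for qi in q
    ]
    weights = [softmax(row) for row in scores]
    # Weighted sum of the value rows for each query position.
    return [
        [sum(w * vj[c] for w, vj in zip(wr, v)) for c in range(len(v[0]))]
        for wr in weights
    ]


def shape(matrix):
    return (len(matrix), len(matrix[0]))


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 query positions, d_k = 2
k = [[1.0, 0.0], [0.0, 1.0]]              # 2 key positions, d_k = 2
v = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 value rows, d_v = 3
out = attention(q, k, v)
assert shape(out) == (3, 3)  # (query positions, d_v)
```

The shape assertion is cheap, but it catches the most common reproduction mistakes (a missing transpose, swapped dimensions) before any training run starts.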
Pipeline Under the Hood
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Paper Fetch  │ ─▶ │ Parsing      │ ─▶ │ Ambiguity    │ ─▶ │ Code         │ ─▶ │ Walkthrough  │
│ (arXiv URL)  │    │ (sections,   │    │ Audit        │    │ Generation   │    │ Notebook     │
│              │    │ equations)   │    │              │    │              │    │              │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
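Read as code, the five stages form a simple sequential pipeline in which each stage enriches a shared state. The sketch below only illustrates that ordering: every stage body is a placeholder, and the function names and state keys are assumptions rather than the skill's internals.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def fetch(state: State) -> State:
    # Placeholder: a real stage would download the paper named in state["paper"].
    return {**state, "raw_text": "<paper text>"}


def parse(state: State) -> State:
    # Placeholder: split the text into sections and equations.
    return {**state, "sections": [], "equations": []}


def audit(state: State) -> State:
    # Placeholder: classify each detail as specified, partial, or unspecified.
    return {**state, "audit": []}


def generate(state: State) -> State:
    # Placeholder: emit cited source files.
    return {**state, "files": ["src/model.py"]}


def walkthrough(state: State) -> State:
    # Placeholder: add the pedagogical notebook to the generated files.
    return {**state, "files": state["files"] + ["notebooks/walkthrough.ipynb"]}


STAGES: List[Callable[[State], State]] = [fetch, parse, audit, generate, walkthrough]


def run_pipeline(paper: str) -> State:
    """Run every stage in order, recording which ones have completed."""
    state: State = {"paper": paper, "completed": []}
    for stage in STAGES:
        state = stage(state)
        state["completed"] = state["completed"] + [stage.__name__]
    return state


result = run_pipeline("1706.03762")
print(result["completed"])
```

Note that the audit runs before code generation, which is what allows the generated code to carry its flags inline instead of having them bolted on afterwards.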
What Paper2Code Won't Do
- Guarantee correctness: it faithfully implements what the paper says, even if the paper is wrong
- Silently invent details: unspecified choices are always flagged
- Download datasets: it provides skeleton data loaders only
- Reimplement standard components: if the paper says "standard transformer encoder," it imports rather than rewrites
Who Should Use This
- Researchers verifying whether a paper's claims hold up in code
- Algorithm engineers reproducing SOTA methods for their own projects
- Students learning how papers translate into implementations
- Reviewers checking if a paper's described method is internally consistent
How the LearnAI Team Could Use This
- Paper-to-code labs: have students generate implementations, then audit which details were specified versus inferred.
- Research reproducibility demos: compare generated code against official repositories to teach implementation gaps.
- Critical reading practice: use ambiguity labels to show where papers leave out operational details.
- Course project scaffolding: help students bootstrap runnable baselines from assigned arXiv papers.
Real-World Use Cases
- Research engineers: quickly turn papers into inspectable prototype implementations.
- ML teams: evaluate whether a new method is worth deeper reproduction work.
- Peer reviewers: check whether a method description is complete enough to implement.
- Graduate students: learn how equations, architecture descriptions, and hyperparameters map into code.
Related: PaperCoder (Academic Research)
A separate academic project called PaperCoder (arXiv 2504.17192) also tackles paper-to-code generation using a multi-agent framework with planning, analysis, and generation stages. It achieves strong results on the PaperBench benchmark. While different from this Claude Code skill, both address the same fundamental reproducibility challenge.