Awesome-Auto-Research-Tools (github.com/handsome-rich/Awesome-Auto-Research-Tools) is a bilingual (English + Chinese), CC0-licensed GitHub catalog of open-source projects that automate parts of the research lifecycle, from literature search and paper reading to idea generation, experiment execution, paper writing, and peer review. About 430 stars as of May 2026; entries are grouped into five categories, with a star-history chart tracking the major systems and a weekly star-count refresh workflow.
| *Source: github.com/handsome-rich/Awesome-Auto-Research-Tools (CC0 1.0) | Spotted via short-form video, May 2026* |
What this list is (and isn't)
| It is | It isn't |
|---|---|
| A catalog of repos organized by where in the research lifecycle they help | A benchmark or head-to-head comparison |
| Focused on automated research specifically: agents that run lit review, experiments, or writing | A general "awesome AI agents" list (those exist elsewhere) |
| Bilingual: full English README.md plus README_CN.md for Chinese readers | A paper / survey (it links to a few sibling lists, but is itself a list) |
| Maintained with a weekly star-count refresh via GitHub Actions (.github/workflows/update-stars.yml); the script lives at scripts/update_stars.py, and CLAUDE.md documents the Claude-Code-driven maintenance pattern | Fully auto-curated: the workflow only refreshes counts; humans still add/remove entries by PR. No broken-link sweep. |
| Public domain (CC0): adapt freely | A hard-gated list: the inclusion bar (see below) has multiple signals, not one fixed cutoff |
This is the map. The wiki entries AI Research Tools Landscape and Autoresearch: 100 Autonomous ML Experiments Overnight are the deep-dives on specific tools the map points to.
The five categories
┌─────────────────────────────────────────────────────────┐
│ End-to-End Autonomous Research Systems                  │
│ Agents that run idea → plan → experiment → paper        │
│ e.g. AI-Scientist, AI-Scientist-v2, RD-Agent,           │
│      DeepScientist, InternAgent, Agent Laboratory,      │
│      AI-Researcher, Karpathy autoresearch, AIDE         │
├─────────────────────────────────────────────────────────┤
│ Deep Research & Literature Synthesis                    │
│ Agents that read, summarize, and write reports          │
│ e.g. DeerFlow (ByteDance), STORM (Stanford),            │
│      GPT Researcher, ChatPaper, PaperQA2, OpenScholar,  │
│      Tongyi DeepResearch, Open Deep Research, DATAGEN   │
├─────────────────────────────────────────────────────────┤
│ Automated Experiment & Code Agents                      │
│ Agents that edit code and run experiments               │
│ e.g. AutoGPT, OpenHands, Aider, SWE-agent,              │
│      MLE-agent, claude-scholar                          │
├─────────────────────────────────────────────────────────┤
│ Research Skills & Plugin Collections                    │
│ Reusable skill packs for Claude Code / agents           │
│ e.g. scientific-agent-skills, AI-Research-SKILLs,       │
│      ARIS, Idea2Paper                                   │
├─────────────────────────────────────────────────────────┤
│ Awesome Lists & Surveys                                 │
│ Sibling catalogs (NOT survey papers)                    │
│ e.g. awesome-ai-for-science, Autonomous-Agents,         │
│      Awesome-Deep-Research                              │
└─────────────────────────────────────────────────────────┘
The names above are selective representatives; the list also includes Biomni, ChatReviewer, OpenResearcher, PaperBanana, DeepResearchAgent, Auto-Deep-Research, and others. Check the live repo for the full inventory; it changes as the maintainer accepts PRs.
What's notable in each category
End-to-End: the closed-loop AI scientists
This is the category most worth your time if you're trying to understand "AI scientist" claims. Notable inclusions (descriptions paraphrased from the listed projects' own READMEs):
| Project | Distinctive angle |
|---|---|
| AI-Scientist / AI-Scientist-v2 (SakanaAI) | The reference fully-automated open-ended discovery system; v2 adds workshop-level agentic tree search |
| DeepScientist (ResearAI) | Local-first autonomous research studio. README highlights Findings Memory + Bayesian optimization for hypothesis selection |
| InternAgent (Alpha-Innovator / InternScience) | Closed-loop hypothesis → verification with deep-research + memory modules. Repo lists domain coverage: physics, biology, earth & life sciences |
| RD-Agent (Microsoft) | Industrial R&D loop focus |
| Agent Laboratory / AI-Researcher | Multi-agent research orchestration with a graduate-student-style workflow |
| autoresearch (Karpathy) | The pattern that named the category; see our deep-dive |
| AIDE | Kaggle/ML competition-focused experiment loop |
Deep Research: the literature-synthesis agents
The category where most teams actually deploy today, in my reading:
- DeerFlow (ByteDance): multi-agent deep research framework
- STORM (Stanford): research outline → Wikipedia-style article
- GPT Researcher, PaperQA2, OpenScholar: long-form report / paper-QA
- Tongyi DeepResearch, Open Deep Research: open-source analogs of commercial Deep Research products
- ChatPaper: paper Q&A and summary
Experiment / Code agents
The "coding agent" category that overlaps heavily with general dev tooling:
- AutoGPT, OpenHands (formerly OpenDevin), Aider, SWE-agent, MLE-agent
These show up in this list because researchers often re-purpose them as the executor inside an End-to-End system.
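The executor role can be pictured as a pluggable interface inside the idea → plan → experiment → paper loop. This is a minimal sketch under stated assumptions: Executor, EchoExecutor, and ResearchLoop are hypothetical names, not the API of AI-Scientist, OpenHands, Aider, or any other listed project, and the stub executor only records plans where a real coding agent would edit code and run jobs.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Executor(Protocol):
    """Hypothetical interface: turn an experiment plan into a result."""

    def run(self, plan: str) -> str: ...


@dataclass
class EchoExecutor:
    """Stub executor: records each plan instead of editing code or running jobs."""

    calls: list[str] = field(default_factory=list)

    def run(self, plan: str) -> str:
        self.calls.append(plan)
        return f"result of: {plan}"


@dataclass
class ResearchLoop:
    """Toy idea -> plan -> experiment -> paper loop with a swappable executor."""

    executor: Executor
    log: list[str] = field(default_factory=list)

    def plan(self, idea: str) -> str:
        # A real system would ask an LLM to design the experiment here.
        return f"experiment for '{idea}'"

    def write_up(self, idea: str, result: str) -> str:
        # A real system would draft a paper section here.
        return f"# {idea}\n\nFindings: {result}"

    def run(self, idea: str) -> str:
        plan = self.plan(idea)
        result = self.executor.run(plan)  # the coding agent slots in here
        self.log.append(result)
        return self.write_up(idea, result)


if __name__ == "__main__":
    loop = ResearchLoop(executor=EchoExecutor())
    print(loop.run("does dropout help tiny MLPs?"))
```

The design point is that the loop only depends on the run(plan) contract, so swapping one coding agent for another does not touch the planning or write-up stages.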
Skills & plugin collections
Reusable skill packs you can install into your Claude Code / agent setup: scientific-agent-skills, AI-Research-SKILLs, ARIS, etc. Pairs naturally with the entry Karpathy: Skills are the Big New Idea.
Awesome lists: sibling catalogs
The list cross-references other awesome-style catalogs (awesome-ai-for-science, Autonomous-Agents, Awesome-Deep-Research). It does not list standalone academic survey papers directly. If you want the survey-paper layer, follow those sibling lists or the openags/Awesome-AI-Scientist-Papers repo (which this catalog does not currently link).
Quality signal: what the inclusion bar actually is
The repo's signals are not fully consistent, which is worth knowing before you treat any single number as a hard cutoff:
| Source | What it says |
|---|---|
| README.md | 500+ stars OR exceptionally notable / top-venue publication |
| CLAUDE.md (maintenance notes) | 500+ stars |
| scripts/update_stars.py | STAR_THRESHOLD = 1000, but the script only reports projects below the threshold; it does not remove them |
So treat the inclusion bar as "roughly 500+ stars with publication exceptions," not as an automated gate. The star-history chart in the README (the same widget you'd see on star-history.com) lets you eyeball whether a listed project is rising, plateauing, or abandoned.
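To make the difference between the README's bar and the script's report-only threshold concrete, here is a minimal sketch. It is not the repo's scripts/update_stars.py: fetch_stars, meets_readme_bar, below_threshold_report, and the notable flag are hypothetical names. The only external call is the public GitHub REST endpoint GET /repos/{owner}/{repo}, whose JSON response includes stargazers_count; it is defined but never called in the example, and the repo slugs and counts below are made up.

```python
from __future__ import annotations

import json
import urllib.request

README_BAR = 500         # "500+ stars OR exceptionally notable" (README.md)
SCRIPT_THRESHOLD = 1000  # STAR_THRESHOLD in scripts/update_stars.py (report-only)


def fetch_stars(slug: str, token: str | None = None) -> int:
    """Return stargazers_count for 'owner/repo' via the GitHub REST API."""
    req = urllib.request.Request(f"https://api.github.com/repos/{slug}")
    req.add_header("Accept", "application/vnd.github+json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return int(json.load(resp)["stargazers_count"])


def meets_readme_bar(stars: int, notable: bool = False) -> bool:
    """Rough inclusion bar: 500+ stars, or a publication/notability exception."""
    return stars >= README_BAR or notable


def below_threshold_report(star_counts: dict[str, int],
                           threshold: int = SCRIPT_THRESHOLD) -> list[str]:
    """List repos under the threshold. Report only: nothing is removed."""
    return sorted(slug for slug, stars in star_counts.items() if stars < threshold)


counts = {"owner/a": 12000, "owner/b": 700, "owner/c": 300}  # made-up numbers
print(below_threshold_report(counts))                       # ['owner/b', 'owner/c']
print([s for s in counts if meets_readme_bar(counts[s])])   # ['owner/a', 'owner/b']
```

Note how owner/b passes the README's 500+ bar yet still shows up in the below-1000 report; that overlap is exactly the inconsistency the table above describes.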
How LearnAI Team Could Use This
1. Tool-scouting shortlist (a 2-4 hour audit, not 30 minutes). Before committing to a deep-research backbone for a LearnAI project, run this recipe instead of an afternoon of unstructured GitHub spelunking:
- Open the list and scroll to the Deep Research category.
- For each candidate, check the linked repo's star-history slope as one triage signal, not a hard filter. Mature-but-useful tools can plateau; rising stars can be hype. Pair it with issue activity in the last 30 days, recent commits, and whether the README clearly states what input the tool takes and what output it produces.
- Pick the top 3 by community signal plus a quick fit-for-purpose read of the README.
- Run each on the same throwaway query (e.g., "Summarize the last five years of work on refinement types for security-relevant program analysis, 2021-2026"). Budget ~30-60 min per tool for accounts, API keys, and first-run setup. Score on (a) source quality, (b) report coherence, (c) cost per run, (d) re-runnability.
- Pilot the strongest candidate for two weeks; keep the runner-up bookmarked.
2. Curriculum design: map categories to existing courses (with careful scoping). The five categories line up with specific Monmouth courses, but each pairing needs scoping:
- CS-310 (Advanced Object-Oriented Programming & Design): the Experiment & Code category. Assign students to evaluate Aider on a small OO-design refactor in their semester project (Aider is the pair-programming CLI, the best fit for "I have a focused change in mind"). SWE-agent and OpenHands have a different scope (issue-solving agent / autonomous coding platform) and fit better in upper-level SE or research courses.
- CS-336 (Program Analysis for Security): the Deep Research category for security literature review (e.g., "Survey program-analysis techniques for finding integer-overflow bugs, 2020-2026"), plus a small dose of the End-to-End category as a critique exercise: students read what an AI-scientist agent produces and identify the methodological flaws.
- BF422 (Investments): the Deep Research category for sector/company literature review before a stock-pitch assignment (the agent's job: pull 10-K filings, analyst notes, and macro context into a coherent brief; the student's job: verify, add their own thesis, fix what the agent missed).
- General grad seminar slide: the 5-category diagram in this entry can be lifted directly (CC0); add one representative tool per category and you have a 20-minute lecture.
3. Faculty research-tools menu: the elevator-pitch email. Save colleagues an hour of "what should I try?" by sending one paragraph: "There's a curated CC0 catalog at github.com/handsome-rich/Awesome-Auto-Research-Tools (~430 stars, bilingual). Five categories of automated-research tools, each entry with a one-line description, plus a star-history chart for the major systems. Start in the Deep Research category if you want quick wins on literature work; the End-to-End category is where the AI-scientist hype lives: promising, but still research-grade pilots, not production tools." Reusable, factually anchored, and points to the map rather than guessing for them.
4. Companion to existing entries: a 3-link reading order for a new colleague. When someone joins LearnAI and wants to ramp up on "AI in research" in under an hour, give them:
- This entry (the map: the lay of the land in ~10 min)
- AI Research Tools Landscape (the comparison: FARS vs AutoResearch vs ARIS vs Elicit, ~15 min)
- Autoresearch: 100 Autonomous ML Experiments Overnight (one concrete pattern in depth, ~20 min)
They finish with a vocabulary, a comparison framework, and one worked pattern. Enough to be useful in a planning meeting the next day.
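The triage and scoring steps in item 1 can be sketched in a few lines. This is a hypothetical helper, not something the list ships: the Candidate fields, the 90-day window, the caps, and the weights are all assumptions to tune, and you would fill in the numbers by hand from each repo's star-history chart, issues tab, and commit log. The four-criterion rubric assumes a 1-5 scale per criterion, with cost scored so that cheaper runs earn higher marks. The tool names and numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    stars_90d_ago: int
    stars_now: int
    issues_touched_30d: int   # issues opened or commented on in the last 30 days
    days_since_commit: int
    readme_states_io: bool    # README says what goes in and what comes out


def star_slope(c: Candidate, days: int = 90) -> float:
    """Average stars gained per day over the window (one signal, not a filter)."""
    return (c.stars_now - c.stars_90d_ago) / days


def triage_score(c: Candidate) -> float:
    """Hypothetical blend of signals; the weights are arbitrary and meant to be tuned."""
    score = min(star_slope(c), 20.0)              # cap so hype cannot dominate
    score += min(c.issues_touched_30d, 30) / 3    # recent community activity
    score += 5.0 if c.days_since_commit <= 30 else 0.0
    score += 5.0 if c.readme_states_io else 0.0
    return score


def shortlist(cands: list[Candidate], k: int = 3) -> list[str]:
    """Top-k names by triage score, ready for the same-query bake-off."""
    return [c.name for c in sorted(cands, key=triage_score, reverse=True)[:k]]


def rubric_total(source_quality: int, coherence: int,
                 cost: int, rerunnability: int) -> int:
    """Sum of the four 1-5 scores from the bake-off (cheaper run = higher cost score)."""
    scores = (source_quality, coherence, cost, rerunnability)
    if not all(1 <= s <= 5 for s in scores):
        raise ValueError("each criterion is scored 1-5")
    return sum(scores)


cands = [
    Candidate("tool-a", 1000, 2800, 40, 5, True),
    Candidate("tool-b", 5000, 5090, 12, 10, True),
    Candidate("tool-c", 300, 1200, 0, 200, False),
    Candidate("tool-d", 2000, 2045, 6, 3, False),
]
print(shortlist(cands))  # ['tool-a', 'tool-b', 'tool-c']
```

The cap on star slope is the code-level version of "rising stars can be hype": past a point, extra stars stop buying extra score.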
Real-World Use Cases
| Scenario | Description |
|---|---|
| Picking a deep-research agent for a student team | Filter to the Deep Research category; weigh star-history alongside issue activity and README clarity, then pilot the top candidate for a few weeks |
| Adopting a research skill pack for a course's Claude Code setup | Browse the Skills & Plugin category. Inspect one candidate skill pack before installing: third-party skills may carry dependency, tool, or API-key assumptions. Install into ~/.claude/skills/ only after you've read the SKILL.md |
| Tracking closed-loop AI-scientist systems | Watch the End-to-End category: DeepScientist, InternAgent, AI-Scientist-v2, and Agent Laboratory are recent active entries. Treat them as research-grade pilots, not course-ready defaults |
| Translating tooling memos for Chinese-speaking collaborators | The README_CN.md often saves you a first translation pass; still skim for tone/local-idiom drift |
| Adapting a published AI-scientist workflow to your own data | Pick a project from the End-to-End category that has both a repo and an arXiv paper (e.g., AI-Scientist-v2). Run its workflow on your own dataset to adapt or extend its results. Note that this is adaptation, not strict replication (replication uses the authors' benchmarks, configs, and seeds) |
| Onboarding a new research assistant | Pin 3 entries from this list as week-1 reading: one End-to-End system (so they grasp the field's ceiling), one Deep Research tool (so they have a daily-driver tool), one skill pack (so they can extend their setup) |
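Part of the "inspect before installing" step for skill packs can be scripted. This is a minimal sketch, assuming the Claude Code convention of one directory per skill containing a SKILL.md with a YAML frontmatter block (name, description), and personal skills living under ~/.claude/skills/. The helpers inspect_skill and install_skill are hypothetical; the frontmatter parser only handles flat key: value lines, and the environment-variable regex is a rough heuristic that catches conventionally named keys such as *_API_KEY or *_TOKEN, not a security review. Reading SKILL.md yourself is still the real check.

```python
import re
import shutil
from pathlib import Path

# Heuristic: conventionally named secrets such as OPENAI_API_KEY or GITHUB_TOKEN.
KEY_PATTERN = re.compile(r"\b[A-Z][A-Z0-9_]*_(?:API_KEY|TOKEN|SECRET)\b")


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Parse flat 'key: value' lines between the leading '---' fences."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def inspect_skill(skill_dir: Path) -> dict[str, object]:
    """Summarize what a skill pack declares and asks for, before installing it."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        raise FileNotFoundError(f"no SKILL.md in {skill_dir}")
    files = [p for p in skill_dir.rglob("*") if p.is_file()]
    keys: set[str] = set()
    for p in files:
        try:
            keys.update(KEY_PATTERN.findall(p.read_text(encoding="utf-8")))
        except UnicodeDecodeError:
            continue  # skip binary files
    return {
        "meta": read_frontmatter(skill_md),
        "env_keys": sorted(keys),
        "scripts": sorted(p.relative_to(skill_dir).as_posix() for p in files
                          if p.suffix in {".py", ".sh", ".js"}),
    }


def install_skill(skill_dir: Path,
                  dest_root: Path = Path.home() / ".claude" / "skills") -> Path:
    """Copy a reviewed skill into dest_root/<dir name>; refuses to overwrite."""
    target = dest_root / skill_dir.name
    if target.exists():
        raise FileExistsError(target)
    dest_root.mkdir(parents=True, exist_ok=True)
    shutil.copytree(skill_dir, target)
    return target
```

A typical flow is to call inspect_skill on the downloaded folder, compare the reported env_keys and scripts against what SKILL.md says, and only then call install_skill. Keeping the two steps separate makes "a documented decision to skip" as easy as not running the second call.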
A Concrete Walk-Through: This Semester at Monmouth
A 4-week plan to actually use this list (not just bookmark it). Plan for 6-10 hours total in the first run while you build setup, prompts, and rubrics; later iterations drop to ~3 hours once those exist.
| Week | Action | Expected output |
|---|---|---|
| Week 1 (~2-3 hr) | Pick one Deep Research tool: DeerFlow if you want a multi-agent research harness for complex queries; STORM if you want outline-driven, Wikipedia-style synthesis. Run it on a CS-336-relevant query, e.g., "Survey program-analysis techniques for integer-overflow detection, 2020-2026" | One ~5-page synthesis report. Note which citations it missed and where it hallucinated; file this as evidence for whether the tool is course-ready |
| Week 2 (~2 hr) | Read the SKILL.md of one skill pack from the Skills & Plugin category (e.g., scientific-agent-skills or AI-Research-SKILLs) before installing. Confirm the dependencies, tool list, and any required API keys. Then drop it into ~/.claude/skills/ and run a sample task | Working integrated skill in ~/.claude/skills/ (or a documented decision to skip; both are valid outcomes) |
| Week 3 (~1-2 hr) | Assign one CS-310 student (~1 hr of office-hours commitment) to evaluate Aider against a small OO-design refactor in their semester project, the same task they'd code by hand | Side-by-side: agent's diff vs. student's diff. A 5-minute classroom share-out the next week becomes a teaching moment about pair-programming-agent failure modes |
| Week 4 (~1 hr) | Write a 1-page reflection: what worked, what didn't, would you do it again. Push it to the LearnAI wiki under a new auto-research-trial-spring-2026 entry | A reusable artifact for the next semester. The trial becomes data, not anecdote |
Compounding return: by the end of week 4 you have firsthand data on 3 categories of the list, a documented skill-pack decision, a student-facing case study, and a draft entry you can show colleagues.
Important things to know
- The list is human-curated, not auto-curated: the only automated piece is a weekly GitHub Action that refreshes star counts via scripts/update_stars.py. Entries are added/removed by maintainer PRs. Treat it as a living document, not a static reference.
- The inclusion bar is "roughly 500+ stars, with publication exceptions": useful for filtering noise, but it means promising small projects won't appear. Cross-reference with yibie/awesome-autoresearch and WecoAI/awesome-autoresearch if you want the long tail.
- CC0 license: public domain, no attribution required. Safe to embed the category map in your own course materials or wiki.
- Bilingual structure is a real feature: README_CN.md is not a partial translation; it tracks the English version closely.
- What's not here: no original benchmark table or head-to-head evaluation, no tutorials, no maintainer commentary beyond one-line descriptions. The list is a finder, not a comparator; you still need to evaluate the actual tools yourself.
- Repo claims to verify yourself: some descriptions paraphrase performance or setup claims from the linked projects' own READMEs. For example, DeepScientist's README mentions both "10-minute" and "15-minute" local-setup phrasings; InternAgent's three-subsystem framing (Generation / Verification / Evolution) appears in its associated paper, not the current GitHub README. Always read the source project before adopting.
- Companion deep-dives in this wiki:
- AI Research Tools Landscape: FARS vs AutoResearch vs ARIS vs Elicit (the comparison counterpart)
- Autoresearch: 100 Autonomous ML Experiments Overnight (Karpathy's pattern, the seed of the End-to-End category)
- AI-Assisted Research Workflow (the human-in-the-loop framing)
- Karpathy: Skills are the Big New Idea (for understanding the Skills & Plugin category)
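Cross-referencing the long-tail lists mentioned above mostly comes down to set arithmetic on GitHub links. This is a minimal sketch, assuming you have already downloaded the README text of this list and of yibie/awesome-autoresearch or WecoAI/awesome-autoresearch yourself (no network calls here). repo_slugs and long_tail are hypothetical helpers, and the link regex is a heuristic: it will also pick up non-project links such as github.com/sponsors/..., so skim the output.

```python
import re

# owner/repo from links like https://github.com/Owner/Repo, .../tree/main, .../Repo.git
GITHUB_REPO = re.compile(r"https?://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)")


def repo_slugs(markdown: str) -> set[str]:
    """Extract normalized, lowercase 'owner/repo' slugs from README text."""
    slugs = set()
    for owner, repo in GITHUB_REPO.findall(markdown):
        # Drop sentence-final dots, then a trailing .git clone suffix.
        repo = repo.rstrip(".").removesuffix(".git")
        slugs.add(f"{owner}/{repo}".lower())
    return slugs


def long_tail(main_readme: str, *other_readmes: str) -> list[str]:
    """Repos linked from the other lists but absent from the main list."""
    main = repo_slugs(main_readme)
    others = set().union(*(repo_slugs(text) for text in other_readmes))
    return sorted(others - main)
```

Lowercasing the slugs matters because the same repo is often linked with different capitalization across lists; without it, every such repo would show up as a false "long-tail" entry.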