MIT Professor's NotebookLM Grading System — 47 Essays in 12 Minutes

An MIT professor accidentally leaked his AI grading system during a Zoom call — forgot to turn off screen share. The class watched him grade 47 essays in 12 minutes using Google’s NotebookLM. What took 6 hours before now takes 15 minutes, and students get better feedback than his handwritten comments ever provided. The post went viral, but the real story is the community debate it sparked about what education is actually for.

*Source: X post by @ihtesham2005; Threads: @ihtesham.ai; Weibo analysis by 爱可可-爱生活*

The Three-Step System

Step 1: Evaluate Against Rubric

Upload all student papers + the original rubric into NotebookLM. Prompt:

“Evaluate each paper against these specific criteria and flag any that deviate from expected patterns.”

NotebookLM compares each essay against the rubric systematically — something a human does inconsistently after the 15th paper when fatigue sets in.
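NotebookLM is driven through its web UI, so there is no official grading API to call. As a minimal sketch of the same pattern for any source-grounded LLM workflow, the prompt-bundling step might look like this; all function and variable names are illustrative, not part of any real NotebookLM interface:

```python
# Step 1 as a generic prompt builder: bundle the rubric, the instruction,
# and every submission into one grounded request. Illustrative only.

RUBRIC_PROMPT = (
    "Evaluate each paper against these specific criteria "
    "and flag any that deviate from expected patterns."
)

def build_evaluation_request(rubric: str, papers: dict[str, str]) -> str:
    """Combine the rubric and all submissions into a single grounded prompt."""
    sections = [f"RUBRIC:\n{rubric}", RUBRIC_PROMPT]
    for student, text in papers.items():
        # Delimiters keep each submission clearly separated for the model.
        sections.append(f"--- SUBMISSION: {student} ---\n{text}")
    return "\n\n".join(sections)
```

The key design choice mirrors what NotebookLM does implicitly: the rubric travels with every request, so the criteria cannot drift between paper 1 and paper 47.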

Step 2: Cross-Reference for Academic Integrity

“Cross-reference writing styles with previous submissions and highlight potential academic integrity concerns.”

The AI caught three cases of writing style shifts the professor would have missed — students whose current submission didn’t match the voice and patterns of their previous work. This kind of cross-temporal pattern recognition across dozens of students is nearly impossible for humans.
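To make the "writing style shift" idea concrete, here is a classic stylometry baseline (not NotebookLM's actual method, which is not public): character n-gram profiles compared with cosine similarity. A sharp drop in similarity between a student's past and current work is a signal worth a human look, never a verdict:

```python
from collections import Counter
from math import sqrt

def ngram_profile(text: str, n: int = 3) -> Counter:
    """Character n-gram frequency profile, a common stylometry baseline."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def style_similarity(a: str, b: str, n: int = 3) -> float:
    """Cosine similarity between two profiles: 1.0 = identical, 0.0 = disjoint."""
    pa, pb = ngram_profile(a, n), ngram_profile(b, n)
    dot = sum(pa[g] * pb[g] for g in set(pa) & set(pb))
    norm = sqrt(sum(v * v for v in pa.values())) * sqrt(sum(v * v for v in pb.values()))
    return dot / norm if norm else 0.0
```

On real workloads this would run over each student's archive of prior submissions, which is exactly the cross-temporal comparison a human grader cannot hold in memory.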

Step 3: Generate Personalized Feedback

“Generate personalized feedback that connects each student’s weak points to specific course materials they should review.”

This is the step that matters most. Instead of generic margin notes, each student gets feedback that points to specific course materials they should revisit. The feedback is more actionable than handwritten comments because it connects weaknesses to learning resources.
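The structure behind that feedback is just a mapping from rubric criteria to course resources. A minimal sketch, with an entirely hypothetical criteria-to-materials table standing in for a real syllabus:

```python
# Hypothetical mapping from rubric criteria to course materials (Step 3).
COURSE_MATERIALS = {
    "thesis clarity": "Week 2 lecture: Constructing an Argument",
    "evidence use": "Week 4 reading: Evaluating Sources",
    "citation format": "Course style guide, section 3",
}

def personalized_feedback(student: str, weak_points: list[str]) -> str:
    """Turn flagged weak points into pointers at specific course materials."""
    lines = [f"Feedback for {student}:"]
    for point in weak_points:
        resource = COURSE_MATERIALS.get(point, "office hours")
        lines.append(f"- Weak on '{point}': review {resource}")
    return "\n".join(lines)
```

The point of the sketch is the shape of the output: every weakness is paired with a next step, which is what makes the feedback actionable rather than evaluative.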

Why NotebookLM Specifically?

NotebookLM has a critical property that makes it suitable for grading: it is designed to answer only from the materials you feed it. Rather than drawing on general training knowledge, it works exclusively with the documents in its notebook and cites them. In an academic context, this means:

| Feature | Why It Matters for Grading |
| --- | --- |
| Source-grounded output | Won't invent criteria not in the rubric |
| Citation-backed claims | Every assessment traces to specific text |
| No hallucination from training data | Won't confuse one student's work with another's |
| Consistent application | Same rubric, same rigor, paper 1 through paper 47 |

The Education Paradox

The viral post triggered a deeper debate in the Chinese education community:

“Students use AI to write. Professors use AI to grade. AI detects AI. What’s left for humans?”

The responses broke into camps:

| Position | Argument |
| --- | --- |
| "This is the end of education" | If AI writes and AI grades, the learning loop is broken |
| "This is the return of education" | Grading was always torture, not teaching; now professors can focus on actual mentoring |
| "Professors are hypocrites" | They tell students not to use AI while secretly using it themselves |
| "Professors are being honest" | Using AI as a tool is different from outsourcing thinking; the real lesson is how to collaborate with AI |
The most thoughtful response: “Education was never about grading. It’s about teaching students to know what they don’t know. AI is actually good at that.”

The Deeper Insight

A math professor in the comments shared a revealing story: a colleague across the hall tried grading papers with AI and “had a breakdown” — because giving genuine feedback to students who write genuinely is one thing, but the repetitive labor of grading identical mediocre submissions is soul-crushing. The tool didn’t devalue teaching — it removed the part that was already broken.

The practical principle:

“The core of education was never about changing the assignment. It’s about helping students see what they’re missing. AI happens to be good at exactly that.”

How LearnAI Team Could Use This

  • Turn long-form student submissions, reflections, or project writeups into rubric-grounded feedback drafts for instructors to review.
  • Build a repeatable feedback workflow for LearnAI workshops: upload rubric, exemplars, course materials, and learner work, then ask NotebookLM to identify gaps.
  • Use the workflow internally to evaluate LearnAI documentation drafts against editorial standards before publication.

Real-World Use Cases

  • Instructors grading essay batches while preserving human review of final marks.
  • Teaching assistants generating first-pass feedback tied to a course rubric.
  • Bootcamp teams reviewing project reflections and recommending targeted remediation.
  • Curriculum designers checking whether assignments produce evidence of intended learning outcomes.

Practical Setup

For educators who want to try this:

  1. Create a NotebookLM notebook for each course
  2. Upload: syllabus, rubric, learning objectives, past exemplar papers
  3. Upload student submissions as a batch
  4. Run the three prompts in sequence (evaluate → cross-reference → feedback)
  5. Review AI output before distributing — always human-verify grades
  6. Use saved time for office hours, mentoring, and the parts of teaching that actually matter
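Since NotebookLM is operated through its web UI, the sequence above is manual today. The control flow is still worth sketching; here `FakeNotebook` is a stand-in placeholder, not a real client, so only the evaluate → cross-reference → feedback ordering and the mandatory human-review step carry over:

```python
# Sketch of the three-prompt grading pass against a stand-in notebook client.
# FakeNotebook is a placeholder; NotebookLM itself is driven via its web UI.

PROMPTS = [
    "Evaluate each paper against these specific criteria and flag any "
    "that deviate from expected patterns.",
    "Cross-reference writing styles with previous submissions and highlight "
    "potential academic integrity concerns.",
    "Generate personalized feedback that connects each student's weak points "
    "to specific course materials they should review.",
]

class FakeNotebook:
    def __init__(self, sources: list[str]):
        self.sources = sources  # syllabus, rubric, exemplars, submissions

    def ask(self, prompt: str) -> str:
        # A real tool would answer grounded in self.sources.
        return f"[grounded in {len(self.sources)} sources] {prompt}"

def run_grading_pass(notebook: FakeNotebook) -> list[str]:
    """Run the three prompts in order; a human reviews before grades go out."""
    return [notebook.ask(p) for p in PROMPTS]
```

Running the prompts in this fixed order matters: integrity flags from the second pass should be resolved by a human before any feedback from the third pass is distributed.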