Three videos that build your intuition for what happens inside an LLM — no math required, just watch in order.
Start here. Modern LLM chatbots like ChatGPT, Claude, and Gemini generate responses by predicting the next token, one step at a time. This video makes that idea concrete and visual.
14:34 · Attention Visualized · 30K+ views
LLMs assign a probability to every candidate next token, then pick one, usually a highly probable one. That's it.
Output is shaped by patterns in training data, not direct access to facts.
ChatGPT streams because it literally computes one token per step.
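For readers who want to see the loop behind these takeaways, here is a minimal sketch of one-token-at-a-time generation. It is not how any particular chatbot is implemented: `toy_logits` is a hypothetical hand-written score table standing in for a real model, and the three-word vocabulary is invented for illustration.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def toy_logits(context):
    """Hypothetical stand-in for a model: scores for each candidate next token."""
    table = {
        "the": {"cat": 2.0, "dog": 1.5, "sat": -1.0},
        "cat": {"sat": 2.5, "ran": 1.0, "the": -1.0},
        "sat": {"down": 2.0, "the": 0.5, "cat": -2.0},
    }
    # Unknown contexts can only end the sequence in this toy table.
    return table.get(context[-1], {"<end>": 0.0})

def generate(prompt, steps, greedy=True, seed=0):
    """Append one token per step, which is why output can be streamed."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        probs = softmax(toy_logits(tokens))
        if greedy:
            nxt = max(probs, key=probs.get)  # always take the top token
        else:
            # Sample in proportion to probability instead of taking the top.
            nxt = rng.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(nxt)
        if nxt == "<end>":
            break
    return tokens

print(generate(["the"], 3))  # ['the', 'cat', 'sat', 'down']
```

Passing `greedy=False` samples from the distribution instead, which is one reason the same prompt can produce different replies.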
Now that you know LLMs predict the next word, where do they learn those patterns? This video shows the pretraining process: how 15 trillion words get curated, cleaned, and fed to GPUs for months.
15:53 · Attention Visualized · 3.8K views
A model that covers up the next word and guesses — 15 trillion times.
100TB of raw web crawl gets filtered down to 6.8TB of clean training text.
Thousands of GPUs running for months. No teacher — just text.
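The "cover up the next word and guess" idea can be sketched in a few lines. This is a toy under loud assumptions: whitespace-separated words stand in for real tokens, and simple follow-up counts stand in for a neural network trained on GPUs. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def next_token_pairs(tokens):
    """Each position yields (context so far, hidden next token)."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def train_bigram(corpus):
    """Count which word follows which. The text itself supplies the labels."""
    counts = defaultdict(Counter)
    for text in corpus:
        for context, target in next_token_pairs(text.split()):
            counts[context[-1]][target] += 1
    return counts

def predict_proba(counts, prev, target):
    """Estimated probability that `target` follows `prev`."""
    total = sum(counts[prev].values())
    return counts[prev][target] / total if total else 0.0

def average_loss(counts, text):
    """Mean negative log probability assigned to the true next words."""
    losses = []
    for context, target in next_token_pairs(text.split()):
        p = predict_proba(counts, context[-1], target)
        losses.append(-math.log(max(p, 1e-9)))  # floor avoids log(0)
    return sum(losses) / len(losses)

corpus = ["the cat sat", "the cat ran", "the dog sat"]
counts = train_bigram(corpus)
print(predict_proba(counts, "the", "cat"))
print(average_loss(counts, "the dog sat"))
```

A lower average loss means the model was less surprised by the text; pretraining is, roughly, pushing that number down over an enormous corpus.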
The deepest dive. Self-attention is the core mechanism that lets every word in a sentence look at every other word simultaneously. This is what makes modern AI different from everything that came before.
13:01 · Visual AI · 19K+ views
Every word queries every other word to understand meaning in context.
Query, Key, Value — three projections that compute relevance between words.
Self-attention largely removed the long-range dependency bottleneck that limited earlier sequence models such as RNNs.
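The Query, Key, Value idea above can be sketched as scaled dot-product attention. This is a simplified, single-head version in plain Python with no masking; the tiny input vectors and identity projection matrices are made up for illustration, whereas a real model learns its projections during training.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """x holds one vector per word; returns (new vectors, attention weights)."""
    q = matmul(x, w_q)  # what each word is looking for
    k = matmul(x, w_k)  # what each word offers
    v = matmul(x, w_v)  # the content each word passes along
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # every word scored against every word
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Three "words", each a 2-dimensional vector (illustrative values only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how much that word draws from every other word, including itself.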
Now that you understand how LLMs work, see how engineers build real products on top of them. This Stanford course summary covers the full technology stack from base models to multi-agent systems.
27:24 · Gary Chen · 58K+ views · Based on Stanford's "Beyond LLM" course
Base LLMs on their own have no domain knowledge, carry outdated information, are hard to control, and struggle with long context.
Prompting is a core skill for every AI user. It covers two usage styles: delegating whole tasks vs. collaborating step by step.
Retrieval-Augmented Generation — give the model access to your own documents.
Fine-tuning: train the model on your data to specialize it for specific tasks.
LLMs that can plan, use tools, and take actions — not just answer questions.
Multiple AI agents collaborating on complex tasks, each with a specialized role.
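To make the Retrieval-Augmented Generation item above more concrete, here is a minimal sketch of its retrieval step: find the document most relevant to a question and place it in the prompt. Production systems typically rank with learned embeddings and a vector index; this sketch swaps in plain word-overlap cosine similarity, makes no model call, and uses invented documents and hypothetical function names.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts, a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Put the retrieved text in front of the question for the model to read."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

documents = [
    "Refunds are accepted within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
print(build_prompt("Within how many days are refunds accepted?", documents))
```

The resulting prompt would then be sent to an LLM, which answers from the supplied context rather than from its training data alone.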
Now that you understand the theory, try training a real AI model yourself. Google's Teachable Machine lets you build an image, sound, or pose classifier in minutes — no coding required. This short video shows how it works in a classroom setting.
5:50 · STEM Learning By Doing · Advanced / Optional
Collect examples, label them, and watch the model learn the patterns.
See where the model gets confused — that's how you build intuition for AI limits.
Go to teachablemachine.withgoogle.com and train your own classifier.