A practical logistics guide for you, Rolf, and the crossover RCT
This is the "how do we actually do this" guide. Not research theory — just the practical steps, week by week, for both instructors.
First, the big picture of how each class runs day-to-day:
Your web app's animations and exercises are visible to ALL your students equally. The only thing that differs between students is whether the "AI Hint" button appears when they're doing exercises on their own device.
This is the part that matters most. Let's walk through exactly what happens:
Students don't know about groups. Here's what it feels like from their perspective:
"I opened the app, did the BST exercise, and there was a hint button. I clicked it and it gave me a nudge about comparing to the root. Helpful!"
"I did the hash table exercise. No hint button this time. I worked through it myself and asked my neighbor when I got stuck."
Every student gets AI hints on exactly half the topics. No one is disadvantaged. The groups just get hints on different topics.
Add a group assignment feature: when a student first logs in, the app randomly assigns them to Group X or Y (stored in their profile). Build the per-topic hint toggle so the app checks: if (student.group matches this topic's hint-on group) → show hint button.
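A minimal sketch of that check, written in Python for readability (your app may well use another language). The `Student` class, the `HINT_ON_GROUP` mapping, and both function names are hypothetical; only the idea of assigning once at first login and comparing the student's group to the topic's hint-on group comes from the plan above.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical topic -> hint-on group mapping; the real topic list is yours to define.
HINT_ON_GROUP = {
    "bst": "X",
    "hash_table": "Y",
}

@dataclass
class Student:
    anon_id: str
    group: Optional[str] = None  # "X" or "Y" once assigned

def assign_group(student, rng=random):
    """Assign a group on first login only; later logins keep the stored group."""
    if student.group is None:
        student.group = rng.choice(["X", "Y"])
    return student.group

def show_hint_button(student, topic):
    """True only when the student's group is the hint-on group for this topic."""
    return student.group is not None and HINT_ON_GROUP.get(topic) == student.group
```

Simple coin-flip assignment can leave the two groups unequal in a small class. If that matters, alternating assignments within shuffled pairs is one common alternative.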
Give students the pretest.html web app during class. ~30 minutes. They pick their section, create their anonymous ID, take the quiz, fill out the survey. Data goes to Google Sheets automatically.
This is the easy part. You teach using your web app exactly as you already planned. Project the animations, walk through examples, have students do exercises on their devices. You don't need to do anything special — the app handles which students see hints automatically.
The app silently logs: hint requests, time on task, exercise completion, group assignment. You just teach. The data collects itself.
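One way the log records might look, as a sketch only. The field names, the event types, and the list used as a stand-in for a file or sheet are all assumptions, not the app's actual schema.

```python
import json
import time

# Hypothetical event types covering the items listed above.
ALLOWED_EVENTS = {"hint_request", "exercise_start", "exercise_complete"}

def make_event(anon_id, group, topic, event_type, seconds_on_task=None, now=None):
    """Build one log record. Field names are illustrative, not the app's schema."""
    if event_type not in ALLOWED_EVENTS:
        raise ValueError(f"unknown event type: {event_type}")
    return {
        "anon_id": anon_id,
        "group": group,
        "topic": topic,
        "event": event_type,
        "seconds_on_task": seconds_on_task,
        "timestamp": time.time() if now is None else now,
    }

def append_event(log_lines, event):
    """Append the event as one JSON line (stand-in for a file or sheet row)."""
    log_lines.append(json.dumps(event, sort_keys=True))
```

Storing the group on every record is redundant, but it makes later analysis possible without a join against the profile table.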
Repeat the concept test and survey. Conduct 8-12 interviews with selected students (mix of both groups).
Rolf's job is simple — he teaches his way and helps with shared assessments:
Rolf teaches his section however he usually does — slides, whiteboard, live coding, textbook. No web app, no AI, nothing changes about his teaching. That's the whole point of a control group.
Rolf gives the same pretest.html to his students during class. They select "Section B (Prof. Rolf's section)." Same quiz, same survey, same Google Sheet. ~30 min of class time.
Both sections take the same quizzes, assignments, and exams. Rolf helps design these (or reviews yours) to make sure they're fair for both teaching styles. This is the main collaboration point.
Same concept test again, plus the post-survey (which has slightly different wording for his section — asks about "lecture examples" instead of "web app").
The crossover RCT happens entirely within YOUR section. Rolf just needs to know: "give the same tests, use the same assessments." The randomization and hint toggling are invisible to him.
There are exactly 4 things you and Rolf need to coordinate on:
Design quizzes, assignments, and exams together. Both sections must take identical versions. This is the most important collaboration.
How: One of you drafts, the other reviews. Meet 2-3 times during the semester to align.
Both administer the same pretest.html in weeks 2-3 and the same post-test in weeks 14-15. Data goes to the same Google Sheet.
How: Share the URL. Rolf opens it in class, students do it on their phones.
Both sections should cover the same topics in roughly the same order. The comparison only works if students in both sections learn the same material.
How: Share syllabi at semester start. They don't need to be identical — just cover the same major topics.
Both instructors have edit access to the Google Sheet with pre/post test responses. Share assessment scores at semester end for analysis.
How: Google Sheet is shared. Assessment scores can be exported from your LMS.
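For the end-of-semester analysis, pre- and post-test rows need to be matched on each student's anonymous ID. The sketch below assumes the sheet has been exported to a list of rows; the column names `anon_id`, `section`, and `score` are hypothetical.

```python
def paired_gains(pre_rows, post_rows):
    """Match pre/post rows on anonymous ID; return id -> (section, post - pre)."""
    pre = {row["anon_id"]: row for row in pre_rows}
    gains = {}
    for row in post_rows:
        before = pre.get(row["anon_id"])
        if before is not None:  # skip students who missed the pre-test
            gains[row["anon_id"]] = (row["section"], row["score"] - before["score"])
    return gains
```

Students who took only one of the two tests drop out of the paired comparison, so it is worth counting how many are lost this way.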
Teaching style, lecture format, homework policies, grading curves, office hours — these can all be different. The study is designed to work despite these differences. The shared assessments are the equalizer.
| When | You Do | Rolf Does |
|---|---|---|
| Before Week 1 | Build group assignment + hint toggle in app. Decide which topics are AI-on for Group X vs Y. | Nothing |
| Week 1 | Share syllabi. Agree on shared assessment schedule. | Share syllabi. Review assessment plan. |
| Weeks 2-3 (now) | Give pre-test + pre-survey in class (pretest.html) | Give same pre-test + pre-survey in class |
| Weeks 4-13 | Teach with web app. App auto-toggles hints per student per topic. Log data. | Teach normally (slides, whiteboard, coding). No web app. |
| ~Weeks 6, 10 | Give the shared quiz/assignment/exam (same version as Rolf). Compare scores later. | Give the same shared quiz/assignment/exam. Compare scores later. |
| Week 14-15 | Give post-test + post-survey (treatment version) | Give post-test + post-survey (control version) |
| Week 16 | Conduct 8-12 student interviews | Nothing |
| Summer | Analyze data and write the paper together. Submit to SIGCSE 2027. | Analyze data and write the paper together. |
The technical implementation in your web app:
You configure the topic list and group rules once. After that, the app handles everything automatically. You teach the same way regardless — project the app, walk through examples, say "try the exercise." The app shows or hides the hint button per student behind the scenes.
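As a sketch of that one-time configuration, the snippet below alternates topics between the two groups and verifies the split is even. The function names are hypothetical, and `heap` and `graph_bfs` are placeholder topics; substitute your own list.

```python
def build_hint_schedule(topics):
    """Alternate topics between groups so each group gets hints on half of them."""
    if len(topics) % 2 != 0:
        raise ValueError("need an even number of topics for an exact half/half split")
    return {topic: ("X" if i % 2 == 0 else "Y") for i, topic in enumerate(topics)}

def hints_per_group(schedule):
    """Count how many topics are hint-on for each group."""
    counts = {"X": 0, "Y": 0}
    for group in schedule.values():
        counts[group] += 1
    return counts

TOPICS = ["bst", "hash_table", "heap", "graph_bfs"]  # placeholder names
SCHEDULE = build_hint_schedule(TOPICS)
```

Alternating in teaching order also keeps either group from getting all of its hints in one half of the semester.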
Students are unlikely to notice that the hint button differs between devices, because they focus on their own work. But even if they notice, it doesn't invalidate the study. You can mention in the consent form that "app features may vary" as part of the research.
If students without hints pick them up from classmates who have them, that is "contamination," and it slightly weakens the measured effect. But it works against finding a difference, so if you still find one, the result is even more convincing. Log hint usage to check.
Yes, the design is fair to students. Every student gets hints on exactly half the topics, and at the end you can unlock all hints for everyone. This is standard in crossover trials (medical studies do the same thing).
No, Rolf does not need to change anything. He teaches his way and just gives the same tests and surveys. The crossover RCT is entirely within your section, so Rolf doesn't need to know the details.
Three powerful comparisons:
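1. Your section vs. Rolf's (overall effect; sections are not randomized, so instructor differences are mixed in)
2. Group X vs. Y per topic (AI hint effect; causal, because group assignment is random)
3. Hint users vs. non-users (dose-response; observational, since students choose whether to click)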
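Comparison 2 can be sketched as a per-topic difference in mean scores between the hint-on and hint-off groups. The tuple layout and function name are hypothetical, and this is only a descriptive difference, not a significance test.

```python
from statistics import mean

def hint_effect_by_topic(scores, schedule):
    """scores: list of (group, topic, score) tuples.
    schedule: topic -> hint-on group.
    Returns topic -> mean(hint-on scores) - mean(hint-off scores)."""
    effects = {}
    for topic, on_group in schedule.items():
        on = [s for g, t, s in scores if t == topic and g == on_group]
        off = [s for g, t, s in scores if t == topic and g != on_group]
        if on and off:  # skip topics with no data in one condition
            effects[topic] = mean(on) - mean(off)
    return effects
```

A positive value means the group that had hints on that topic scored higher on average. Because each group is hint-on for some topics and hint-off for others, a consistent sign across topics is harder to explain by one group simply being stronger.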
Finding no difference is still a valid finding! "AI hints did not significantly improve learning" is publishable and valuable. Education research needs null results too.