Teaching model. This is a faithful recreation of ELIZA (Weizenbaum, 1966), not a modern AI. It matches keywords and fills response templates — no understanding, no memory of context, no learning. The Rule Inspector shows exactly what it is doing.
ELIZA
How do you do. Please tell me your problem.
1 · Start here:
2 · See the mechanism:
3 · Try to break it:
Rule Inspector
Send a message to see how ELIZA processes it — step by step.
1 · Keywords found
2 · Rule selected
3 · Pattern matched
4 · Captured text
5 · Response template
Reflect
Human
Did any of ELIZA's responses feel meaningful to you? What made them feel that way — the words, or something you brought to them?
Machine
Look at the Rule Inspector. What information did ELIZA actually use to generate that response? What information did it ignore?
System
ELIZA's creator was disturbed that people found it empathetic. Where do you see similar keyword-response patterns in apps, customer service bots, or AI tools today?
Facilitator Note — Workshop Sequence
Opening move (2 min): Ask everyone to try Step 1 (Start here) without reading the Rule Inspector. Let the responses land. Ask: "Did any of those feel meaningful?" Hold that feeling — don't explain it yet.
Reveal move (5 min): Now point everyone to the Rule Inspector. Walk through one example together — "I feel really sad today." Show: keyword found → rule selected → pattern matched → template filled. The machine never read the word "sad" for meaning. It found a pattern and filled a slot.
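For facilitators who want to show the walkthrough above in code, here is a minimal sketch of the five inspector steps. It is not the tool's actual rule table: the RULES dictionary, the "Why do you feel ...?" template, the "Please go on." fallback, and the function name respond are all hypothetical, chosen only to illustrate the idea.

```python
import re

# Hypothetical rule table: keyword -> (pattern, response template).
# These entries are illustrative assumptions, not the tool's real script.
RULES = {
    "feel": (re.compile(r".*\bi feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
}

FALLBACK = "Please go on."  # assumed reply when nothing matches


def respond(message):
    """Return (trace, reply) so that every inspector step stays visible."""
    text = message.strip().rstrip(".!?")
    words = re.findall(r"[a-z']+", text.lower())

    found = [w for w in words if w in RULES]      # 1. keywords found
    if not found:
        return {"keywords": []}, FALLBACK

    keyword = found[0]                            # 2. rule selected
    pattern, template = RULES[keyword]

    match = pattern.match(text)                   # 3. pattern matched
    if match is None:
        return {"keywords": found, "rule": keyword}, FALLBACK

    captured = match.group(1)                     # 4. captured text
    reply = template.format(captured)             # 5. response template

    trace = {
        "keywords": found,
        "rule": keyword,
        "captured": captured,
        "template": template,
    }
    return trace, reply
```

Calling respond("I feel really sad today.") captures "really sad today" and pastes it into the template. The word "sad" is never looked up anywhere: it is copied, not interpreted.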
Stress-test move (5 min): Move on to Steps 2 and 3. Try "I feel like the concept of mother is overrated": does the MOTHER rule (weight 8) beat the FEEL rule? Try "The weather is nice today": what happens when nothing matches? Try "What is the capital of France?": the WHAT rule deflects the question back, and it fires regardless of whether the question makes sense to deflect.
Try the same feeling phrased two different ways — "I feel sad" vs. "I am feeling down." Do the same rules fire?
What happens if you put "mother" and "dream" in the same sentence? Which keyword wins, and why?
If family keywords feel sensitive for your group, use neutral collisions like "computer" + "dream" instead.
Type something ELIZA has no rule for. How many fallbacks does it cycle through before repeating?
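The keyword collisions and fallback behavior in the prompts above can be sketched as follows. This is a rough illustration under stated assumptions: only the MOTHER weight of 8 comes from the text, while every other weight, the three fallback lines, and the function name select_rule are hypothetical. The real tool may rank keywords and cycle fallbacks differently.

```python
import re
from itertools import cycle

# Only MOTHER = 8 is taken from the workshop text; the rest are assumed.
WEIGHTS = {"mother": 8, "dream": 4, "feel": 3, "what": 2}

# Assumed fallback list; the tool's actual list and its length may differ.
FALLBACKS = cycle(["Please go on.", "Tell me more.", "I see."])


def select_rule(message):
    """Return (winning keyword, None), or (None, fallback) if nothing matches."""
    words = re.findall(r"[a-z]+", message.lower())
    found = [w for w in words if w in WEIGHTS]
    if not found:
        # No keyword at all: hand out the next canned line, looping forever.
        return None, next(FALLBACKS)
    # Highest weight wins; max() keeps the earlier word when weights tie.
    return max(found, key=WEIGHTS.get), None
```

With these assumed weights, "I feel like the concept of mother is overrated" selects the mother rule even though "feel" comes first in the sentence, and repeated unmatched inputs walk through the fallback list and then start over.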
Debrief questions:
At what point did ELIZA feel like it understood you? What made it feel that way?
Weizenbaum said he was disturbed that people found ELIZA empathetic. Why would that disturb a researcher?
A modern LLM also doesn't understand; it predicts tokens. But it was trained on billions of examples instead of a few dozen rules. What changes about the failure modes? What stays the same?
If you replaced the Rule Inspector with a "Confidence: 94%" badge, would participants still notice it was pattern-matching? What does that suggest about LLM interfaces?
What would a machine need to actually understand you — not just respond plausibly?
Same input. Different mechanisms. Below, ten prompts are sent to both ELIZA and a large language model (responses pre-generated). ELIZA matches keywords. The LLM predicts the next most likely tokens based on patterns learned from billions of words. Neither one "understands" — but they fail in very different ways.
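To make "predicts the next most likely tokens" concrete, here is a deliberately tiny stand-in: a bigram counter that picks whichever word most often followed the previous one in its training text. This is an assumption-laden toy, not how a real LLM works internally (real models use neural networks over subword tokens and far longer context). The CORPUS string and the function names train_bigrams and predict_next are hypothetical.

```python
from collections import Counter, defaultdict

# Tiny made-up training text; a real model sees billions of words.
CORPUS = "i feel sad today . i feel tired today . i feel sad again ."


def train_bigrams(text):
    """Count, for every token, which tokens followed it and how often."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of token, or None if never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


counts = train_bigrams(CORPUS)
```

Here predict_next(counts, "feel") returns "sad" simply because "sad" followed "feel" more often than "tired" did. Like ELIZA, the toy has no idea what sadness is. Unlike ELIZA, its behavior comes from counted data rather than hand-written rules, which is why there is no rule table to inspect.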
Discussion — Prediction Is Not Understanding
ELIZA fails visibly: give it an unusual sentence and it fires the wrong rule. LLMs fail more subtly: they produce fluent, confident text even when wrong. The Rule Inspector makes ELIZA's mechanism transparent. What would a "Rule Inspector" for an LLM look like, and why is it harder to build?