Educational Technology · March 2026 · 14 min read

The Negotiation Engine: Teaching an AI to Bargain With a Frustrated Kid

How Athena detects student frustration and turns it into a structured deal that keeps learning alive. The feature that required the most thought about how children actually learn.


This is the third post in a series about building Athena. The first covered how it all fits together and the second explained the curriculum grounding system.

• • •

My child does not quit quietly.

When something clicks, the energy in the room changes. When it does not click, I get one-word answers, repeated sighs, and eventually a very deliberate attempt to change the subject. I have watched this pattern with every tutor, every homework session, every Khan Academy video that asked them to "just try it yourself first." The moment frustration tips into disengagement is fast and very specific. There is a particular terse "idk" that means we are done unless something changes.

When I started building Athena, this was the moment I most needed to solve. Not the chat interface, not the RAG pipeline, not the maths rendering. Those are all engineering problems with clear solutions. The frustration problem is a human problem dressed as an engineering problem, and it took me a while to realise that the solution was also human: negotiation.

This post is about how I taught an AI to bargain.

• • •

The Difficulty Engine: Two Layers

Before the negotiation can happen, Athena needs to know where a student is. Not just "which topic" but "what level of support do they need right now?" That state lives in a small but carefully designed difficulty engine.

I track difficulty on two layers simultaneously.

Internal level (1–5). An integer stored in a signed, tamper-evident cookie. It moves up and down based on performance. Five levels give enough granularity for fine adjustments without overcorrecting. It is invisible to the student.

Student-facing state. A three-position dial the student can see: Socratic (deep thinking, few hints), Ladder (step-by-step scaffolding, more support), and Emergency (maximum support, hint available). The dial is not cosmetic. Its position drives how Athena behaves in every response.

The mapping between them:

  • Levels 3–5 map to Socratic
  • Levels 1–2 map to Ladder
  • Emergency does not map from a level; it is a temporary dial state that activates while a hint is in progress

The reason I did not just show students the number is that a ticking score is anxiety-inducing, especially for a twelve-year-old who already feels like they are failing. What they should experience is a qualitative shift in how Athena talks to them: more patient, more scaffolded, more broken-down. The number is for the engine. The dial is for the student.
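The band mapping is small enough to show in full. Here is a sketch of the mapToStudentState helper (the name appears later in the band-crossing code; the body is my reconstruction from the mapping above):

```typescript
// Student-facing bands derived from the internal 1-5 level.
// Emergency is a transient hint mode layered on top, not a band.
type StudentState = "socratic" | "ladder";

// Levels 3-5 read as Socratic; levels 1-2 read as Ladder.
function mapToStudentState(level: number): StudentState {
  return level >= 3 ? "socratic" : "ladder";
}
```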

The state machine lives in lib/negotiation/engine.ts and is written entirely as pure functions with no side effects. This makes it trivial to test and simple to reason about:

export const CONSECUTIVE_CORRECT_TO_LEVEL_UP = 3;
export const CONSECUTIVE_FAILURES_TO_LEVEL_DOWN = 2;

export interface DifficultyState {
  level: number;
  consecutiveCorrect: number;
  consecutiveFailures: number;
}

export function recordCorrectAnswer(
  map: SubjectDifficultyMap,
  subject: Subject
): { map: SubjectDifficultyMap; levelChanged: boolean } {
  const current = map[subject];
  const newCorrect = current.consecutiveCorrect + 1;

  if (newCorrect >= CONSECUTIVE_CORRECT_TO_LEVEL_UP) {
    const newLevel = clampLevel(current.level + 1);
    return {
      map: {
        ...map,
        [subject]: { level: newLevel, consecutiveCorrect: 0, consecutiveFailures: 0 },
      },
      levelChanged: newLevel !== current.level,
    };
  }

  return {
    map: {
      ...map,
      [subject]: { ...current, consecutiveCorrect: newCorrect, consecutiveFailures: 0 },
    },
    levelChanged: false,
  };
}

Three correct answers in a row: level up. Two consecutive failures: level down. The counters reset whenever either threshold is crossed, so the engine is always tracking recent momentum, not lifetime averages. A student who was struggling last session but arrives focused today will climb quickly.

Frustration bypasses this flow entirely and drops the level immediately:

export function recordFrustration(
  map: SubjectDifficultyMap,
  subject: Subject
): { map: SubjectDifficultyMap; levelChanged: boolean } {
  return {
    map: decreaseLevel(map, subject),
    levelChanged: map[subject].level > DIFFICULTY_MIN,
  };
}

This is intentional. Getting things wrong and being frustrated are different states that warrant different responses. Incorrect answers are part of learning. Frustration is a signal that the learning has stopped.
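The excerpts above call a few helpers that are not shown. A minimal sketch, assuming the types follow the excerpts and the 1–5 clamp described earlier (the Subject values here are placeholders):

```typescript
// Minimal sketch of the helpers the engine functions rely on.
// Names follow the excerpts; the Subject union is an assumption.
type Subject = "maths" | "english" | "science";

interface DifficultyState {
  level: number;
  consecutiveCorrect: number;
  consecutiveFailures: number;
}

type SubjectDifficultyMap = Record<Subject, DifficultyState>;

const DIFFICULTY_MIN = 1;
const DIFFICULTY_MAX = 5;

// Keep the level inside the 1-5 range.
function clampLevel(level: number): number {
  return Math.min(DIFFICULTY_MAX, Math.max(DIFFICULTY_MIN, level));
}

// Drop one level and reset both streak counters: frustration wipes momentum.
function decreaseLevel(
  map: SubjectDifficultyMap,
  subject: Subject
): SubjectDifficultyMap {
  const current = map[subject];
  return {
    ...map,
    [subject]: {
      level: clampLevel(current.level - 1),
      consecutiveCorrect: 0,
      consecutiveFailures: 0,
    },
  };
}
```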

• • •

Teaching Gemini to Recognise Frustration

The obvious approach to frustration detection is a separate classifier: label a bunch of examples, fine-tune a model, call it as an extra inference step on each message. That would work. I chose not to do it.

The reason is that Gemini already has full context of the conversation. It knows this is the third consecutive wrong answer. It has seen the progression from full sentences to single words. A standalone classifier would need that same context passed to it, which means you have not actually saved an inference call; you have just added architectural complexity. For a single-user app running on a free tier, every unnecessary call matters.

Instead, I put the frustration detection in the system prompt as a structured block, assembled on each request by buildFrustrationDetectionBlock():

function buildFrustrationDetectionBlock(): string {
  return `
FRUSTRATION DETECTION:
Monitor for these frustration signals in the student's messages:
1. Explicit statements: "I don't understand", "I can't do this", "I give up",
   "this is stupid", "this is too hard", "I hate this", "whatever"
2. Terse responses after multiple attempts: "no", "what", "idk", "idc", "ok",
   "sure", single-word answers when more was expected
3. Three or more repeated wrong answers on the same concept
4. Sudden topic changes or requests to stop
5. All-caps responses or excessive punctuation indicating frustration

When frustration is detected:
- IMMEDIATELY respond with empathy: "I can see this is tricky. That's completely
  normal - this is a challenging topic!"
- Try a COMPLETELY different approach - use a different analogy, break it into
  even smaller steps, or connect it to something they know
- Offer a hint trade: "Would you like to make a deal? I'll give you a hint,
  but in return you'll explain the concept back to me later."
- If frustration stems from zero knowledge of a foundational concept, switch to
  TEACHING MODE
- NEVER say "it's easy" or "it's simple" - validate that the topic is hard.`;
}

The model then reports what it detected via a structured evaluation block appended invisibly to every response, hidden inside an HTML comment:

<!--EVAL:{"studentAnswerCorrect":false,"frustrationDetected":true,"conceptMastered":false,...}-->

The server parses this with a regex, extracts the frustrationDetected boolean, and routes accordingly. The student never sees it; it is stripped before the response reaches the client. The prompt tells the model what signals to look for, what to do when it finds them, and (critically) what not to do. That last constraint matters. Without it, the default model behaviour when a student is struggling is to say some version of "don't worry, it's actually quite simple." That is the worst possible response to a frustrated twelve-year-old.
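The extraction step can be sketched like this (the comment format matches the example above; the extractEval name and exact regex are my assumptions):

```typescript
// Shape of the hidden evaluation block appended to every model response.
interface EvalBlock {
  studentAnswerCorrect: boolean;
  frustrationDetected: boolean;
  conceptMastered: boolean;
}

const EVAL_RE = /<!--EVAL:(\{[\s\S]*?\})-->/;

// Pull the EVAL comment out of the raw response and strip it from the text
// the student will see. Returns null when parsing fails so a malformed
// block degrades gracefully instead of breaking the chat.
function extractEval(raw: string): { text: string; eval: EvalBlock | null } {
  const match = raw.match(EVAL_RE);
  if (!match) return { text: raw, eval: null };
  let parsed: EvalBlock | null = null;
  try {
    parsed = JSON.parse(match[1]) as EvalBlock;
  } catch {
    parsed = null;
  }
  return { text: raw.replace(EVAL_RE, "").trimEnd(), eval: parsed };
}
```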

In practice, this approach is reliable because the frustration signals I care about are not subtle. "idk," "what," all-caps, three wrong answers in a row: these are not edge cases requiring careful probabilistic judgement. A well-instructed large language model reads them correctly every time.

• • •

The Hint Trade

When frustrationDetected returns true and the student does not already have outstanding verification debts, the server sets messageType to "hint_trade_offer". The client renders the Negotiation Toast.

The Negotiation Toast is a distinct UI moment that appears above the conversation thread. It does not feel like a chat message. It feels like an interruption, deliberately. The conversation pauses. There is a choice: accept the deal, or decline and keep trying independently.

The wording of the offer matters more than I initially expected. The toast does not say "Do you want a hint?" That framing positions the hint as a gift and the student as passive. What it says instead is closer to: "I will help you with this, but you will need to explain it back to me afterwards." The student is not receiving charity. They are entering a transaction. They get what they need (help getting unstuck), but they take on an obligation (proving they understood it). The asymmetry changes the psychology of the interaction.

When a student accepts, the next Gemini request is flagged with isHintTrade: true, which activates a different system prompt block:

if (isHintTrade) {
  hintTradeBlock = `
HINT TRADE MODE:
The student has accepted a hint trade. Provide a STRUCTURED hint that guides
them toward the answer without giving it directly.
- Give them a clear framework or partial solution they can build on.
- Remind them that they will need to explain this concept back later.
- End with a check-in question to see if the hint helped.`;
}

The hint is not the answer. This is the non-negotiable design constraint. Giving the answer to a frustrated student feels kind in the moment but is a complete abdication of the tutoring relationship. What Athena provides in hint trade mode is a scaffold: a framework, a worked analogy, a partial solution that closes some of the gap but leaves the student to close the rest themselves. The conversation can continue. The learning is still happening.

• • •

Paying the Debt

Accepting a hint creates a verification debt, tracked server-side in a queue. The student cannot ignore it and move on. Every new message is checked against the queue before normal processing:

if (!isHintAccept && !isHintDecline && !isFirstMessage) {
  const pendingVerification = await getNextPending(sessionId);
  if (pendingVerification) {
    return handlePendingVerificationReminder(
      pendingVerification, sessionId, subject, conversation.difficultyLevel
    );
  }
}

If there is a pending verification, the normal chat flow is short-circuited. Athena reminds the student of the deal they made. The verification itself takes one of two forms:

Logic Scramble: For procedural knowledge (the steps in a method, the order of a process), the student is shown the correct steps shuffled and must drag them back into the right sequence. Built with @dnd-kit, animated with framer-motion. Evaluation is deterministic on the server.

Reflection Lock: For conceptual knowledge, the student writes or dictates an explanation in their own words. A Mastery Ring (a circular progress indicator) fills as the word count increases, nudging them toward a substantive answer rather than a single dismissive sentence. Gemini evaluates the explanation against a set of key points, with specific feedback on what was covered and what was missed. The ring is a small but effective UX trick: it makes "not enough words" visible without ever saying "write more."

Three failed attempts trigger escalation: Athena provides a worked example of a similar but not identical problem, giving the student enough material to try once more. The suppression logic also prevents the hint trade from being offered if the student already has two or more outstanding verifications. There is a sensible limit to how much debt anyone should accumulate in one sitting.
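The suppression check is simple enough to sketch. This assumes a debt cap of two, as described above; the function name is mine:

```typescript
// Hypothetical cap on outstanding verification debts, per the prose above.
const MAX_OUTSTANDING_VERIFICATIONS = 2;

// Only offer a hint trade when frustration is detected AND the student
// is not already carrying too much verification debt.
function shouldOfferHintTrade(
  frustrationDetected: boolean,
  outstandingVerifications: number
): boolean {
  return (
    frustrationDetected &&
    outstandingVerifications < MAX_OUTSTANDING_VERIFICATIONS
  );
}
```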

• • •

The 24-Hour Decay Rule

One thing that surprised me in practice: difficulty achieved on a Friday afternoon does not necessarily carry over to Monday morning.

Memory degrades. Procedural knowledge especially. A student who was confidently at Level 4 on fractions before the weekend may genuinely benefit from a gentler re-entry at the start of the next session. Rather than making Athena discover this through a string of failed answers (which is discouraging and wastes session time), I built a decay rule: if more than 24 hours have passed since the last active session, the difficulty level drops by one on the next message.

export const DECAY_THRESHOLD_MS = 24 * 60 * 60 * 1000;

export function applyDecay(
  map: SubjectDifficultyMap,
  subject: Subject,
  lastActiveAt: Date,
  now: Date = new Date()
): SubjectDifficultyMap {
  const elapsed = now.getTime() - lastActiveAt.getTime();
  if (elapsed < DECAY_THRESHOLD_MS) return map;

  const current = map[subject];
  const newLevel = clampLevel(current.level - 1);
  if (newLevel === current.level) return map;

  return { ...map, [subject]: { ...current, level: newLevel } };
}

The decay is capped at one level per gap, regardless of how long the absence was. A student returning after two weeks drops one level, not the full five. The intent is a warmup, not a reset.
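A reduced, single-value version of the rule makes the cap easy to check (a sketch for illustration, not the production signature):

```typescript
const DECAY_THRESHOLD_MS = 24 * 60 * 60 * 1000;

// Single-value version of the decay rule: one level off after any gap
// over 24 hours, however long the gap, never below Level 1.
function decayedLevel(level: number, lastActiveAt: Date, now: Date): number {
  const elapsed = now.getTime() - lastActiveAt.getTime();
  if (elapsed < DECAY_THRESHOLD_MS) return level;
  return Math.max(1, level - 1);
}
```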

• • •

Band-Crossing: When the Dial Actually Moves

A level change within the same band, say from Level 3 to Level 4, does not change the student-facing dial. Athena's behaviour adjusts subtly, but the student does not see anything happen. That is correct. Small adjustments do not need ceremony.

But when the level crosses between Ladder and Socratic, something qualitatively different has happened. The student has moved from needing step-by-step support to independent deep thinking, or vice versa. That deserves acknowledgement.

export function detectBandCrossing(
  oldLevel: number,
  newLevel: number
): "up" | "down" | null {
  const oldState = mapToStudentState(oldLevel);
  const newState = mapToStudentState(newLevel);
  if (oldState === newState) return null;
  return newLevel > oldLevel ? "up" : "down";
}

When a crossing is detected, the server stores it as metadata. On the next request, buildDialChangeBlock injects context into the system prompt:

function buildDialChangeBlock(dialChangedFrom?: number, dialChangedTo?: number): string {
  if (dialChangedFrom === undefined || dialChangedTo === undefined) return "";

  const oldBand = dialChangedFrom >= 3 ? "socratic" : "ladder";
  const newBand = dialChangedTo >= 3 ? "socratic" : "ladder";

  if (oldBand === newBand) return "";

  if (dialChangedTo > dialChangedFrom) {
    return `
IMPORTANT CONTEXT FOR THIS TURN ONLY:
The student's difficulty level just increased from Level ${dialChangedFrom} to Level ${dialChangedTo}.
They've crossed from step-by-step mode to deep-thinking mode.
Naturally acknowledge their progress in your response (do NOT quote the level numbers).
Say something like "You are doing really well! Let me give you something a bit trickier."
but in your own words, woven naturally into your response. Only acknowledge once.`;
  }

  return `
IMPORTANT CONTEXT FOR THIS TURN ONLY:
The student's difficulty level just decreased from Level ${dialChangedFrom} to Level ${dialChangedTo}.
They've moved to needing more support.
Naturally acknowledge this in your response (do NOT quote the level numbers).
Say something like "No worries -- let us break this down a bit more carefully."
but in your own words, woven naturally into your response. Only acknowledge once.`;
}

The two constraints that matter most here: "do NOT quote the level numbers" and "only acknowledge once." Without the first, Athena announces the mechanics of the system to the student, which breaks immersion and feels clinical. Without the second, the acknowledgement can repeat in subsequent responses, which is awkward. A human tutor noticing a student is struggling would adjust naturally and move on. This block is trying to replicate that.

• • •

What It Actually Looks Like

Here is the scenario I most commonly see. My child is working on adding fractions. They have tried to add 1/4 and 1/3 by adding numerators and denominators separately, getting 2/7. Athena asks them to think about what it means for fractions to have different denominators. They try again, differently wrong. Athena tries a different angle. The response comes back: "idk." Then after a pause: "what."

frustrationDetected: true. The level drops. The Negotiation Toast appears.

They accept the deal. Athena explains, in structured terms, that you cannot add fractions with different denominators any more than you can add metres and miles without converting to the same unit first. The concept, not the calculation. It ends with: "Have a go now. What would you need to do to 1/4 and 1/3 before adding them?"

They type "make the bottom numbers the same." Athena asks them to try. They work out 3/12 and 4/12. They add them correctly.

Now the Reflection Lock appears. The Mastery Ring sits at zero. They type "you have to make the bottom the same first." Not enough; the ring stays low. They add: "because you can't add them when they're different, so you find a number that works for both and change the tops to match." The ring fills. Athena confirms they have covered the key points and clears the verification.

They did not just arrive at the answer. They understood why, and they said why, in their own words.

That is the whole point.

• • •

The frustration problem turned out to be solvable, and the solution was not technical in the way I expected. It was about designing an interaction with the right incentives: making help feel earned rather than dispensed, making the debt real enough to honour, and making the acknowledgement of struggle feel human rather than procedural. The engine handles the mechanics. The negotiation handles the moment.

• • •

What's Next

The next post in this series covers the verification system in more detail: the Logic Scramble and Reflection Lock components, how Gemini evaluates free-text explanations against key points, and the escalation paths when a student cannot clear their debt. It is the most interactive part of Athena and the part that most closely mirrors what a good human tutor does after giving help: checking that the help actually landed.

If this was useful, share it with a colleague. The more of us who understand what is possible, the better we will build it.


Alex Gray

Head of Sixth Form & BSME Network Lead for AI in Education. Alex explores how artificial intelligence is reshaping teaching, learning, and the future of work — with honesty, clarity, and a focus on what matters most for educators and students.
