Every score, explained.
Most AI scoring is a black box. Ours isn't. Scroll through the exact pipeline every session passes through, from voice answer to final score. No vibes, just math.
Your voice answer enters the engine.
The moment you stop speaking, audio + transcript are captured and queued for analysis.
“So when we were building the inventory sync, the main bug was that our cache wasn't invalidating when two warehouses pushed updates at the same time. We had to add a versioning layer…”
Transcript + voice signals split apart.
The audio gives us pace, pauses, filler words. The transcript gives us substance. Both flow forward in parallel.
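To make the voice side concrete, here is a minimal sketch that derives pace, long pauses, and filler-word count from word-level timings. It assumes the speech-to-text step returns each word with start and end times. The `Word` type, the one-second pause threshold, and the filler list are hypothetical choices for illustration, not the engine's actual settings.

```python
from dataclasses import dataclass

# Hypothetical filler list; the real engine's list is not published.
FILLERS = {"um", "uh", "erm", "hmm"}


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the answer
    end: float


def voice_signals(words: list[Word], pause_threshold: float = 1.0) -> dict:
    """Derive pace (words per minute), long pauses, and filler count."""
    if not words:
        return {"wpm": 0.0, "long_pauses": 0, "fillers": 0}
    duration_min = (words[-1].end - words[0].start) / 60
    wpm = len(words) / duration_min if duration_min > 0 else 0.0
    # A "long pause" is a gap between consecutive words above the threshold.
    long_pauses = sum(
        1 for prev, cur in zip(words, words[1:])
        if cur.start - prev.end > pause_threshold
    )
    fillers = sum(1 for w in words if w.text.lower().strip(",.") in FILLERS)
    return {"wpm": round(wpm, 1), "long_pauses": long_pauses, "fillers": fillers}
```

The transcript text itself is left untouched here; only timing and filler cues are pulled out, so substance scoring can run on the text in parallel.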
Every turn gets a status tag.
An LLM tags how strong each turn was. The tag drives the math downstream.
- CONFIDENT: Clear, specific, with example
- NEEDS_EVIDENCE: On track, lacks specifics
- HAND_WAVY: Vague, abstract, no anchor
- DONT_KNOW: Stalled or admitted blank
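The four tags can be modeled as a small enum. This is a minimal sketch, assuming the tag names are the ones listed under the Depth signal (CONFIDENT, NEEDS_EVIDENCE, HAND_WAVY, DONT_KNOW), in the same order as the descriptions. The point values in `STATUS_POINTS` and the helpers `parse_status` and `depth` are hypothetical, not the engine's real numbers; they only show how a tag can drive the math downstream.

```python
from enum import Enum


class TurnStatus(Enum):
    CONFIDENT = "CONFIDENT"            # clear, specific, with example
    NEEDS_EVIDENCE = "NEEDS_EVIDENCE"  # on track, lacks specifics
    HAND_WAVY = "HAND_WAVY"            # vague, abstract, no anchor
    DONT_KNOW = "DONT_KNOW"            # stalled or admitted blank


# Hypothetical point values per tag, on a 0 to 1 scale.
STATUS_POINTS = {
    TurnStatus.CONFIDENT: 1.0,
    TurnStatus.NEEDS_EVIDENCE: 0.6,
    TurnStatus.HAND_WAVY: 0.3,
    TurnStatus.DONT_KNOW: 0.0,
}


def parse_status(label: str) -> TurnStatus:
    """Normalize a raw LLM label such as ' hand_wavy ' into a TurnStatus."""
    return TurnStatus(label.strip().upper())


def depth(statuses: list[TurnStatus]) -> float:
    """Mean points across all turns of one question."""
    if not statuses:
        return 0.0
    return sum(STATUS_POINTS[s] for s in statuses) / len(statuses)
```

Parsing through the enum means an unexpected label from the model raises a `ValueError` instead of silently scoring as zero.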
14 signals get measured for every question.
Grouped into three buckets. Each signal is a small, transparent calculation — not a vibe.
- Relevance: OFF_TOPIC 10% / PARTIAL 60% / ANSWERED 100%
- Completeness: Anchors hit vs total anchors expected
- Specificity: Per-turn specificity score, averaged
- Depth: CONFIDENT/NEEDS_EVIDENCE/HAND_WAVY/DONT_KNOW
- Clarity: Penalizes vague & hand-wavy turns
- Conciseness: Bell curve · optimal 30–100 words/turn
- Coherence: Per-turn topic-drift penalty
- Confidence: AI confidence score per turn, averaged
- Hesitation: True if turn-1 was DONT_KNOW
- Follow-up responsiveness: Specificity gain turn 1 → final turn
- Recovery after hint: Status after RESCUE_HINT
- Self-correction: Specificity jump ≥ threshold
- Composure: Penalizes excess time, retries, hesitation
- Example usage: True if any turn = CONFIDENT
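Several of these signals are simple enough to sketch directly. The relevance percentages come from the list above. Everything else is an assumption for illustration: the function names, passing tags as plain strings, and especially the conciseness curve, modeled here as flat inside the 30 to 100 word band with a Gaussian falloff of hypothetical width outside it.

```python
import math

# Relevance percentages as listed above.
RELEVANCE = {"OFF_TOPIC": 0.10, "PARTIAL": 0.60, "ANSWERED": 1.00}


def relevance(label: str) -> float:
    return RELEVANCE[label]


def completeness(anchors_hit: set[str], anchors_expected: set[str]) -> float:
    """Share of expected anchors the answer actually touched."""
    if not anchors_expected:
        return 1.0
    return len(anchors_hit & anchors_expected) / len(anchors_expected)


def conciseness(words_per_turn: float, low: float = 30, high: float = 100,
                width: float = 40) -> float:
    """1.0 inside the optimal band; Gaussian falloff outside (shape assumed)."""
    if low <= words_per_turn <= high:
        return 1.0
    edge = low if words_per_turn < low else high
    return math.exp(-((words_per_turn - edge) ** 2) / (2 * width ** 2))


def hesitation(statuses: list[str]) -> bool:
    """True if the first turn was DONT_KNOW."""
    return bool(statuses) and statuses[0] == "DONT_KNOW"


def example_usage(statuses: list[str]) -> bool:
    """True if any turn was tagged CONFIDENT."""
    return "CONFIDENT" in statuses


def follow_up_responsiveness(specificity: list[float]) -> float:
    """Specificity gain from the first turn to the final turn."""
    if len(specificity) < 2:
        return 0.0
    return specificity[-1] - specificity[0]
```

Each function is a few lines with no hidden state, which is what makes a signal auditable: the same inputs always yield the same number.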
The 14 signals collapse into one question score.
A single number for that question — the building block of everything that follows.
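The page doesn't say how the 14 signals are weighted, so this is only a hypothetical sketch: a weighted mean of signals clamped to a 0 to 1 scale, with booleans counted as 0 or 1, penalty-style signals such as hesitation flipped so that higher is always better, and equal weights unless a weight table is supplied. `INVERTED` and `question_score` are illustrative names.

```python
from typing import Optional

# Signals where a higher raw value is worse; flipped before averaging (assumption).
INVERTED = {"hesitation"}


def question_score(signals: dict[str, float],
                   weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean of 0-1 signals, scaled to 0-100, rounded to one decimal."""
    if not signals:
        return 0.0
    # Default to equal weights (hypothetical).
    weights = weights or {name: 1.0 for name in signals}
    total = 0.0
    weight_sum = 0.0
    for name, value in signals.items():
        v = float(value)           # True/False become 1.0/0.0
        v = min(max(v, 0.0), 1.0)  # keep every signal on a 0-1 scale
        if name in INVERTED:
            v = 1.0 - v
        w = weights.get(name, 0.0)  # signals missing from the table get weight 0
        total += w * v
        weight_sum += w
    return round(100 * total / weight_sum, 1) if weight_sum else 0.0
```

For example, relevance 1.0, completeness 0.5, hesitation `True`, and example usage `True` average to 62.5 with equal weights.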
Question scores combine into section scores.
Questions group by interview phase — Technical, Behavioral, Wrap-up. Each section gets its own score.
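Grouping by phase is straightforward to sketch. This assumes each question score arrives tagged with its phase name and that a section score is the plain mean of its questions; the page only says questions are grouped, so the averaging rule and the `section_scores` name are assumptions.

```python
from collections import defaultdict
from statistics import mean


def section_scores(questions: list[tuple[str, float]]) -> dict[str, float]:
    """Average question scores within each interview phase (plain mean assumed)."""
    by_phase: dict[str, list[float]] = defaultdict(list)
    for phase, score in questions:
        by_phase[phase].append(score)
    return {phase: round(mean(scores), 1) for phase, scores in by_phase.items()}
```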
Mode-specific weights are applied.
A Job Interview weights technical depth differently than an HR round. The same section score lands at a different final number depending on what you're prepping for.
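A minimal sketch of mode weighting, assuming each mode is a table of per-section weights. The mode keys and every number in `MODE_WEIGHTS` are hypothetical placeholders, not the product's real weights; weights are renormalized over the sections actually present so a session without a Wrap-up phase isn't penalized for it.

```python
# Hypothetical weight tables; the real per-mode weights are not published.
MODE_WEIGHTS = {
    "job_interview": {"Technical": 0.6, "Behavioral": 0.3, "Wrap-up": 0.1},
    "hr_round": {"Technical": 0.2, "Behavioral": 0.6, "Wrap-up": 0.2},
}


def weighted_total(sections: dict[str, float], mode: str) -> float:
    """Combine section scores using the chosen mode's weights."""
    weights = MODE_WEIGHTS[mode]
    present = {name: w for name, w in weights.items() if name in sections}
    weight_sum = sum(present.values())
    if weight_sum == 0:
        return 0.0
    return round(sum(sections[n] * w for n, w in present.items()) / weight_sum, 1)
```

With these placeholder weights, the same section scores (Technical 90, Behavioral 60, Wrap-up 70) come out at 79.0 in `job_interview` mode and 68.0 in `hr_round` mode.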
Three evidence gates run before the final score.
If there isn't enough signal to score you fairly, we cap the score and tell you why. No phantom 95% scores from 30-second sessions.
Not enough turns to score anyone fairly. We tell you to do a longer session.
Majority of the session had no useful content. A high score would be misleading.
Acing one area doesn't prove broad readiness. We weight it accordingly.
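The three gates can be sketched as caps that each attach a human-readable reason. All thresholds and cap values below are hypothetical, and the third gate is simplified to a cap even though the page describes it as re-weighting; the point is only that each gate lowers the ceiling and says why.

```python
from dataclasses import dataclass, field

# All thresholds and caps are hypothetical placeholders.
MIN_TURNS = 6
MAX_EMPTY_SHARE = 0.5
MIN_SECTIONS = 2
CAP_FEW_TURNS = 50.0
CAP_MOSTLY_EMPTY = 40.0
CAP_NARROW = 70.0


@dataclass
class GateResult:
    score: float
    reasons: list[str] = field(default_factory=list)


def apply_gates(score: float, turns: int, empty_turns: int,
                sections_covered: int) -> GateResult:
    result = GateResult(score)
    # Gate 1: too few turns to judge fairly.
    if turns < MIN_TURNS:
        result.score = min(result.score, CAP_FEW_TURNS)
        result.reasons.append("Not enough turns; try a longer session.")
    # Gate 2: most turns carried no useful content.
    if turns and empty_turns / turns > MAX_EMPTY_SHARE:
        result.score = min(result.score, CAP_MOSTLY_EMPTY)
        result.reasons.append("Most of the session had no useful content.")
    # Gate 3: evidence from too few areas (simplified to a cap here).
    if sections_covered < MIN_SECTIONS:
        result.score = min(result.score, CAP_NARROW)
        result.reasons.append("Strong in one area only; broad readiness unproven.")
    return result
```

Applied to a 95 from a three-turn session with two empty turns and one section covered, all three gates fire and the score is capped at 40.0 with three stated reasons.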
One score. With trend and difficulty context.
Everything above rolls into a single number — but the number alone isn't the story. Trend and difficulty tell you whether you're actually improving.
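One hypothetical way to express trend: the latest score minus the mean of up to three earlier sessions, computed only over sessions at the same difficulty so that moving to harder questions doesn't read as a regression. The window size, the difficulty labels, and both function names are assumptions; the page doesn't describe its trend formula.

```python
from statistics import mean
from typing import Optional


def trend(history: list[float], window: int = 3) -> Optional[float]:
    """Latest score minus the mean of up to `window` previous scores.
    Returns None when there is no earlier session to compare against."""
    if len(history) < 2:
        return None
    previous = history[-1 - window:-1]
    return round(history[-1] - mean(previous), 1)


def trend_by_difficulty(sessions: list[tuple[str, float]], difficulty: str,
                        window: int = 3) -> Optional[float]:
    """Trend over sessions at one difficulty level only."""
    same = [score for level, score in sessions if level == difficulty]
    return trend(same, window)
```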
Try a session.
See your number — and the math behind it.
Every report you get points back to the signals above. No surprises.