2026-05-14

How Rekalo Uses AI Feedback Without Inflating Your Band Score

AI grading is only useful if it is accurate. Here is how Rekalo's rubric enforcement prevents LLM score inflation and keeps your band scores honest.

The Problem With AI Grading

AI language models are, by default, generous. Ask a large language model to grade an essay and it will often find something encouraging to say about every paragraph. This is useful for motivation but dangerous for band scoring. An inflated band score tells you that you are ready for an exam when you are not.

This is not a hypothetical concern. In practice, without constraints, models tend to concentrate scores in the upper-middle of whatever scale they are given. They are trained on human feedback that rewards helpfulness, and "you got a Band 7" feels more helpful than "you got a Band 5."

Rekalo addresses this through rubric enforcement: a hard gate that sits between the model's initial output and the score you see.

How the Rubric Enforcement Mechanism Works

IELTS Writing and Speaking are graded against published band descriptors. For Writing, the four dimensions are Task Achievement (called Task Response in Writing Task 2), Coherence and Cohesion, Lexical Resource, and Grammatical Range and Accuracy. For Speaking, the dimensions are Fluency and Coherence, Lexical Resource, Grammatical Range and Accuracy, and Pronunciation.

Each dimension has specific criteria at each band level. A Band 7 Task Achievement response, for example, addresses all parts of the task, presents a clear position throughout, and supports main ideas with extended detail. A Band 5 response addresses the task only partially and may present limited ideas without supporting detail.

Rekalo's grading process works in two stages:

1. Initial grading. The AI model scores the response against the rubric for each dimension and produces a band value per dimension.
2. Rubric validation. A deterministic validator checks each emitted band value against the expected schema and known band range. If the model produces a value outside the valid range, or in a format that cannot be reliably parsed as a band score, the grade is rejected and the model is asked to try again with the validation error as context.

This loop runs a bounded number of times. If valid scores are not produced within the retry limit, the grading attempt raises an explicit error rather than passing through a malformed score. Nothing reaches your results page without passing the validation gate.
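
The validate-and-retry loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production implementation: the names `validate_bands`, `grade_with_retries`, `GradingError`, and the `grade_fn` interface are hypothetical, the retry limit of 3 is arbitrary, and the valid range is assumed to be 0 to 9 in half-band steps.

```python
from typing import Callable, Dict, List, Optional

# Writing dimensions as named in this article.
WRITING_DIMENSIONS = (
    "Task Achievement",
    "Coherence and Cohesion",
    "Lexical Resource",
    "Grammatical Range and Accuracy",
)

# Assumption: valid bands run from 0 to 9 in half-band steps.
MIN_BAND, MAX_BAND = 0.0, 9.0


class GradingError(Exception):
    """Raised when no valid grade is produced within the retry limit."""


def validate_bands(raw: object) -> List[str]:
    """Return validation errors; an empty list means the grade passes."""
    if not isinstance(raw, dict):
        return ["output is not a mapping of dimension to band"]
    errors: List[str] = []
    for dim in WRITING_DIMENSIONS:
        if dim not in raw:
            errors.append(f"{dim}: missing")
            continue
        value = raw[dim]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{dim}: {value!r} is not a number")
        elif not MIN_BAND <= value <= MAX_BAND:
            errors.append(f"{dim}: {value} is outside {MIN_BAND}-{MAX_BAND}")
        elif value * 2 != int(value * 2):
            errors.append(f"{dim}: {value} is not a half-band step")
    return errors


# Hypothetical model interface: takes the response text plus the previous
# validation error (or None) and returns a dict of dimension -> band.
GradeFn = Callable[[str, Optional[str]], object]


def grade_with_retries(grade_fn: GradeFn, response: str,
                       max_attempts: int = 3) -> Dict[str, float]:
    """Call the model until its output validates or attempts run out."""
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        raw = grade_fn(response, last_error)
        errors = validate_bands(raw)
        if not errors:
            return {dim: float(raw[dim]) for dim in WRITING_DIMENSIONS}
        # Feed the validation error back as context for the next attempt.
        last_error = "; ".join(errors)
    # Fail loudly instead of passing a malformed score through.
    raise GradingError(
        f"no valid grade after {max_attempts} attempts: {last_error}"
    )
```

Note that a validator of this kind only checks shape and range. It is deterministic, so the same model output always passes or fails in the same way, and a failure surfaces as an error instead of a silently clamped score.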

What This Means for Your Scores

The rubric enforcement mechanism has two practical effects:

Scores reflect the rubric, not the model's optimism. Each dimension is scored against its own descriptors, so a well-structured response does not earn a Band 8 if its vocabulary range does not support that level. The validator then rejects any band value that falls outside the expected schema or the valid range.

Scores are consistent across attempts. Because grading is gated by the same rubric schema on every run, the same response will receive comparable scores across grading sessions. There is still inherent variability in AI judgment, but the enforcement layer reduces the range of that variability.

What AI Grading Does Not Replace

Rubric enforcement does not make AI grading equivalent to an official IELTS examiner. Official IELTS examiners are trained and standardised through a separate process. Rekalo's AI grading is a practice tool: it gives you consistent feedback against the published criteria to guide your preparation. Your score on the real exam depends on official exam conditions and the specific materials on test day.

The goal is an honest signal for your practice loop: a score that tells you what to work on, not one that tells you what you want to hear.

AI Feedback Beyond Band Scores

Alongside band scores, the grading process produces written feedback for Writing responses. This feedback references the rubric dimension that needs the most development and offers a concrete example of what improvement at the next band level would look like.

The feedback is generated under the same constraints as the band scores: it is based on the rubric, not on generic encouragement. If your Coherence and Cohesion score is lower than your Grammatical Range score, the feedback reflects that specific imbalance.
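
How the dimension that "needs the most development" is chosen is not spelled out here. One simple reading, offered only as an assumption, is that it is the dimension with the lowest validated band. The helper name `weakest_dimensions` and the sample scores below are hypothetical.

```python
from typing import Dict, List


def weakest_dimensions(scores: Dict[str, float]) -> List[str]:
    """Return the dimension(s) with the lowest band, in input order."""
    if not scores:
        raise ValueError("scores must not be empty")
    lowest = min(scores.values())
    # Keep every tied dimension so the caller decides how to break ties.
    return [dim for dim, band in scores.items() if band == lowest]


# Illustrative scores only, not real grading output.
scores = {
    "Task Achievement": 6.5,
    "Coherence and Cohesion": 5.5,
    "Lexical Resource": 6.0,
    "Grammatical Range and Accuracy": 6.5,
}
focus = weakest_dimensions(scores)
```

With these sample scores, `focus` contains only Coherence and Cohesion, which matches the imbalance described above. Returning a list keeps ties visible, so two equally weak dimensions are both reported.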

Speaking feedback is currently more limited — the voice transcription and scoring process is newer, and the feedback surface is being developed as pronunciation analysis data matures.