TASKER TOOLKIT

tools for AI tasking & evaluation work

AI tasking pays per accepted task, on a lag.
Know your real numbers. Train the judgement.

Two tools for people doing (or about to start) paid AI data work: a tracker that shows what you actually earn per hour once rejections and unpaid rework are counted, and a practice sandbox that drills the review skills the work is graded on.

Built for work on DataAnnotation · Outlier · Alignerr · Mercor · Scale · Prolific and the rest

Open the free ledger Try the trainer

Free · runs in your browser · no signup

Tasker Ledger

A multi-platform task register that tracks the two numbers that decide whether the work is worth it:

Effective rate — earnings ÷ all hours worked, rejections and rubric-reading included. The honest number, not the advertised one.
Unpaid exposure— how much completed-but-unpaid work you're carrying, with a red line you set.
Evidence screenshotsper task — capture the terms before you start, so a disputed payment isn't your word against theirs.
One-click timer, acceptance-rate tracking, weekly view, JSON export.

Your income data never leaves your browser. No account, no server, nothing to trust us with.

Start tracking — free

Free scenarios · full deck £19, one-off

Eval Trainer

A practice sandbox for AI evaluation work. Read a mock transcript, call the behavioural failure, write your rationale, then compare against a reference answer with a one-line lesson.

Identify mode— one model's transcript: name the axis and direction of the failure, or call it fine.
Compare mode — two models side by side: the authentic shape of the paid work, including legitimate ties.
41 scenarios across 8 behavioural axes, intro → boundary-case difficulty, with streaks and per-axis accuracy. The deck keeps growing; purchases include all additions.
Platform-agnostic: the skills transfer; no proprietary rubric is reproduced.

Try the free scenarios

Why effective rate

The headline rate is not what you earn.

Tasking platforms advertise an hourly rate, but pay per acceptedtask — often weeks later, and rejected work usually pays nothing. Time reading rubrics, redoing flagged tasks and waiting for projects doesn't show up anywhere. Divide what actually landed by allthe hours you actually spent and the picture changes; for some people the work is still clearly worth it, for others it quietly isn't.

The ledger keeps that honest number in front of you per task, per platform and per week, and tracks how much submitted-but-unpaid work you're carrying at any moment — so a platform going quiet before payday never catches you holding two weeks of unpaid effort.

What the trainer drills

Reviewers grade your judgement, not your speed.

Honesty

Did its report match reality?

Agentic safety

Was it careful with dangerous actions?

Scoping

Did it do the right amount of work?

Deference

Did it do what it was told?

Interaction

Did it check in at the right moments?

Confidence

Did its certainty match what it verified?

Clarity

Was the write-up clear and actionable?

No issue / N/A

Knowing when behaviour was fine.

Guides

Getting started with AI tasking work: a realistic guide

What to expect from applications and assessments, the scam red flags, and how to run your first month like it matters.

How to work out your real hourly rate on AI tasking platforms

The honest formula, a worked example, and the three traps that hide the gap from the advertised rate.

Task rejected? What to do, and how to protect yourself next time

When a dispute is worth it, and the 30-second evidence habit that stops it being your word against theirs.

What AI evaluation work actually looks like

Claims vs the trace, side-by-side comparisons, and why the written rationale is what actually gets graded.

FAQ

Is this affiliated with DataAnnotation, Outlier, Alignerr or anyone else?

No. TaskerToolkit is independent and platform-agnostic. The trainer teaches general behavioural-evaluation skills; it does not reproduce any platform's proprietary rubric, test answers or assessment content.

Will this get me hired, or tell me how much I'll earn?

No promises on either — anyone who guarantees that is selling something dishonest. The ledger tells you what you actually earned; the trainer builds the judgement skills this kind of work is graded on. What platforms pay and accept is up to them.

Where is my ledger data stored?

In your browser's local storage on your machine, full stop. There is no account and no server-side database. Use Export for backups — and note that clearing site data wipes the ledger.

What exactly does the £19 buy?

Lifetime access to the full Eval Trainer deck — every current scenario and future deck additions — unlocked in your browser via Stripe. The first scenarios of each mode stay free forever, so you can judge the quality before paying.

AI tasking pays per accepted task, on a lag.Know your real numbers. Train the judgement.