Tools for AI tasking & evaluation work
Two tools for people doing (or about to start) paid AI data work: a tracker that shows what you actually earn per hour once rejections and unpaid rework are counted, and a practice sandbox that drills the review skills the work is graded on.
Built for work on DataAnnotation · Outlier · Alignerr · Mercor · Scale · Prolific and the rest
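A multi-platform task register that tracks the two numbers that decide whether the work is worth it: your real hourly rate once rejections and unpaid rework are counted, and how much submitted work is still waiting to be paid.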
Your income data never leaves your browser. No account, no server, nothing to trust us with.
A practice sandbox for AI evaluation work. Read a mock transcript, call the behavioural failure, write your rationale, then compare against a reference answer with a one-line lesson.
Tasking platforms advertise an hourly rate but pay per accepted task, often weeks later, and rejected work usually pays nothing. Time spent reading rubrics, redoing flagged tasks and waiting for projects doesn't show up anywhere. Divide what actually landed by all the hours you actually spent and the picture changes: for some people the work is still clearly worth it; for others it quietly isn't.
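As a rough illustration of that division, here is a minimal sketch in Python. The `Task` record, its field names and the `overhead_minutes` parameter are hypothetical, chosen for the example and not the tracker's actual schema. Only paid work counts as "landed", while every minute spent (including rejected tasks and unlogged overhead) goes into the denominator.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical record; field names are illustrative only.
    platform: str
    minutes: float  # time on the task itself, including any rework
    pay: float      # what the task pays if accepted
    status: str     # "paid", "pending" or "rejected"


def effective_hourly_rate(tasks, overhead_minutes=0.0):
    """Pay that actually landed divided by every hour spent.

    overhead_minutes stands in for time no task records, such as
    reading rubrics or waiting for projects (an assumption here).
    """
    earned = sum(t.pay for t in tasks if t.status == "paid")
    hours = (sum(t.minutes for t in tasks) + overhead_minutes) / 60
    return earned / hours if hours else 0.0


# Made-up numbers: three one-hour tasks advertised at 20/hour,
# one of them rejected, plus an hour of unpaid rubric reading.
tasks = [
    Task("example", 60, 20.0, "paid"),
    Task("example", 60, 20.0, "paid"),
    Task("example", 60, 20.0, "rejected"),
]
print(effective_hourly_rate(tasks, overhead_minutes=60))  # 10.0
```

In this made-up case the advertised 20 per hour becomes 10 per hour: 40 landed over 4 hours actually spent.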
The ledger keeps that honest number in front of you per task, per platform and per week, and tracks how much submitted-but-unpaid work you're carrying at any moment — so a platform going quiet before payday never catches you holding two weeks of unpaid effort.
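The "submitted-but-unpaid" figure can be sketched the same way: sum the pay of every task that has been submitted but not yet paid, grouped by platform. This is a minimal Python sketch under assumed names; the `Task` fields, the `"submitted"` status and the `unpaid_exposure` helper are hypothetical, not the ledger's real data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical record; field names are illustrative only.
    platform: str
    pay: float
    status: str  # "submitted" (awaiting payment), "paid" or "rejected"


def unpaid_exposure(tasks):
    """Total pay still owed for submitted work, keyed by platform."""
    exposure = defaultdict(float)
    for t in tasks:
        if t.status == "submitted":
            exposure[t.platform] += t.pay
    return dict(exposure)


# Made-up example: paid and rejected tasks add nothing to exposure.
tasks = [
    Task("A", 30.0, "submitted"),
    Task("A", 20.0, "paid"),
    Task("B", 15.0, "submitted"),
    Task("B", 10.0, "submitted"),
    Task("B", 5.0, "rejected"),
]
print(unpaid_exposure(tasks))  # {'A': 30.0, 'B': 25.0}
```

Keeping the total per platform, rather than as one overall sum, is what shows which single platform you would be exposed to if it went quiet before payday.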