Getting started with AI tasking work: a realistic guide

Paid AI data work is real, flexible and sometimes decent money. It is also irregular, rejection-prone and surrounded by exaggerated claims in both directions. Here is what to actually expect, and how to start in a way you won't regret.

What the work is

Platforms like DataAnnotation, Outlier, Alignerr, Mercor and Scale pay contractors to produce and judge training data for AI models: writing and rating model answers, comparing two model attempts side by side, labelling data, checking factual accuracy, and evaluating how models behave on coding or reasoning tasks. Higher-paying projects skew toward specialist knowledge — coding, law, medicine, maths, languages — and toward evaluation skills: reading a transcript and judging what the model did against what it claimed.

What to realistically expect

An application, then an assessment. Most platforms screen with an unpaid (or low-paid) test of writing quality and judgement. Many applicants wait weeks or never hear back — apply to several platforms rather than waiting on one.
Irregular work. Projects appear and vanish. A great week proves nothing about next week, in either direction.
Rejections without much explanation. These are part of the business model. Track them; they are the difference between the advertised rate and what you actually earn.
Payment on a lag. Per accepted task, often weeks later. Never count promised money as earned.
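
To make the gap between advertised and realised pay concrete, here is a minimal Python sketch. The function name and every figure in it are hypothetical, chosen only for illustration, and it assumes that rejected work pays nothing and that pay is proportional to accepted task time. Real platforms may handle rejections differently.

```python
def realised_hourly_rate(advertised_rate, task_hours, accepted_fraction,
                         unpaid_hours=0.0):
    """Hourly rate actually earned once rejections and unpaid time count.

    Assumption (hypothetical): rejected tasks pay nothing, and pay scales
    with the share of task time that is accepted.
    """
    total_hours = task_hours + unpaid_hours
    if total_hours <= 0:
        raise ValueError("need at least some logged hours")
    earnings = advertised_rate * task_hours * accepted_fraction
    return earnings / total_hours


# Illustrative numbers only: 20 per hour advertised, 10 task hours,
# 80% of the work accepted, 2 unpaid hours of rubric reading and rework.
rate = realised_hourly_rate(20.0, 10.0, 0.8, unpaid_hours=2.0)
print(f"{rate:.2f}")  # prints 13.33
```

In this made-up case a third of the advertised rate disappears before any payment lag is considered, which is why the rejections are worth tracking.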

Scam red flags

The legitimate version of this work never charges you. Walk away from anything that asks for: a joining or "training" fee, payment to unlock higher-paying tasks, your card details, or crypto-only payment. Be sceptical of anyone selling guaranteed acceptance or income figures — acceptance is the platform's call and income varies enormously. Apply only through a platform's own site, and check recent payment-proof discussion in its community before you commit serious hours.

Run your first month like it matters

Most people drift into this work, half-track it in their head, and quit (or overcommit) on vibes. The better way costs a few minutes a week:

Log every hour from day one — including assessments, rubric reading and rework. Your effective rate (earnings ÷ all hours) is the only number that tells you whether to continue.
Screenshot task terms before you start. Instructions change; your evidence shouldn't depend on memory.
Cap your unpaid exposure. Decide how much submitted-but-unpaid work you'll carry before pausing. A platform going quiet should cost you days, not weeks.
Review at week four. Compare each platform's effective rate with your floor (the lowest hourly rate you are willing to accept). Keep what clears it, drop what doesn't, and ignore anyone else's screenshots: your numbers are the ones that pay your bills.
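
The habits above can be sketched as a small ledger in Python. This is one possible layout under stated assumptions, not how any particular tool works: the `Entry` fields, the function names, the platform names and all the numbers are hypothetical. Only money actually received counts as earnings; submitted-but-unpaid work is tracked separately as exposure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entry:
    platform: str         # hypothetical platform label
    hours: float          # ALL time: assessments, rubric reading, rework
    paid: float = 0.0     # money actually received
    pending: float = 0.0  # submitted but not yet paid


def effective_rates(entries):
    """Per-platform earnings divided by all logged hours."""
    hours, paid = defaultdict(float), defaultdict(float)
    for e in entries:
        hours[e.platform] += e.hours
        paid[e.platform] += e.paid
    return {p: paid[p] / hours[p] for p in hours if hours[p] > 0}


def unpaid_exposure(entries):
    """Per-platform total of submitted-but-unpaid work."""
    pending = defaultdict(float)
    for e in entries:
        pending[e.platform] += e.pending
    return dict(pending)


def over_cap(entries, cap):
    """Platforms whose unpaid exposure exceeds the cap you set."""
    return sorted(p for p, amt in unpaid_exposure(entries).items() if amt > cap)


def week_four_review(entries, floor):
    """Mark each platform 'keep' if its effective rate clears the floor."""
    return {p: "keep" if rate >= floor else "drop"
            for p, rate in effective_rates(entries).items()}


# Illustrative log only; none of these figures come from a real platform.
entries = [
    Entry("PlatformA", hours=3.0),               # unpaid assessment
    Entry("PlatformA", hours=10.0, paid=180.0),
    Entry("PlatformB", hours=8.0, paid=60.0, pending=120.0),
]

print(week_four_review(entries, floor=12.0))
print(over_cap(entries, cap=100.0))
```

With these invented figures, PlatformA's unpaid assessment hours pull its effective rate down to about 13.85 per hour, which still clears a floor of 12, while PlatformB falls below the floor and also exceeds the exposure cap. Keeping the two checks separate mirrors the text: the cap tells you when to pause, the review tells you what to keep.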

If you want the better-paid evaluation work

The skill that assessments actually test is judgement: reading a model's claims against the evidence in its transcript and writing a rationale that cites it. That improves with deliberate practice and feedback, not with reading about it, and practising before a paid assessment is much cheaper than learning on rejected tasks.

Track this automatically. The free Tasker Ledger keeps your effective rate, unpaid exposure and per-task evidence in one place — and your data never leaves your browser.
