Selected work

Labels that earned their place in production.

A few representative engagements across our domains. Names and figures are illustrative placeholders — swap in your real client stories, logos and numbers when you're ready.

Drone-based aircraft surface inspection
Computer vision · Aerospace

An MRO provider needed pixel-accurate defect maps of fuselage panels captured by inspection drones — cracks, corrosion, dents and paint delamination classified by severity, at a scale no single engineer could sustain manually.

Challenge

Drone imagery of curved, reflective fuselage surfaces produces heavy glare and perspective distortion. Hairline cracks can be just 2–3 pixels wide, and corrosion stages overlap visually.

Approach

Annotators trained on aerospace maintenance manuals labelled defects with sub-pixel polygon precision, using multi-zoom workflows and cross-referencing severity charts provided by the client.

Outcome

A production-ready dataset the client used to train an automated pre-inspection model, reducing manual walkaround time significantly.

30k+ images labelled · 5 damage classes · 22 sub-classes

Brain MRI segmentation for tumour detection
MedTech · Radiology

A medical AI startup needed precise volumetric segmentation of brain structures and pathologies across thousands of MRI slices — to train a model that assists radiologists in early tumour detection.

Challenge

Brain anatomy varies significantly between patients. Tumour boundaries are often diffuse, and distinguishing oedema from healthy tissue requires genuine radiological understanding.

Approach

Annotators with medical imaging backgrounds segmented gliomas, ventricles and white matter regions using polygon tools in CVAT, following a clinical rubric validated by board-certified radiologists.

Outcome

A segmentation dataset that enabled the client to reach diagnostic-grade model performance on their internal validation set.

2k slices annotated · 4 anatomical structures · 3 tumour grades

Dental pathology detection on panoramic X-rays
MedTech · Dentistry

A dental-AI company needed annotated panoramic radiographs (OPGs) to train a model that flags caries, periapical lesions, impacted teeth and existing restorations — helping dentists catch findings they might miss during a busy day.

Challenge

Dental X-rays are noisy, overlapping structures make boundaries ambiguous, and pathology classification requires clinical dental knowledge — not just visual pattern recognition.

Approach

Annotators with dental backgrounds labelled each finding with polygons and severity attributes, following a taxonomy co-developed with the client's clinical advisory board.

Outcome

A validated dataset across 5 pathology classes that the client used to ship their first FDA-cleared screening feature.

5k X-rays annotated · 5 pathology classes · 19 sub-classes

Cell event detection in fluorescence microscopy
BioTech · Cell biology

A biotech research lab needed instance segmentation of individual cells plus classification of mitotic events, apoptosis, splits and mergers — across thousands of high-resolution fluorescence microscopy frames.

Challenge

Cells overlap, fluorescence intensity varies between frames, and rare events like splits or mergers are easy to miss. Annotators need to understand cell biology to distinguish artefacts from real events.

Approach

A team with biology backgrounds annotated cell boundaries and classified events frame-by-frame, using a multi-pass workflow: first boundaries, then event classification, then peer review.

Outcome

A dataset covering 80k+ cell instances with event labels that enabled the client to publish a benchmark-beating detection model.

45k+ frames annotated · 80k+ cell instances · 5 event classes

Crop ripeness and weed detection from drone imagery
AgriTech · Precision farming

An agritech startup needed labelled drone imagery to train a model that classifies fruit ripeness stages, detects weeds and identifies early signs of disease — enabling autonomous spraying and harvesting decisions.

Challenge

Outdoor lighting changes constantly, fruits at different ripeness stages look similar, and weed species vary by region. The dataset needed to cover multiple crop types across seasons.

Approach

Annotators labelled ripeness stages (unripe, turning, ripe, overripe), weed species and disease indicators with polygon annotations, covering tomatoes, peppers and leafy greens across 4 growing seasons.

Outcome

A multi-season dataset that the client used to reduce herbicide usage by 40% through targeted spraying.

23k images labelled · 4 crop types · 12 label classes

Building footprints from multi-sensor satellite tiles
Geospatial · Earth observation

An earth-observation company needed building footprint polygons extracted from satellite imagery across multiple sensor sources and resolutions — covering urban, suburban and rural landscapes.

Challenge

Sensor differences (optical, SAR, multispectral) produce vastly different visual signatures. Shadows, cloud cover and varying GSD require annotators who understand remote sensing fundamentals.

Approach

Annotators trained on geospatial conventions digitised building footprints as precise polygons, handling multi-source imagery and applying land-cover classification in parallel.

Outcome

A production dataset powering the client's building-detection pipeline across 8 land-cover classes.

90k+ images labelled · 3 sensor sources · 8 land-cover classes

Multi-hour conversational audio recording
Audio · Conversational AI

A voice-AI company needed thousands of hours of recorded conversations on specific topics — insurance claims, customer support, medical consultations — in multiple languages to train their speech recognition and NLU models.

Challenge

Conversations must sound natural, cover edge cases (interruptions, code-switching, background noise) and follow strict topic guidelines. Recruiting native speakers across languages and domains is complex.

Approach

Our team recruited and managed native speakers of German, English and Spanish, who recorded scripted and semi-scripted conversations following detailed scenario guidelines, with QA on audio quality and topic adherence.

Outcome

Over 2,400 hours of domain-specific conversational audio delivered on schedule, enabling the client to launch their voice assistant in 3 new markets.

2,400+ hours recorded · 3 languages · 8 topic domains

Multilingual audio transcription and validation
Audio · Multilingual NLP

A global AI lab needed precise transcriptions of conversational audio in Japanese, Korean and Tagalog — including speaker diarisation, accent tagging and register classification — to improve their multilingual speech model.

Challenge

Each language has unique challenges: Japanese honorifics and Kansai dialect, Korean speech levels, Tagalog code-switching with English. Generic transcription services miss these nuances entirely.

Approach

Native-speaking transcribers annotated audio segments with timestamps, speaker labels, confidence scores, accent tags and register markers — following language-specific guidelines developed with the client's linguistics team.

Outcome

A transcription dataset across 3 languages that reduced the client's word error rate by 18% on dialect-heavy test sets.

400 hours transcribed · 3 languages

Expert evaluation & instruction data for a medical LLM
LLM & language · Medical AI

An AI lab needed high-trust human data for a clinical LLM answering questions across multiple medical specialties — instruction examples and graded responses that held up under expert scrutiny, not generic crowd labels.

Challenge

On high-stakes prompts, a plausible-but-wrong answer is worse than no answer. Grading it correctly takes someone who actually understands the subject — not a fast, generic pass.

Approach

Reviewers with backgrounds across medicine, biology, engineering, finance and software development wrote instruction data and graded responses against a clinical rubric — covering 10+ languages in-house, with adjudication on disagreements.

Outcome

Instruction and preference sets the lab could trust on sensitive prompts, with consistent reviewer agreement and a clear audit trail behind every label.

20k+ responses graded · 3 expert domains

Jira ticket classification for engineering analytics
LLM & language · Software development

A developer-tools company needed thousands of real Jira tickets classified by type (bug, feature, task), difficulty level, required seniority and component — to train a model that auto-triages incoming tickets and predicts sprint capacity.

Challenge

Ticket quality varies wildly: some have detailed reproduction steps, others are one-liners. Classifying difficulty and seniority requires genuine software engineering experience, not just keyword matching.

Approach

Annotators with software development backgrounds classified each ticket across 4 dimensions, following a rubric calibrated on 500 pre-labelled examples. Edge cases were escalated to a senior reviewer.

Outcome

A classified dataset that enabled the client to auto-assign 70% of incoming tickets with 92% accuracy, saving engineering managers hours per week.

10k tickets classified · 4 dimensions · 3 seniority levels

Your story next

Bring us a dataset and a deadline.

Most engagements start with a pilot batch. Send a sample and your quality bar — we'll show you what production-ready looks like before you commit.

Start a pilot