While the terms are often used interchangeably, we’ve learned that accuracy and quality are two different things.
Accuracy in data labeling measures how close the labeling is to ground truth, or how well the labeled features in the data are consistent with real-world conditions. This is true whether you’re building computer vision models (e.g., putting bounding boxes around objects on street scenes) or natural language processing (NLP) models (e.g., classifying text for social sentiment).
Quality in data labeling is about accuracy across the overall dataset. Does the work of all of your labelers look the same? Is labeling consistently accurate across your datasets? This is relevant whether you have 29, 89, or 999 data labelers working at the same time.
Low-quality data can actually backfire twice: first during model training and again when your model consumes the labeled data to inform future decisions. To create, validate, and maintain high-performing machine learning models in production, you have to train and validate them using trusted, reliable data.
4 Workforce Traits that Affect Quality in Data Labeling
In our decade of experience providing managed data labeling teams for companies from startup to enterprise, we’ve learned that four workforce traits affect data labeling quality for machine learning projects: knowledge and context, agility, relationship, and communication.
1. Knowledge and context
In data labeling, basic domain knowledge and contextual understanding are essential for your workforce to create high-quality, structured datasets for machine learning. We’ve learned that workers label data with far higher quality when they have context, or know about the setting and relevance of the data they are labeling. For example, people labeling your text data should understand when certain words may be used in multiple ways, depending on the meaning of the text. To tag the word “bass” accurately, they will need to know if the text relates to fish or music. They might need to understand how words may be substituted for others, such as “Kleenex” for “tissue.”
For the highest-quality data, labelers should know key details about the industry you serve and how their work relates to the problem you are solving. It’s even better when a member of your labeling team has domain knowledge, or a foundational understanding of the industry your data serves, so they can manage the team and train new members on rules related to context, what your business or product does, and edge cases. For example, the vocabulary, format, and style of text related to healthcare can vary significantly from that of the legal industry.
2. Agility

Machine learning is an iterative process. Data labeling evolves as you test and validate your models and learn from their outcomes, so you’ll need to prepare new datasets and enrich existing datasets to improve your algorithm’s results.
Your data labeling team should have the flexibility to incorporate changes that adjust to your end users’ needs, changes in your product, or the addition of new products. A flexible data labeling team can react to changes in data volume, task complexity, and task duration. The more adaptive your labeling team is, the more machine learning projects you can work through.
As you develop algorithms and train your models, data labelers can provide valuable insights about data features (the properties, characteristics, or classifications that will be analyzed for patterns) that help predict the target, or the answer you want your model to predict.
In machine learning, your workflow changes constantly. You need data labelers who can respond quickly and adjust the workflow based on what you’re learning in the model testing and validation phase.
3. Relationship

Agile work like this requires flexibility in your process, people who care about your data and the success of your project, and a direct connection to a leader on your data labeling team so you can iterate on data features, attributes, and workflow as your project evolves.
4. Communication

You’ll need direct communication with your labeling team. A closed feedback loop is an excellent way to establish reliable communication and collaboration between your project team and data labelers. Labelers should be able to share what they’re learning as they label the data, so you can use their insights to adjust your approach.
To learn more about quality and context, check out our Lessons Learned: 3 Essentials for Your NLP Data Workforce.
How is quality measured in data labeling?
There are four ways we measure data labeling quality from a workforce perspective:
- Gold standard - There’s a correct answer for the task. Measure quality based on correct and incorrect tasks.
- Sample review - Select a random sample of completed tasks. A more experienced worker, such as a team lead or project manager, reviews the sample for accuracy.
- Consensus - Assign several people to the same task; the correct answer is the one that comes back from the majority of labelers.
- Intersection over union (IoU) - This is a consensus model often used in object detection within images. It combines people and automation to compare hand-labeled, ground-truth bounding boxes with the bounding boxes your model predicts: the IoU score is the area where the two boxes overlap divided by the total area they cover together.
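As a rough illustration, the last two methods above can be sketched in a few lines of Python. The box format (x_min, y_min, x_max, y_max) and the function names are our own choices for this example, not part of any particular labeling tool:

```python
from collections import Counter


def consensus_label(labels):
    """Majority-vote consensus: the answer most labelers gave wins."""
    winner, _count = Counter(labels).most_common(1)[0]
    return winner


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned bounding boxes.

    Boxes are (x_min, y_min, x_max, y_max) tuples. Returns a score in
    [0, 1]: 1.0 for a perfect match, 0.0 for no overlap.
    """
    # Corners of the overlapping region, if the boxes overlap at all.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes don't produce a negative area.
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0
```

In practice, teams compare each predicted box against its ground-truth box and count a prediction as correct when the IoU clears an agreed threshold (0.5 is a common starting point).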
You will want the freedom to choose from these quality assurance methods instead of being locked into a single way to measure quality. At CloudFactory, we use one or more of these methods on each project to measure the work quality of our own data labeling teams.
To learn more about measuring quality in data labeling, check out Scaling Quality Training Data: The Hidden Costs of the Crowd.
Critical Questions to Ask Your Data Labeling Service About Data Quality
- How will our team communicate with your data labeling team?
- Will we work with the same data labelers over time? If workers change, who trains new team members? Describe how you transfer context and domain expertise as team members transition on/off the data labeling team.
- Is your data labeling process flexible? How will you manage changes or iterations from our team that impact data features for labeling?
- What standard do you use to measure quality? How do you share quality metrics with our team? What happens when quality measures aren’t met?