Medical AI Breakthroughs: How High-Quality Data is Transforming Healthcare and Medicine

Within the burgeoning fields of AI and ML, computer vision (CV) is emerging as the right medicine for a multitude of challenges in the healthcare field. CV technologies are being applied to clinical diagnostics, robot-assisted surgery, hospital spatial intelligence, patient monitoring, and healthcare research and development activities, such as biopharmaceutical discovery. Medicine is becoming more personalized and more precise, fostering biotechnological breakthroughs, improving clinical care, helping medical professionals make more accurate decisions, and lowering costs.

CV applications are powered by the availability of massive amounts of image and video data, and sophisticated capabilities are required to design, build, deploy, operationalize, refine, and optimize an ML model. Data drives every aspect of medical AI model development, and radiologists and other medtech professionals continuously generate enormous volumes of medical images via X-rays, MRIs, CT scans, microscopy, and other image-capturing technologies.

The key to mastering the AI development lifecycle and bringing AI products to life is feeding your ML models with high-quality training data. Doing so requires the use of the best annotation tools and a properly trained human-in-the-loop team that has proven medical AI labeling experience. This ensures your data pipeline scales while maintaining the quality and security of your data.

This guide contains proven and comprehensive approaches to data annotation using human-in-the-loop (HITL) for creating the next medical AI breakthroughs. In it, we will:

Summarize data labeling for medical AI, covering tools, platforms, and approaches.
Dive into emerging computer vision use cases across the healthcare industry.
Mention innovative organizations using medical AI to deliver breakthroughs.
Answer key questions and show how the right human-in-the-loop managed workforce can help you scale medical image data annotation projects while maintaining quality and keeping your data secure.

This guide will be helpful to you when:

Discovering how better training data with high-quality annotations helps medical AI companies deliver product innovations and enhance customer satisfaction.
Planning to use computer vision training data, focusing on quality and scalability.
Learning how medical AI data labeling workforces use a wide range of annotation tools.
Selecting and getting started with an expertly trained human-in-the-loop (HITL) workforce that can accelerate AI initiatives at every stage of the AI lifecycle—from design to optimization.

The use of medical images to improve medical diagnosis at the point of care is exploding. Every year in the U.S., it is estimated that nearly 40 million MRI scans, over 80 million CT scans, and 152.8 million X-rays are performed on patients. Additionally, millions of medical images are generated every year through other means, from robotic devices used in operating rooms to medical research and development.

And, just as the use of medical images for medical diagnosis, robotics, and research is exploding, so is the global market for computer vision in healthcare, which is expected to exceed $343.3 billion by 2024 and $416 billion by 2025.

Diving deeper into the computer vision market for the healthcare industry, there are three areas of medicine that stand to benefit the most from developments in AI, machine learning, and the training data that feeds these advanced algorithms:

Processing and reading images in real-time to improve diagnoses, treatments, and prediction of diseases.
Advancing the use of robotics and other medical devices.
Improving the process of medical research and development.

While computer vision algorithms have made tremendous advances over the past few years, they are not perfect. In order to produce the best algorithms, medical data labeling and annotation must be high-quality, accurate, and cost-effective. This is especially important in high-stakes industries like healthcare where people’s lives and long-term health are at stake.

In machine learning, data labeling or data annotation is the process of structuring data to show the outcome you want your machine learning model to predict. You are marking (labeling, tagging, transcribing, or processing) a dataset with the features you want your machine learning system to learn to recognize in order to improve patient care, accelerate diagnosis and triage, or confidently introduce medical AI solutions to the market.

Labeled data is annotated to show the target or outcome you want your machine learning model to predict. Data labeling is an umbrella term that can entail any number of different types of tasks depending on the type of medical data and specific healthcare use case. Examples include data tagging, annotation, moderation, transcription, and processing. The process of data labeling involves marking a dataset with key features that will help train your algorithm. Labeled data explicitly calls out features that you have selected to identify in the data, and that pattern trains the algorithm to discern the same pattern in unlabeled data.

Data labeling supports a wide variety of use cases for healthcare such as identifying cancerous cells in chest X-rays to improve early diagnosis, transcribing doctors’ notes to streamline patient communications, or annotating surgery footage frame-by-frame to help train surgical robots.

Why is Training Data Key to Machine Learning?

No element is more essential in machine learning than large quantities of quality training data. Training data refers to the initial data used to develop a machine learning model, from which the model creates and refines its rules. The quality of this data has profound implications for the model’s subsequent development, setting a powerful precedent for all future applications that use the same training data.

This training data is used to train medical AI models for a variety of real-world applications: from making medical diagnoses and enhancing surgical robotics to accelerating complex research and development. Much as a calculator augments an accountant's work by speeding up laborious calculations, medical AI models augment the work of medical professionals.

When trained properly, computer vision models often reveal insights that medical professionals, researchers, and statisticians may not readily spot—and often far more rapidly than a trained expert, such as a radiologist.

The data labeling tools and the teams using them for training and deploying machine learning models in healthcare can determine the success or failure of your AI project. Optimizing data labeling for medical AI requires the right technologies, tools, and teams to augment and support your existing workforce.

Choosing the right tool may not be a fast or easy decision. The data annotation tool ecosystem changes quickly in healthcare as more providers offer options for an increasingly diverse array of use cases. Tooling choices range from instant out-of-the-box functionality to completely customized. It is important to understand how teams can most effectively use the right tool to deliver increased productivity and higher quality.

In healthcare, where precision is required, every data annotation tool is meant to be used by a human workforce, even those tools that lead with an AI-based automation feature. You still need humans to handle exceptions and quality assurance. Nearly every AI model requires refinement and optimization over time to ensure quality is maintained.

Selecting the right workforce to implement data annotation tools is as important as the tool itself.

Segmentation

This method can be used in many ways to analyze the visual content in images — from CT scans and microscopy images to MRIs, X-rays, and more — to determine how objects within an image are the same or different. It also can be used to identify differences over time. One example is segmenting surgical images at the pixel level, including tissue, instruments, needle, thread, etc.
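One common way to compare two pixel-level segmentations of the same image, for example an annotator's mask against a reviewer's, is the Dice overlap score. The following is a minimal sketch in plain Python, not a description of any particular tool: the `Mask` type, the `dice_score` function, and the tiny example masks are all illustrative assumptions.

```python
from typing import List

Mask = List[List[int]]  # 2-D grid of 0/1 pixels; 1 marks the labeled structure


def dice_score(mask_a: Mask, mask_b: Mask) -> float:
    """Dice overlap between two binary masks of the same shape (1.0 = identical)."""
    same_rows = len(mask_a) == len(mask_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same shape")
    # Count pixels that both masks mark as belonging to the structure.
    overlap = sum(a & b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
    total = sum(map(sum, mask_a)) + sum(map(sum, mask_b))
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Hypothetical 3x4 masks: the reviewer disagrees with the annotator on one pixel.
annotator = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
reviewer = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
score = dice_score(annotator, reviewer)
print(f"Dice overlap: {score:.3f}")
```

A score near 1.0 means the two masks almost coincide; what counts as acceptable depends on the project's own quality targets.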

Bounding Boxes

These are used to draw a box around the target object, especially when objects are relatively symmetrical, such as lungs, kidneys, and other organs and anatomical features. Bounding boxes are also used when the shape of the object is of less interest or when occlusion is less of an issue. They can be two-dimensional (2-D) or three-dimensional (3-D). A 3-D bounding box is also called a cuboid. In medical AI, we often see bounding box and polygon techniques used in conjunction with “masking,” a pixel-level annotation that is used to hide areas in an image and to reveal other areas of interest, making it easier to home in on certain areas of the image.
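Agreement between two 2-D bounding boxes drawn around the same object is often summarized with intersection over union (IoU). The sketch below assumes axis-aligned boxes in pixel coordinates; the `Box` type, the function names, and the example coordinates are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned 2-D bounding box in pixel coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    # A box with non-positive width or height has zero area.
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 = disjoint, 1.0 = identical)."""
    intersection = Box(
        max(a.x_min, b.x_min),
        max(a.y_min, b.y_min),
        min(a.x_max, b.x_max),
        min(a.y_max, b.y_max),
    )
    inter_area = area(intersection)
    union_area = area(a) + area(b) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


# Hypothetical boxes drawn by two annotators around the same organ.
first = Box(10, 10, 50, 50)
second = Box(30, 30, 70, 70)
agreement = iou(first, second)
print(f"IoU: {agreement:.3f}")
```

The same idea extends to 3-D cuboids by adding a depth axis and comparing volumes instead of areas.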


Landmarking

This is used to plot characteristics in the data, such as with facial recognition, to detect facial features, expressions, and emotions. It is also used to annotate body position and alignment using pose-point annotations. In annotating images for surgical robotics, for example, you can determine where a surgeon’s hand, wrist, and surgical devices are in relation to one another while operating.

The potential uses of computer vision in healthcare are growing. However, today the applications mostly fall into one of three major categories:

1) Diagnostic and Clinical Imaging

Significant progress in AI and computer vision in the medical industry results in faster and more accurate diagnoses. An expanding choice of powerful tools supported by machine learning can automate the segmentation, tagging, and classification of millions of images from X-rays, CT scans, and MRIs generated every year. In addition, these systems can identify trends across multiple images or groups of images that are not always easily identifiable by medical professionals.

Annotating data from X-rays, ultrasounds, CT scans, and MRIs helps medical professionals gain insight into patients’ physical condition, predict when a disease will develop, and determine when appropriate treatment will be needed.

Data labeling and annotation of medical images are helping reduce time spent on unnecessary diagnostic procedures and providing healthcare professionals with the means to make more accurate diagnoses and administer more effective treatments.

Over the next decade, machine learning will transform how medical images are used by reducing errors, improving outcomes, and lowering costs.

When applied in the aggregate, well-structured CV training data can help medical professionals predict the early emergence of diseases and develop treatment plans that help specific patients avoid developing serious medical conditions.

One specific challenge to quality data annotation or labeling is the nuance of properly annotating 2-D and 3-D medical images that contain multiple layers and transparencies. For example, a chest X-ray or MRI requires viewing and analyzing multiple layers of internal organs. Done properly, data annotation combined with computer vision and machine learning can visually analyze interactive 3-D models to make more accurate medical diagnoses and provide earlier detection of specific diseases.

Companies to Watch in Diagnostic and Clinical Imaging

Intelerad Medical Systems - A scalable medical imaging platform connects clinicians to a powerful imaging ecosystem.
Tempus - Data-driven precision medicine with the practical application of AI in healthcare.
Sema4 - A patient-centered health intelligence company dedicated to advancing healthcare through data-driven insights.
LungLife AI - Transforming lung cancer diagnosis and management through AI-enabled analysis of biomarkers in blood.

2) Healthcare Robotics and Devices

Robotics is rapidly transforming healthcare. In the operating room, robots support minimally invasive procedures. In clinical settings, fueled by the onset of COVID-19, the use of robotics gained wider traction by helping keep patients and medical professionals safer by performing tasks to help reduce exposure. For example, Intuitive Surgical’s da Vinci robotic system is capable of performing a multitude of challenging surgical procedures remotely.

CV-enabled robots capture thousands of images in the operating room or medical offices. By annotating high-definition 3-D images and videos, AI-enabled robots can help surgeons improve techniques by guiding them to the specific area of the body that needs attention while avoiding damaging other nerves, muscles, and organs.

Surgical robots require the highest quality computer vision technology. It is essential that the annotations used to train models are spot on to ensure patient safety and positive post-op health outcomes. In some cases, with the right model and training data feeding it, it is even possible to develop robotic surgical techniques that assist in the decision-making processes.

Companies to Watch in Healthcare Robotics and Devices

Arterys - Extracts actionable insights from medical images to add clinical value, improve diagnostic decision making, efficiency, and productivity.
Caresyntax - Technology to make surgery smarter and safer, enabled by IoT, data analytics, and AI.
Fortive - A provider of essential technologies for connected workflow solutions across a range of attractive end-markets.
Vicarious Surgical - Enhancing the ability of surgeons and expanding worldwide access to high-quality care through the use of surgical robotics.

3) Medical AI Research and Development

Developing a new procedure or delivering a new device or pharmaceutical product to market requires massive amounts of data to receive approval from regulatory bodies. Since over 90% of all medical data generated today are images, the right data labeling solution can get these products and treatments to market faster. In addition, companies researching biotechnology for drug discovery, understanding the impact of lifestyle choices and mental health on chronic conditions, and developing new mRNA vaccines as quickly as was done for COVID-19 benefit from annotating millions of computer-generated images.

Training data analysts to perform quality work is vital as the medical industry is not immune to the current global skills shortage. In fact, the U.S. faces a shortage of between 46,900 and 121,900 physicians by 2032, according to the Association of American Medical Colleges. The projected shortfalls range between 21,100 and 55,200 for primary care and between 24,800 and 65,800 for non-primary care specialties.

Additionally, radiologists are among the most in-demand specialists in the U.S. and receive the highest starting salaries. While radiologists and other medical professionals, including diagnostic medical radiologists and radiologist assistants, play a crucial role in analyzing medical images at the point of care, it is impractical to scale data labeling projects solely relying on medical professionals. They simply don’t have the bandwidth.

While medical professionals play an essential role, trained data annotators and medical professionals can work together to scale the process affordably and efficiently.

Medical professionals can help design for scale using proven workforce management processes that grow alongside your organization. The process often starts by incubating and iterating a subset of the work to develop methods to support high accuracy, quality, and speed.

It is critical that a trained data annotation workforce become specialists in the nuances of medical image annotation and learn the particular task requirements for each use case that requires annotation.

Quality

The right medical data labeling workforce should have previous experience. They should be able to provide references of previous work that demonstrates high customer satisfaction and the ability to meet or exceed pixel-level accuracy goals. Ensure they have QA methods that match each project’s acceptable error rate. Quality comes first. Some certifications can help assure high levels of quality data annotation.

Workforce Scalability

Ensure your managed workforce provider has dedicated teams with the expertise to handle the most extensive healthcare datasets in the world and the ability to complete projects with millions of annotations. These projects could take years with in-house resources, but a managed workforce provider can ramp up in as little as two weeks. Responsiveness is table stakes, so ensure that they respond quickly (and thoroughly) to your outreach and questions because this sets the tone for your relationship and informs what a partnership might look like.

Balancing Price with Performance

Doctors should not annotate your data, and neither should other medical professionals and staff who need to focus on their primary job roles, such as providing patient care. You should entrust the annotation work to the annotation experts. The right managed workforce for AI in healthcare can scale to millions of annotations in months and have the ability to respond to changes in market conditions, product development, and business requirements.

They can support data work across the entire AI lifecycle and flexibly add to scope as data needs change. The right approach to building machine learning models for the medical field must be flexible — transparent and straightforward pricing models will ensure that you control costs as projects scale. The cost of scaling data labeling projects with in-house teams or a crowdsource model can quickly become prohibitively expensive.

Security & Privacy

Robust data security and business continuity policies ensure that a managed workforce can safely and consistently serve medical AI clients. A set of core security offerings that covers key aspects of people, process, and technology, including GDPR, SOC 2, ISO 9001, and HIPAA/PHI, must be in place, as well as the ability to uncover and accommodate unique healthcare data requirements.

Trust & Communication

The right managed workforce vendors will use a workforce management platform, bolstered by dedicated messaging channels and collaboration tools. This ensures that their managed team stays focused on your feedback and needs, and it fosters long-term continuity and stability.

Accelerated Medical Computer Vision with Turnkey Data Annotation

Whether you choose our CloudFactory Data Annotation Solution or our Computer Vision Managed Workforce option, we reduce complexity and give medical organizations a faster path to high-quality data labeling.

We bundle a professionally managed workforce with the option to use a best-in-class annotation platform for one inclusive price so you can:

Scale fast with our trained, experienced, and managed teams of data analysts that can be ready to start your project in as little as one week.
Remove the expense and hassle of directly managing an annotation workforce and a complicated tech stack.
Streamline purchasing by integrating a data annotation platform and workforce into a single, monthly subscription service that quickly scales with your needs.
Improve labeling quality by using experts trained in pixel-level segmentation of multiple images using a tooling-agnostic approach.
Respond to changes in market conditions, product development, and business requirements.
Have robust data security and business continuity policies in place while also addressing healthcare-specific security and data privacy concerns.
Improve communications and build relationships with dedicated teams, messaging channels, and collaboration tools.

Which Service is Right for You?

Feature	Data Annotation Solution	Managed Workforce
Standard features included with services	Fully managed workforce, vetted and trained for your use case; dedicated project managers and Client Success support; capacity control management (hourly, daily, or weekly based on project); SOC 2 certified; workforce security (screening, NDA, activity monitoring); monthly subscription pricing
Available upgrades for all services	Advanced network security; AES256 encryption; premium workstations
Annotation tool	Yes	No
Bring-your-own-tool (BYOT) option (commercial, open source, or proprietary)	No	Yes

Why Choose CloudFactory?

Extend your Team

Our vetted and managed teams have served medical AI clients in use cases that range from simple to complex.

Quality at Scale

Our proven processes securely deliver accurate medical data and we are designed to scale and change with your needs.

Subscription-Based Terms

Flexible contract terms and pricing.

Don’t just take our word for it. Here is what some of our medical AI clients say about CloudFactory:

Our images are complex and difficult to annotate. The frequent feedback conversations and short iteration cycles are useful in getting the annotation we want.

Rickard Sjögren

Senior Scientist, Sartorius

The space that I’m in is going through an evolution. If you don’t have AI capabilities, you’ll be left behind. But I can’t focus on the product if I’m swamped doing people management for image annotation. CloudFactory takes that burden off of us.

Founder

Medical AI Company

CloudFactory is interested in maintaining a relationship. You are interested in more than just taking the job and doing it. There is a larger interest in the whole project and it creates this feeling that you are an extended part of our team. That’s more attractive than an outsourcing company that is more transactional.

Alberto Rizzoli

CEO and co-founder, V7 Labs

Who Does CloudFactory Work with?

CloudFactory has worked with several leading organizations to advance their use of medical AI through computer vision.

CUSTOMER SUCCESS SPOTLIGHT:
CloudFactory and V7 Aiding Covid Research

V7 Labs wanted to apply their machine learning expertise to lung images to help researchers study COVID-19 and, eventually, help clinicians spot and triage more serious cases earlier. They partnered with CloudFactory to create an open-source dataset that could help researchers understand COVID-19 and other lung conditions going forward.

CloudFactory used a combination of V7 Darwin’s auto-labeling tool and human-refined segmentation annotations to create a lung dataset that helps eliminate age bias in pulmonary research and ML training. Over 6,000 chest X-rays were annotated to create a dataset that’s now freely available on GitHub and V7 Labs’ websites. The models have successfully identified COVID-19 and other lung ailments during preliminary tests, but their efficacy must be confirmed through official clinical trials.

CUSTOMER SUCCESS SPOTLIGHT:
CloudFactory and Sartorius Annotate 1.6 Million Cell Images

CloudFactory worked with Sartorius to segment and annotate complex cell imagery for AI cell identification to help spur medication development. Their images are complex and difficult to label. CloudFactory performed instance segmentation on high-quality, high-resolution microscopy images and videos at the single-cell level. In a two-year engagement with Sartorius, CloudFactory annotated 1.6 million cells, exceeded quality goals, and saved years in developing the largest dataset in cell imaging.

CUSTOMER SUCCESS SPOTLIGHT:
CloudFactory Building a Medical Image Database

CloudFactory established a two-year relationship with one medical AI company to use medical tagging to create an image database for research and evidence-based care. The CloudFactory workforce labeled thousands of images to expand their product offering quickly and accurately. The team annotated radiographs in the client tool to identify joint issues and provide predictive treatment advice. An imagery database helps enhance the ability to assess various disorders. More than 24,000 images were tagged in six months, and it took the team less than two weeks to match the quality of internal teams.

CUSTOMER SUCCESS SPOTLIGHT:
CloudFactory Labeling Images from Surgical Robots

One medical robotics company established a two-year relationship with CloudFactory to improve patient outcomes and reduce costs to healthcare systems by integrating computer vision and robotics designed to provide advanced imaging and diagnostics during surgery. The team conducted segmentation of surgical images gathered by surgical robots and tagged tissue, instruments, needles, and thread. As a result, the team exceeded quality targets with more than 97% accuracy.

CUSTOMER SUCCESS SPOTLIGHT:
Data Labeling for Cancer Care

CloudFactory helped one medical research institute revolutionize cancer research by developing personalized treatments. The CloudFactory Computer Vision Managed Workforce annotated image segments using MATLAB, defining the shape of cells in tissue biopsies. Segments were compiled to render 3-D images, which were later contributed to an open-source project seeking to accelerate biological understanding of cancer and clinical decisions for cancer treatment. The team quickly scaled the workforce from zero biological knowledge to expert cell segmentation. Training now requires no client time investment and has replaced the previous ineffective strategy.

Medical AI - Frequently Asked Questions

Will AI replace medical professionals?

No, medical AI is not designed to replace medical professionals. We will always need doctors to deliver care and innovate. AI helps medical professionals better understand potential trends from a large set of data that may not always be apparent to the human eye.

What role do medical professionals play in developing high-quality training data for CV models?

Medical professionals play a key role in helping to formulate and understand medical images in the initial data set. Additionally, medical professionals should play a role in training managed workforce teams to properly identify and label medical images.

Should I use medical expert annotators, a managed team approach like CloudFactory, a crowdsource vendor, or hire in-house to train, validate, and optimize my medical AI models?

Typically, in-house employees can manage your data needs with reasonably good quality until it’s time to scale your model. Contractors and freelancers are other options, but they need to be trained and managed; quality is hard to control, and costs can balloon. Crowdsourcing leverages the cloud to send data tasks to a large number of people at once; however, compensation is usually based on speed, so quality tends to suffer, and attrition is high, so you have to keep retraining and rebuilding your “workforce.” Finally, considering that it takes nearly 800 hours to properly annotate one hour of video, the cost of using medical expert annotators can quickly get out of hand.

CloudFactory’s model works best for medical AI use cases. Our managed workforce combines the quality of a trained, in-house team with the scalability of the crowd. It’s ideal for data work because dedicated teams are steeped in your business rules and they stick with projects long-term, enabling them to increase their throughput and accuracy while providing consistent labeling quality.
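To see how quickly expert annotation costs can grow, the figure of nearly 800 annotation hours per hour of video cited above can be turned into a rough estimate. The sketch below is only arithmetic; the footage length and the hourly rate are hypothetical placeholders, not quoted prices.

```python
# Figure cited in the text: nearly 800 hours of annotation per hour of video.
ANNOTATION_HOURS_PER_VIDEO_HOUR = 800


def annotation_hours(video_hours: float) -> float:
    """Estimated labor hours needed to annotate the given amount of video."""
    return video_hours * ANNOTATION_HOURS_PER_VIDEO_HOUR


def annotation_cost(video_hours: float, hourly_rate: float) -> float:
    """Estimated labor cost at a given hourly rate (same currency as the rate)."""
    return annotation_hours(video_hours) * hourly_rate


# Hypothetical inputs: 10 hours of surgical video, placeholder rate of 100 per hour.
video_hours = 10
placeholder_rate = 100.0
hours_needed = annotation_hours(video_hours)
estimated_cost = annotation_cost(video_hours, placeholder_rate)
print(f"{hours_needed:,.0f} annotation hours, estimated cost {estimated_cost:,.0f}")
```

Substituting your own footage volume and labor rates shows how sensitive the total is to who does the annotating.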

Can labelers or annotators do this work well if they are not medical professionals or do not have extensive medical training?

Yes, they do not need to be medical professionals or have extensive medical training. If they have been properly trained on the right data annotation techniques and business requirements for your project, they will label your data effectively. CloudFactory’s managed teams are accustomed to repeating the same tasks, so they quickly increase the quality of annotations and the ability to scale, often outperforming medical professionals on labeling tasks.

How can we speed the development of data annotation projects without sacrificing quality?

At CloudFactory, we deliver the work of an in-house team without the burdens of managing one. Our workforce delivers high-accuracy data annotations, can iterate processes in real-time, and can establish the domain knowledge required to resolve even the toughest medical image edge cases. We can do this because we deliver tailored strategies that combine people, process, and technology, and we have deep knowledge of medical AI use cases, computer vision AI development, and machine learning.

Can I replace in-house teams with external annotation teams and still maintain quality?

CloudFactory teams often exceed the quality of in-house teams while scaling projects faster. Our medical AI clients consistently rate our teams 5/5 on quality and often find that we overachieve on quality-related KPIs like annotation accuracy while maintaining high throughput at scale. There are also certifications that can help assure high levels of quality data annotation.

Does CloudFactory have the medical AI and computer vision experience they claim?

Demonstrating relevant experience is critical. CloudFactory has a growing record of medical AI projects built on more than 10 years of annotation experience, and we will provide additional references as needed.

Should we use automated (or AI-assisted) or manual labeling for computer vision projects in healthcare?

We typically offer our clients a combination of both. The tools that we use (whether yours, ours, one in our partner network, or other tools) often allow for both automated, or AI-assisted, and “manual” data annotation (i.e., humans-in-the-loop), and they can be quite complementary when leveraged by an experienced medical AI team like ours. Humans-in-the-loop teams are critical when dealing with edge cases or exceptions that ML models are not yet trained to identify or do not do so at an acceptable level of accuracy.
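One simple way to combine automated and manual labeling is to accept model predictions only above a confidence threshold and send everything else to a human reviewer. The sketch below illustrates that routing idea under stated assumptions: the `Prediction` type, the `route_predictions` function, the 0.90 threshold, and the sample records are all hypothetical, and it does not describe any specific tool's behavior.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Prediction:
    image_id: str
    label: str
    confidence: float  # model's confidence in the label, from 0.0 to 1.0


def route_predictions(
    predictions: List[Prediction], threshold: float = 0.90
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted labels and a human review queue."""
    auto_accepted = [p for p in predictions if p.confidence >= threshold]
    needs_review = [p for p in predictions if p.confidence < threshold]
    return auto_accepted, needs_review


# Hypothetical model output for three chest X-rays.
batch = [
    Prediction("xray_001", "no_finding", 0.98),
    Prediction("xray_002", "nodule", 0.62),
    Prediction("xray_003", "no_finding", 0.91),
]
accepted, review_queue = route_predictions(batch)
print("auto-accepted:", [p.image_id for p in accepted])
print("sent to human review:", [p.image_id for p in review_queue])
```

In practice the threshold would be tuned against the project's quality targets, and a sample of auto-accepted labels would typically still be audited by people.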

How should medical professionals and experts be incorporated into the computer vision annotation team?

CloudFactory teams are data annotation experts. At the start of each engagement, we work with clients to understand business requirements to ensure our teams are properly trained on specific use cases and tasks. This often involves training initiated by medical experts, but with our expertise and years of experience in this field, we often see similarities to work we have already done, and we are therefore able to plug in rapidly to meet business requirements and begin scaling the annotation workload. We find that upfront communication about business requirements, along with as-needed training by medical professionals on complex tasks, is key to a CV annotation team’s success.

How do I prepare to work with an external team?

For medical image annotation, the first step is developing a detailed scope of the project and determining KPIs for quality, scalability, and project duration. At CloudFactory, a critical component in working with clients is having a dedicated project manager, who leads your analyst team, serves as your day-to-day contact, and optimizes your team’s outputs. We also provide a client success manager, who advocates for you, tracks outcomes to ensure target metrics are achieved, and can troubleshoot as needed. These investments result in worker accountability, engagement, and a bias toward action, all of which contribute to better outcomes for your organization.

How do I improve my medical AI data quality and model performance?

At the start of every project, quality targets should be established. Projects that fall below quality targets generally suffer from a poorly trained algorithm, inaccurate data labeling, or both. The key to improving quality at scale is using teams of trained annotators who have experience with medical images.

Learn how CloudFactory can help securely scale your medical image and video data labeling initiatives here.
