Tips for Gathering & Building Data Sets

Gathering an initial data set for your machine learning project is the first hurdle on the path to a successful machine learning algorithm. How do you get your hands on the perfect data set? We joined our partners at Keymakr to discuss the attributes of an ideal data set, the pros and cons of using a pre-created data set, and some best practices for building your own.

In this webinar, you’ll learn:

  • What makes an ideal data set
  • Best practices for building your own custom data set
  • Approaches to dealing with sensitive data
  • What to do if the data you need doesn’t exist

Anthony Scalabrino

Sales Engineer, CloudFactory


Anthony Scalabrino is a Sales Engineer at CloudFactory, where he uses his experience with AI, ML, and DL to provide end-to-end technical and non-technical solutions for CloudFactory’s clients. His extensive involvement in wide-ranging CV and NLP use cases enables CloudFactory to rapidly apply holistic solutions to growing industry demands.

Maria Greicer

VP of Partnerships, Keymakr


Maria Greicer is VP of Partnerships at Keymakr, which specializes in data collection and data creation for training computer vision AI models. Maria has 13+ years of experience working within the high-tech startup industry and holds a BA in Entrepreneurial Management and Information Technology from IDC Herzliya in Israel.


About CloudFactory

For over a decade, CloudFactory has powered quality data at scale. Its managed workforce processes pipelines of big data with high accuracy on virtually any platform, with the expertise and communication of a trained internal team. As a global leader in impact sourcing, CloudFactory creates economic and leadership opportunities for talented people in developing nations.


About Keymakr

Keymakr specializes in custom data collection and annotation using proprietary data mining and annotation tools that can be purpose-built for your project. We can collect the raw data required to create training data sets from available sources, or we can create the data in our customized production studio, set up for your specific requirements.