Zyte, a leader in web data extraction technology, turned to CloudFactory to speed up development of a new web scraping tool. The CloudFactory team conducted manual annotation of key information on URLs and subdomains to create a training data set for the new tool.
Zyte, a world-leading web scraping company, required resources to speed up the development of one of their pioneering web scraping tools.
Zyte Automatic Extraction lets their customers convert whole websites or specific pages to datasets in just a few clicks. This powerful AI/machine learning model ‘sees’ a web page much as a human does, with a screenshot of the entire page and its visual elements as they’re rendered by an actual browser. These inputs are combined with the source code and processed by a deep neural network, allowing the model to understand relationships between different elements on a page. Trained on a wide range of different domains, the model also understands the inner structure of a page much as a web developer would.
Thankfully, demand did not slow down in the wake of the COVID-19 pandemic. Zyte expanded its offerings and acquired new clients even in uncertain economic times that disrupted work around the world. They were able to do this in part due to a partnership with CloudFactory.
Getting to market quickly with the new product was key. In order to do this, Zyte needed a lot of annotations—more than they could source themselves within the desired timeframe.
The Zyte team first considered the obvious solution, hiring more employees, but decided that outsourcing some of the work was the better choice. Zyte’s hypothesis was that a dedicated team from a company like CloudFactory would require much less training and essentially hit the ground running, while also being cost-effective.
After weighing its options, Zyte chose to partner with a company that offered a specialized, managed workforce. “As we had a very small internal annotations team, we were eager to form a partnership with experienced and capable experts who could assist Zyte in our plans to scale the work,” says Shaunie O’Halloran, annotation specialist with Zyte.
“Our needs accelerated fast, and we wanted to release more capabilities and offer different types of data,” says O’Halloran. “It wasn't possible for us to do it all ourselves. We knew CloudFactory’s team was experienced, reliable, and wouldn't need a lot of training. We could give them a task and they’d be able to run it themselves with minimal oversight.”
We hit our product release deadline, and that’s in part thanks to the work CloudFactory’s team has put in over the last year.
CloudFactory offered Zyte a managed workforce solution with a select team of specialists. These data analysts began with manual annotation of key information on URLs and subdomains. At first, Zyte sent sample domains to the CloudFactory team; as the team learned from those samples, it began sourcing its own pages.
“The more CloudFactory’s team did, the more they learned, and the more efficiently and intuitively the team worked,” says O’Halloran. The partnership worked so well that Zyte gave the data analysts additional work, including data acquisition and quality control, which allowed the Zyte team to focus on other ways to scale the business.
CloudFactory aimed to add value at every opportunity. In addition to meeting throughput and quality requirements, the team offered Zyte feedback that led to improvements in work processes and tools. When Zyte wanted more visibility into annotation times, CloudFactory’s project manager set up time-tracking software to inform Zyte’s release planning.
“To be honest,” says O’Halloran, “with the data that CloudFactory has given us, we've been able to do more with reporting. We can see which capabilities take the most time and which ones are the most problematic, so we can predict how long it will take us to develop a new capability in our solutions.”
CloudFactory increased Zyte’s labeling volume fivefold, giving Zyte more annotated data to train its machine learning models and accelerating its product releases. CloudFactory’s work also freed up 60% of O’Halloran’s time, which was reinvested in growth planning and product development.
“Working with CloudFactory has definitely freed up a lot of time for all of us and made me have to worry less about the quality of annotations. Now we can get a lot more done and move a lot faster in progressing with our public releases. CloudFactory created an experience that fit our company’s business model.”