In the current era of technological advances, there are still tasks computers cannot accomplish despite improvements in Artificial Intelligence and Optical Character Recognition. Despite the claims made for automated data crawlers, much of the data online still needs to be harvested by real humans: the way information is displayed is rarely consistent, and minute differences can cause "machine errors".
A researcher from a top university in the US was struggling to collect such information online. She had achieved some success with her automated crawlers, but the same crawlers failed on other files. We came in, worked with her to understand her requirements, and saved her months, if not years, of work.
The project involved examining thousands of text files and extracting the relevant information the client wanted, since the automated crawlers could not recognize the data format. The crawlers broke because of huge variance in how information is represented across the target data sources.
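To illustrate the kind of failure described above (a hypothetical sketch with made-up records, not the client's actual data or code), an extractor written against one field layout silently fails when a source varies the format slightly, while a more defensive extractor tolerates several known variants:

```python
import re

# Hypothetical records: the same fact rendered three slightly
# different ways across source files.
records = [
    "Revenue: 1,200",
    "Revenue - 1200 USD",
    "Total revenue ...... 1.200",
]

def rigid_extract(line):
    # A crawler built against the first layout only.
    m = re.match(r"Revenue: ([\d,]+)$", line)
    return int(m.group(1).replace(",", "")) if m else None

def tolerant_extract(line):
    # Accept varying separators and thousand-group conventions.
    m = re.search(r"[Rr]evenue\D+(\d[\d.,]*)", line)
    if not m:
        return None
    return int(re.sub(r"[.,]", "", m.group(1)))

print([rigid_extract(r) for r in records])     # [1200, None, None]
print([tolerant_extract(r) for r in records])  # [1200, 1200, 1200]
```

Even the tolerant version only covers variants it has been shown; each new source file can introduce a layout no pattern anticipates, which is where human review takes over.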
We worked closely with 15 of our full-time remote staff to tackle the list of files systematically, with robust checks, both machine and human, in place at each stage to ensure quality and consistency in the completed work.
Our work saved our client partner hundreds to thousands of intense, focused man-hours. This volume of work would not have been realistic for her and her team to complete themselves, and would have cost them significant productivity: six months to a year of mundane data examination and cleaning.
As academia moves toward cross-industry collaboration, it is increasingly important for researchers to use leading tools and services to become more productive. At Techsumption, we believe researchers should focus on deriving insights and analyzing data; years spent merely gathering the data needed for research is a massive drain on human capital and potential.