2012 - 2015
Collection and provision of data from up to 1 billion websites per month.

Description of Tasks

Crawling bulk data for a US company.

Setup, installation, and extension of the crawler software; execution of crawl jobs; tuning of the cluster (Hadoop cluster, Apache Nutch, Java, custom export application)

Technologies Used