The web is the largest source of publicly available data in the world, but that doesn’t mean the data is easy to access. Collecting it manually is often impractical when the information you need is too large in scale or spread across multiple websites. That’s what scraping engines are for.

Whether your business is trying to centralize information about government regulations, TV shows, or the stock market, chances are that information is spread out over hundreds, if not thousands, of websites.

Icreon builds powerful scraping engines that extract information from all of these sources and host it on a centralized platform. We make it easy to connect disparate sources of data that are spread across the web.

The Basics

  • Using APIs

    Businesses that operate on a large scale usually provide APIs that grant access to their proprietary data. For example, it’s relatively easy to extract data from applications such as Twitter and Foursquare using their public APIs.

  • HTML Parsing

    APIs aren’t always available, so we build code that scrapes information directly from the source, which could be anything from simple product listings to more detailed building inspection regulations.

  • Adaptive

    Junk data needs to be accounted for. We develop tools that remove duplicate sets of information, and we set up systems with the capacity to handle change, even if a website is completely overhauled overnight.
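
As a rough illustration of the HTML parsing and duplicate removal described above, here is a minimal Python sketch that uses only the standard library. The `product` class name, the sample markup, and the helper names `ListingParser`, `scrape_products`, and `deduplicate` are assumptions made for this example; they do not describe any particular production system.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects the text of every <li class="product"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_product = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "product" in classes:
            self._in_product = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_product:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_product:
            # Collapse runs of whitespace so equivalent listings compare equal.
            self.items.append(" ".join("".join(self._buffer).split()))
            self._in_product = False


def scrape_products(html):
    """Return the text of each product listing found in the HTML string."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.items


def deduplicate(records):
    """Drop repeated records (case-insensitive), keeping first-seen order."""
    seen = set()
    unique = []
    for record in records:
        key = record.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Assumed sample page: three product listings (one a duplicate) and one ad.
SAMPLE_HTML = """
<ul>
  <li class="product">Blue Widget</li>
  <li class="product">Red Widget</li>
  <li class="product">blue widget </li>
  <li class="ad">Sponsored</li>
</ul>
"""

products = deduplicate(scrape_products(SAMPLE_HTML))
print(products)
```

Matching duplicates on a normalized key (lowercased, whitespace-trimmed) is one simple choice; a real engine would likely need fuzzier matching and monitoring to notice when a site’s markup changes.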

The Not So Obvious

  • Microformat Parsing

    Sometimes the data you need is highly sophisticated. Using techniques such as microdata and microformat parsing to read each website’s Document Object Model (DOM), we build systems that gather the information you need.

  • Machine Learning

    The web changes at such a fast rate that the data you’re looking for may exist in areas you’re not even familiar with. We use machine learning tools to crawl the entire web, giving you access to all possible relevant information.

  • Data Mining

    When a scraping engine pulls in terabytes of information on a regular basis, that information is practically impossible to analyze manually. We build tools that make it easy to find the data related to what you are looking for.
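
To make the microdata parsing mentioned above more concrete, here is a minimal Python sketch that reads `itemprop` attributes from a single flat item. The sample markup, the property names, and the `MicrodataParser` and `extract_properties` names are assumptions for illustration; nested items and the full microdata processing rules are not handled.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Collects itemprop name/value pairs from a flat microdata item."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop name whose text is being read
        self._tag = None      # tag that opened the current property
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if not name:
            return
        if "content" in attrs:
            # Elements such as <meta> carry the value in a content attribute.
            self.properties[name] = attrs["content"]
        else:
            self._current, self._tag, self._buffer = name, tag, []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current and tag == self._tag:
            text = " ".join("".join(self._buffer).split())
            self.properties[self._current] = text
            self._current = None


def extract_properties(html):
    """Return a dict mapping itemprop names to their values."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Assumed sample markup using schema.org-style property names.
SAMPLE_HTML = """
<div itemscope itemtype="https://schema.org/Product">
  <span itemprop="name">Blue Widget</span>
  <meta itemprop="price" content="19.99">
  <span itemprop="brand">Acme</span>
</div>
"""

item = extract_properties(SAMPLE_HTML)
print(item)
```

Because microdata labels each value with a property name, the extracted record does not depend on where the element sits in the page layout, which is what makes this approach more resilient than position-based scraping.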

Our Solutions in Action

  • SiteCompli
  • Crimson
  • Nat Geo

[Diagram: National Geographic website, with API, web browser, and database]