What is Web Crawling?

Web Crawling is an automated technique for navigating around a web site. Generally, the Web Crawler will work alongside a Data Harvester, Data Extractor or Screen Scraper.

The Web Crawler is responsible for automatically navigating a web site. It follows every link in a methodical way, hunting for data to be scraped.
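As a rough illustration of "following every link in a methodical way", here is a minimal breadth-first crawl sketch in Python. It is only one possible approach, not a description of how iHarvest works. The `fetch` callable, the `PAGES` dictionary and the `example.test` URLs are hypothetical stand-ins so the sketch runs without touching a real site; a real crawler would also need politeness delays, robots.txt handling and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site; returns URLs in the order visited.

    `fetch` is a hypothetical callable that takes a URL and returns the
    page's HTML as text, or None when the page cannot be loaded.
    """
    site = urlparse(start_url).netloc
    seen = {start_url}          # every URL ever queued, to avoid loops
    queue = deque([start_url])  # URLs waiting to be visited
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same site and never queue a URL twice.
            if urlparse(absolute).netloc == site and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Stand-in for a real site so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="http://other.test/">X</a>',
    "http://example.test/b": '<a href="/">Home</a> <a href="/a#top">A</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

The `seen` set is what keeps the crawl methodical: each page is visited once even though the pages link back to each other, and the link to `other.test` is ignored because it leaves the site.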

When the crawler (sometimes known as a Web Spider or Web Bot) finds a page that the screen scraper is interested in (such as a product page), it calls upon the Screen Scraper to extract the required information.
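The hand-off from crawler to scraper can be sketched as follows. This is an illustration under stated assumptions, not a fixed recipe: the `/product/<id>` URL pattern, the `name` and `price` class names and the sample HTML are all made up, and a real site will need its own rules.

```python
import re
from html.parser import HTMLParser

# Assumed URL convention: product pages live under /product/<number>.
PRODUCT_URL = re.compile(r"/product/\d+$")


class ProductScraper(HTMLParser):
    """Pulls the text out of elements whose class is 'name' or 'price'."""

    FIELDS = {"name", "price"}

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.FIELDS:
            self._current = css_class

    def handle_data(self, data):
        # Store the first non-blank text that follows a matching tag.
        if self._current and data.strip():
            self.record[self._current] = data.strip()
            self._current = None


def scrape_if_product(url, html):
    """Return the extracted record for product pages, None for anything else."""
    if not PRODUCT_URL.search(url):
        return None
    scraper = ProductScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.record


page = '<h1 class="name">Blue Kettle</h1><span class="price"> 24.99 </span>'
print(scrape_if_product("http://example.test/product/42", page))
print(scrape_if_product("http://example.test/about", page))
```

In this sketch the crawler would call `scrape_if_product` on every page it visits; only URLs matching the product pattern are passed to the scraper, and everything else is skipped.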

The data that the screen scraper extracts is then cleansed, processed, transformed and translated as required, and stored in another place such as a spreadsheet or database. Once the data is stored in Excel, CSV (Comma Separated Values) or a database, it becomes much easier to use.
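A minimal sketch of the cleanse-and-store step, using Python's standard `csv` module. The field names, the sample records and the cleaning rules (collapse whitespace, strip a leading `$` and thousands separators from the price) are assumptions chosen for illustration; real data will need its own rules.

```python
import csv
import io


def clean(record):
    """Collapse stray whitespace and turn a price like '$1,299.00' into a float."""
    price_text = record["price"].strip().lstrip("$").replace(",", "")
    return {
        "name": " ".join(record["name"].split()),
        "price": float(price_text),
    }


def to_csv(records, out):
    """Write cleaned records to `out`, any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(clean(record) for record in records)


# Hypothetical raw output from a scraper.
raw = [
    {"name": "  Blue   Kettle ", "price": " $1,299.00 "},
    {"name": "Red Mug", "price": "4.50"},
]

buffer = io.StringIO()  # stands in for a real file
to_csv(raw, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open it with `open("products.csv", "w", newline="")`, as the `csv` module documentation recommends, and pass the file object as `out`. The file name here is just an example.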

Extract Automatically – no more copy and paste.

iHarvest can save you hours and hours of manual effort.

Do you have a Screen Scraping project or idea? Contact iHarvest today. We’ll happily discuss your idea and take a look at the web site you want to extract data from. Initially we’ll help you establish how scrape-able the data is, and again, it’s 100% no obligation.

Save hundreds of hours of manual data entry.

Why use Web Crawling or Screen Scraping?

  • Extract data and images from a web site very quickly.
  • Extract data from a legacy web site.
  • Analyse a competitor’s site. “Measure” their product range.
  • Identify which of a competitor’s items are in stock and out of stock.
  • Identify a competitor’s brand proportion: how much of one brand they sell, and which product types.
  • Identify if a competitor is selling products you are not, and vice versa.
  • Extract data and images from a web site very accurately.
  • Compile the extracted information into a database or spreadsheet for further analysis.
  • Screen Scraping data helps with data mining and business intelligence.
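Once the data has been extracted, several of the competitor checks listed above come down to simple counting. The sketch below shows one way to do it in Python. The records, brand names and SKUs are invented, and matching products between the two ranges by SKU is an assumption; in practice, matching products across sites is often harder.

```python
from collections import Counter

# Hypothetical records, as a scraper might produce them for a competitor's site.
products = [
    {"sku": "K1", "brand": "Acme", "in_stock": True},
    {"sku": "K2", "brand": "Acme", "in_stock": False},
    {"sku": "M1", "brand": "Bolt", "in_stock": True},
    {"sku": "M2", "brand": "Acme", "in_stock": True},
]
# Hypothetical list of the SKUs in your own range.
our_skus = {"K1", "M1", "Z9"}

# How many items are in stock and out of stock.
stock = Counter("in stock" if p["in_stock"] else "out of stock" for p in products)

# Brand proportion: each brand's share of the competitor's range.
brand_counts = Counter(p["brand"] for p in products)
brand_share = {brand: count / len(products) for brand, count in brand_counts.items()}

# Products they sell that you do not, and vice versa.
their_skus = {p["sku"] for p in products}
only_them = their_skus - our_skus
only_us = our_skus - their_skus

print(stock)
print(brand_share)
print(sorted(only_them), sorted(only_us))
```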

More References

http://en.wikipedia.org/wiki/Distributed_web_crawling

http://en.wikipedia.org/wiki/Web_crawler