Our first web screen scraping service project was back in October 2003. At that time the market for web automation and data extraction barely existed; it was new and only just emerging!
Screen Scraping Service
In 2003, screen scraping and web data extraction “as a service” were unheard of! These days, however, a quick search on Google turns up numerous websites offering screen scraping services and scraping tools.
Try searching Google for any of the following:
- content scraper
- screen scraping service
- scraper software
- website scraper software
- data scraper software
- scrape data from website
- scraping data from websites
- how to scrape websites
In 2003, after a search on Google, we found two interesting tools: wGet and iOpus Internet Macros. They were among the few utilities around that could crawl a website and then effectively screen scrape its data.
wGet featured in the Hollywood movie The Social Network, in which Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook. In the movie, Zuckerberg is seen using wGet to scrape student photos for Facemash, the precursor to the site later launched as “TheFacebook.com”.
wGet is still in use today. I noticed recently that “WSUS Offline” uses wGet to download Microsoft updates. If you ever have problems updating a Microsoft operating system with the built-in “Windows Update” tool, visit http://download.wsusoffline.net. WSUS Offline can be used online or offline to update Windows machines.
It’s good to see wGet still being used nine years on.
iOpus Internet Macros
iOpus Internet Macros can record web browser input. Once the scripts are written, they can be replayed over and over, doing hours of repetitive work. iOpus Internet Macros interacts with websites programmatically: it fills out forms and automates the download and upload of text, images, files and web pages. It can import or export data to and from web applications using CSV and XML files, databases, or any other source.
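To give a feel for the data-driven form filling described above, here is a minimal Python sketch. It is not iOpus Internet Macros and does not use its scripting language; it only assumes the general idea of feeding one CSV row into each form submission. The field names (`name`, `email`), the sample data and the `build_form_payloads` helper are hypothetical, and the sketch stops at building URL-encoded form bodies rather than sending them, so it runs offline.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical input: each CSV row holds the values for one form submission.
CSV_DATA = """name,email
Alice,alice@example.com
Bob,bob@example.com
"""


def build_form_payloads(csv_text):
    """Turn each CSV row into a URL-encoded form body, one per submission."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row is a dict keyed by the CSV header, e.g. {"name": ..., "email": ...}.
    return [urlencode(row) for row in reader]


if __name__ == "__main__":
    for payload in build_form_payloads(CSV_DATA):
        # A real automation tool would POST this body to the form's action URL;
        # here it is only printed so the sketch runs without network access.
        print(payload)
```

An automation tool would then submit each body to the form's action URL (for example with `urllib.request.Request(url, data=payload.encode())`) and repeat for every row, which is where the hours of saved repetitive work come from.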