Can you extract text from a web page, such as a product name, description or price?

Yes, any visible text on a web page can be extracted. We can also extract other attributes such as weight, size and quantity in stock. Collectively, this is known as web crawling, web data extraction, web harvesting, web data mining or screen scraping.
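As a rough illustration of how a product's name, description and price might be pulled out of a page, the sketch below uses Python's built-in `html.parser`. It is not necessarily how our crawler works: the CSS class names (`product-name`, `description`, `price`) and the sample markup are hypothetical, and the sketch assumes each field is a single element containing plain text.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Captures the text of elements whose class attribute names a wanted field.

    Assumes each field is one element holding plain text (no nested tags).
    """

    def __init__(self, fields):
        super().__init__()
        self.fields = fields      # maps a (hypothetical) CSS class to an output column
        self.current = None       # column currently being captured, if any
        self.result = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.fields:
                self.current = self.fields[cls]
                self.result.setdefault(self.current, "")

    def handle_endtag(self, tag):
        # Any closing tag ends the capture (valid under the plain-text assumption).
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.result[self.current] += data


def extract_product(html, fields):
    parser = ProductParser(fields)
    parser.feed(html)
    parser.close()
    # Trim the whitespace that usually surrounds text in real markup.
    return {column: text.strip() for column, text in parser.result.items()}


# Hypothetical product page fragment.
page = """
<div class="product">
  <h1 class="product-name">Blue Widget</h1>
  <p class="description">A sturdy widget for everyday use.</p>
  <span class="price">£9.99</span>
</div>
"""
product = extract_product(
    page, {"product-name": "name", "description": "description", "price": "price"}
)
print(product)
```

On a real site the class names (or other selectors) would be chosen per site, and a full HTML library would cope better with nested markup.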

Can you extract images from web pages?

Yes, images can be downloaded; they are provided to you in a zip file. Each image file name is stored against its product row, so you can easily match the image to the product text.
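The sketch below shows one way this image-to-product matching could work: each image is stored in a zip archive under a name built from the product ID and the image's own file name, and that same name is written into the product's row. The product records, the `<id>_<basename>` naming scheme and the `fetch` callable (a stand-in for a real HTTP download) are all assumptions for illustration.

```python
import io
import posixpath
import zipfile
from urllib.parse import urlparse


def image_filename(product_id, image_url):
    """Build a file name such as '1001_front.jpg'; the ID prefix keeps names unique."""
    basename = posixpath.basename(urlparse(image_url).path)  # ignores any ?query
    return f"{product_id}_{basename}"


def package_images(products, fetch):
    """Write each product's image into an in-memory zip and record its file name.

    `fetch` is any callable returning image bytes for a URL (hypothetical here).
    """
    buffer = io.BytesIO()
    rows = []
    with zipfile.ZipFile(buffer, "w") as archive:
        for product in products:
            name = image_filename(product["id"], product["image_url"])
            archive.writestr(name, fetch(product["image_url"]))
            rows.append({"id": product["id"], "name": product["name"], "image_file": name})
    return buffer.getvalue(), rows


# Hypothetical products; two images share the same basename on the site.
products = [
    {"id": 1001, "name": "Blue Widget", "image_url": "https://shop.example.com/img/front.jpg"},
    {"id": 1002, "name": "Red Widget", "image_url": "https://shop.example.com/img/front.jpg?v=2"},
]
zip_bytes, rows = package_images(products, fetch=lambda url: b"placeholder image bytes")
print([row["image_file"] for row in rows])
```

Prefixing with the product ID avoids collisions when different products use images with the same file name.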

Can you download and extract data from PDFs?

Yes, our crawler can identify all PDF links on a web page and then download them automatically, just like images, so you have an offline copy of each PDF. Extracting data from a PDF is a different technique from extracting (scraping) data from an HTML page. In the past we have used a variety of tools to open each PDF and extract the desired information; which tool we use or recommend really depends on how the PDF is structured.
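Identifying PDF links can be done by scanning each page's `<a href>` attributes, resolving them against the page URL and keeping those whose path ends in `.pdf`. This minimal sketch assumes PDFs are recognisable by their extension; a link that serves a PDF without a `.pdf` suffix would need a check of the response's `Content-Type` header instead. The URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkFinder(HTMLParser):
    """Collects absolute URLs of <a> links whose path ends in .pdf (any case)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in self.links:
            self.links.append(absolute)          # keep first occurrence only


def find_pdf_links(html, base_url):
    finder = PdfLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.links


page = """
<a href="/docs/spec-sheet.pdf">Spec sheet</a>
<a href="manual.PDF?download=1">Manual</a>
<a href="/products/widget">Widget page</a>
<a href="/docs/spec-sheet.pdf">Spec sheet (again)</a>
"""
pdfs = find_pdf_links(page, "https://shop.example.com/products/")
print(pdfs)
```

Each URL in the resulting list could then be downloaded in the same way as the images.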

How often can you get the data?

As often as required: one-offs, daily, weekly and so on. Let us know your requirements.

Can you auto-fill login details?

Yes, we can auto-fill login names and passwords, fill in dates, submit text and select from drop-down lists. We can automate almost anything that a human can do manually.
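For a simple HTML login form, auto-filling amounts to reading the form's fields, keeping any hidden values (such as a CSRF token), filling in the credentials and posting the result. The sketch below only builds the request and does not send it; the form markup, field names and URL are hypothetical, and it assumes the page contains a single form. Sites that build their login form with JavaScript usually need a browser-automation tool instead.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormFieldParser(HTMLParser):
    """Records the form's action and the default value of each named <input>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action") or ""  # empty action = same URL
        elif tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_request(page_html, page_url, credentials):
    parser = FormFieldParser()
    parser.feed(page_html)
    parser.close()
    data = dict(parser.fields)   # keeps hidden fields such as a CSRF token
    data.update(credentials)     # fills in the username and password
    body = urlencode(data).encode("ascii")
    return Request(urljoin(page_url, parser.action or ""), data=body, method="POST")


login_page = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Sign in">
</form>
"""
req = build_login_request(
    login_page, "https://shop.example.com/account/", {"username": "demo", "password": "s3cret"}
)
print(req.full_url, req.data)
```

To actually submit it, the request could be passed to an opener created with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor())`, so that the session cookie returned by the site is kept for later page requests.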

Can you auto-fill CAPTCHA text boxes?

This is tricky. If you are looking at scraping a site with a CAPTCHA input box, contact our team and we’ll see if we can assist. Ultimately, some human input may be required.

What format will I get the data in?

Generally, we provide the data in CSV, XLS or XML format, or as SQL INSERT statements. However, if you have a custom requirement, please contact our team and we’ll see if we can assist.
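To show what the CSV and SQL INSERT outputs might look like for the same rows, here is a small sketch using Python's `csv` module and a hand-written literal formatter. The table name `products`, its columns and the sample rows are hypothetical. Strings are escaped by doubling single quotes, which is the standard SQL rule; table and column names are assumed to be trusted and are not escaped.

```python
import csv
import io


def to_csv(rows, columns):
    """Serialise rows (dicts) to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def sql_literal(value):
    """Render a Python value as an SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):          # check before int: bool is an int subclass
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"  # double embedded quotes


def to_sql_inserts(table, rows, columns):
    column_list = ", ".join(columns)
    return [
        f"INSERT INTO {table} ({column_list}) VALUES ("
        + ", ".join(sql_literal(row[col]) for col in columns)
        + ");"
        for row in rows
    ]


columns = ["name", "price"]
rows = [{"name": "Blue Widget", "price": 9.99}, {"name": "Bob's Widget", "price": 12.5}]
print(to_csv(rows, columns))
for statement in to_sql_inserts("products", rows, columns):
    print(statement)
```

When loading data straight into a database from code, parameterised queries are generally safer than generated statements; INSERT scripts are mainly convenient for handing data over as a file.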

Can you capture the breadcrumb?

Yes. This is useful for categorising the data into sections and subsections.

What is a website’s breadcrumb?

Breadcrumbs typically appear horizontally across the top of a web page, usually below the title bar or header. They provide links back to each previous page the user navigated through to reach the current page or, in hierarchical site structures, to the parent pages of the current one. Breadcrumbs give the user a trail to follow back to the starting or entry point. A greater-than sign (>) often serves as the hierarchy separator, although designers may use other glyphs (such as » or ›) as well as various graphical treatments.

Typical breadcrumbs look like this:

Home page > Section page > Subsection page
Home page >> Section page >> Subsection page

For more information, see http://en.wikipedia.org/wiki/Breadcrumb_(navigation)
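Once the breadcrumb text has been extracted, splitting it on the separators mentioned above gives the section and subsection levels used for categorising products. In this sketch the separator set, the decision to drop a leading "Home" or "Home page" level, and the `category_1`, `category_2`, ... column names are assumptions, not a fixed output format.

```python
import re

# Separators named above; ">>" is tried before ">" so it is consumed as one unit.
SEPARATOR = re.compile(r"\s*(?:>>|>|»|›)\s*")


def parse_breadcrumb(trail):
    """Split a breadcrumb string into its levels, dropping empty pieces."""
    return [part for part in SEPARATOR.split(trail.strip()) if part]


def to_category_columns(trail, depth=3):
    """Map breadcrumb levels to fixed category columns (hypothetical names)."""
    levels = parse_breadcrumb(trail)
    if levels and levels[0].lower() in {"home", "home page"}:
        levels = levels[1:]                      # the root adds no category info
    levels = levels[:depth] + [""] * (depth - len(levels))  # pad or truncate
    return {f"category_{i + 1}": name for i, name in enumerate(levels)}


print(parse_breadcrumb("Home page >> Section page >> Subsection page"))
print(to_category_columns("Home » Garden › Power Tools"))
```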

Can you provide a screenshot of the web page?

Yes, we can provide a “WebShot” of a page. The WebShot will auto-scroll to ensure the entire page is captured from top to bottom.

