How to extract data from a website

Web pages are designed to present information, and they do this very well. Scraping (extracting) data from them, however, can be quite a challenge. Sometimes the only way to do complex web scraping across multiple pages is to employ a programmer, which can be expensive. WebGet solves this problem by providing a simple interface for getting the data you want. Simply record a task (macro) and schedule it! This can be done in a matter of minutes. Why not sign up for a free account and give it a try?

In this tutorial we will step you through some of the features of WebGet and show you how to create your first recording. If you would prefer to watch a video please see the introduction video above.

To use WebGet you will need to sign up for a free account. After doing this and logging in, click on the Record New Task button to start extracting data.

The first step in the recording is to specify the address of the web page you want to extract data from. If you just want to extract data from a single page, specify the URL of that page (e.g. https://www.ebay.com/b/Electronics/bn_7000259124). If you want to extract data from multiple pages, specify the starting page (e.g. www.ebay.com).

In this example we are going to extract data from a Google search. So, enter www.google.com for your Starting page and click Start.

After the page has loaded, click in the search bar and type a search. Let's search for a dentist in Manhattan.

Next, click on the Google Search button and select Click from the popup menu.

The Click action will display in the panel to the right. This allows us to confirm and clarify exactly what we are doing. As everything is correct, we simply press the Click button.

You will notice that as we perform actions, WebGet saves the steps in the top left of the recording page. If you need to modify the steps later, you can click on them there.

Now we are going to loop over all the listings and extract some data. The loop feature is what makes WebGet extremely powerful. Scroll down past the ads to locate the first listing. Click on the name and select Loop Over Similar Items.

The items that will be included in the loop operation are highlighted in blue, and you will notice a counter in the top right-hand corner of each listing name.

Now we want to modify what we are looping over. Instead of looping over just the listing names, we want to loop over all the information in each listing. We can do this by changing the Target Item in the Loop action panel. Click on the forward arrow and you will see the highlighted items in the web page change. Click the forward button a few more times until all the data in each listing is highlighted.

Now we give the action a name. For loop actions, we name it by describing what data we are looping over. Type the name "Listing" and click the Loop button.

Now you will notice that WebGet has blanked out a large portion of the screen. This is because we are inside a loop and we can only access items within the loop container. All actions from now on will be run on all our loop items.
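For readers who think in code, the loop step is roughly what a script does when it selects every element that matches a pattern and then works only inside each match. The following is a minimal Python sketch of that idea, not a description of how WebGet works internally. The sample markup, the `listing` class name, and the `find_listings` helper are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a search results page.
SAMPLE_PAGE = """
<div id="results">
  <div class="ad"><h3>Sponsored result</h3></div>
  <div class="listing"><h3>Dentist A</h3><cite>https://a.example</cite><p>Family dentistry.</p></div>
  <div class="listing"><h3>Dentist B</h3><cite>https://b.example</cite><p>Cosmetic dentistry.</p></div>
</div>
"""

def find_listings(page_markup):
    """Return every element that looks like a listing container (assumed class name)."""
    root = ET.fromstring(page_markup.strip())
    return root.findall(".//div[@class='listing']")

# Each listing acts as the "loop container": lookups below see only that listing.
listings = find_listings(SAMPLE_PAGE)
names = [listing.find("h3").text for listing in listings]
```

Widening the Target Item in the tutorial corresponds to selecting the whole `div` for each listing rather than only its `h3` name, so that every later step can reach the name, URL, and description inside it.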

The next step is to extract some information. Click on the listing name and select Extract.

You will notice that WebGet shows you an example of the data that will be extracted. If this is all correct, we simply enter a name for the action and click Extract.

Now let's extract the URL. Click on the URL and select Extract. Enter "Url" for the Name and click Extract.

Now click on the listing description and select Extract. Enter "Description" for the Name and click Extract.

We've now finished with the loop, so go to the top of the recording and click the End Loop button.

At the moment we are only extracting data from the first page of the Google results. Let's loop over multiple pages. Scroll to the bottom of the Google page and click the Next button. Select "Click Next Page" from the popup menu. This tells WebGet that the data on this page is part of a series of pages and that we want to move to the next page.

Enter the Name, set the Number of Pages to 5 and press Click.

That's it for the recording. Click on the Finish & Save button. Give the task a name and click Save & Run.

The task will run and WebGet will gather the data. When it is finished you will have all the search results listed. This data can then be exported to Excel by clicking the Download Excel button, or the task can be run periodically by clicking the Schedule button.
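As a rough mental model of the finished task, the sketch below follows "next page" links up to a page limit, collects the three extracted fields from each page, and writes them out as a table. It is an illustrative Python sketch only and does not reflect WebGet's internals. The `PAGES` data, the `collect_rows` and `to_csv` helpers, and the use of CSV in place of the Excel download are all assumptions.

```python
import csv
import io

FIELDS = ["Name", "Url", "Description"]

# Hypothetical pages: each entry holds (rows found on the page, key of the next page or None).
PAGES = {
    "page1": ([{"Name": "Dentist A", "Url": "https://a.example", "Description": "Family dentistry."}], "page2"),
    "page2": ([{"Name": "Dentist B", "Url": "https://b.example", "Description": "Cosmetic dentistry."}], "page3"),
    "page3": ([{"Name": "Dentist C", "Url": "https://c.example", "Description": "Emergency care."}], None),
}

def collect_rows(start, max_pages):
    """Follow next-page links from `start`, stopping after `max_pages` pages or the last page."""
    rows = []
    current = start
    visited = 0
    while current is not None and visited < max_pages:
        page_rows, next_page = PAGES[current]
        rows.extend(page_rows)
        current = next_page
        visited += 1
    return rows

def to_csv(rows):
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = collect_rows("page1", max_pages=2)
csv_text = to_csv(rows)
```

The `max_pages` argument plays the role of the Number of Pages setting: the loop ends either when the limit is reached or when there is no further page to move to, whichever comes first.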