How to extract data from a website
Web pages are designed for presenting information, and they do this very well. However, if you have ever tried to scrape (extract) data from them, you know it can be quite a challenge. Sometimes the only way to do complex web scraping across multiple pages is to employ a programmer, and this can be expensive. WebGet solves this problem by providing a simple interface to get access to the data you want. Simply record a task (macro) and schedule it! This can be done in a matter of minutes. Why not sign up for a free account and give it a try?
In this tutorial we will walk you through some of the features of WebGet and show you how to create your first recording. If you would prefer to watch a video, please see the introduction video above.
To use WebGet you will need to sign up for a free account. After signing up and logging in, click the Record New Task button to start extracting data. The first step in the recording is to specify the address of the web page you want to extract data from. If you just want to extract data from a single page, specify the URL of that page, e.g. https://www.ebay.com/b/Electronics/bn_7000259124. If you want to extract data from multiple pages, specify the starting page, e.g. www.ebay.com.
In this example we are going to extract data from a Google search. Enter www.google.com as your Starting page and click Start. After the page has loaded, click in the search bar and type a search. Let's search for a dentist in Manhattan.
Next, click on the Google Search button and select Click from the popup menu. The Click action will be displayed in the panel on the right. This lets us confirm and clarify exactly what the action will do. As everything is correct, we simply press the Click button.
You will notice that as we perform actions, WebGet saves the steps in the top left of the recording page. If you need to modify a step later, you can click on it there.
Now we are going to loop over all the listings and extract some data. The loop feature is what makes WebGet so powerful. Scroll down past the ads to locate the first listing. Click on the listing name and select Loop Over Similar Items. The items included in the loop are highlighted in blue, and a counter appears in the top right-hand corner of each listing name.
Now we want to change what we are looping over. Instead of looping over just the listing names, we want to loop over all the information in each listing. We can do this by changing the Target Item in the Loop action panel. Click on the forward arrow and you will see the highlighted items in the web page change. Click the forward arrow a few more times until all the data in each listing is highlighted.
Now we give the action a name. For loop actions, we name the action after the data we are looping over. Type the name "Listing" and click the Loop button.
You will now notice that WebGet has blanked out a large portion of the screen. This is because we are inside a loop and can only access items within the loop container. All actions from now on will be run on each of our loop items.
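WebGet performs the loop without any code. For readers curious what Loop Over Similar Items amounts to, here is a minimal sketch in Python using only the standard library's html.parser. It finds every repeated container and keeps only the text inside each one, ignoring everything outside (such as the ads), much like WebGet blanking out the rest of the page. The markup, including the `listing` class, is a hypothetical stand-in: real search result pages use different markup that changes often, and this is not how WebGet itself is implemented.

```python
from html.parser import HTMLParser

# Hypothetical markup: each listing is a <div class="listing">.
# Real search result pages use different (and frequently changing) markup.
SAMPLE_HTML = """
<div class="ads"><a href="https://ad.example">Sponsored</a></div>
<div class="listing"><h3>Smile Dental</h3><a href="https://smile.example">smile.example</a><span>Family dentistry.</span></div>
<div class="listing"><h3>Midtown Dentists</h3><a href="https://midtown.example">midtown.example</a><span>Open late.</span></div>
"""


class ListingCollector(HTMLParser):
    """Collects the text of every <div class="listing"> container."""

    def __init__(self):
        super().__init__()
        self.listings = []  # one list of text fragments per container
        self.depth = 0      # <div> nesting depth inside the current container (0 = outside)

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested <div> inside a listing
        elif ("class", "listing") in attrs:  # exact class match, for simplicity
            self.depth = 1
            self.listings.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Text outside a listing container (e.g. the ads) is ignored.
        if self.depth and data.strip():
            self.listings[-1].append(data.strip())


collector = ListingCollector()
collector.feed(SAMPLE_HTML)
for number, fragments in enumerate(collector.listings, start=1):
    print(number, " | ".join(fragments))
```

The printed number plays the role of the counter WebGet shows on each looped item.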
The next step is to extract some information. Click on the listing name and select Extract. WebGet will show you an example of the data that will be extracted. If this is correct, enter a name for the action and click Extract.
Now let's extract the URL. Click on the URL and select Extract, then enter "Url" for the Name and click Extract. Next, click on the listing description and select Extract, then enter "Description" for the Name and click Extract.
We've now finished with the loop, so go to the top of the recording and click the End Loop button.
At the moment we are only extracting data from the first page of the Google results, so let's loop over multiple pages. Scroll to the bottom of the Google page, click the Next button and select "Click Next Page" from the popup menu. This tells WebGet that the data on this page is part of a series of pages and that we want to move to the next page. Enter the Name, set the Number of Pages to 5 and press Click.
That's it for the recording. Click on the Finish & Save button, give the task a name and click Save & Run. The task will run and WebGet will gather the data. When it is finished, you will have all the search results listed. You can export this data to Excel by clicking the Download Excel button, or run the task periodically by clicking the Schedule button.
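Again, WebGet needs no code for this, but the recorded task can be pictured as a small program: for each listing, extract the named fields (Name, Url, Description); follow the Next link up to a page limit (the Number of Pages setting); then write the rows to a spreadsheet-friendly file. The sketch below assumes hypothetical in-memory pages (`PAGES`, with `page1` and `page2` keys) in place of real downloads, a hypothetical `listing` class for containers and a `next` class for the Next link, and uses CSV, which Excel can open, as a stand-in for WebGet's Excel export. It is an illustration of the idea, not WebGet's implementation.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical pages standing in for a paginated result list. Each page has
# <div class="listing"> items and, except the last, a link with class "next".
PAGES = {
    "page1": '<div class="listing"><h3>Smile Dental</h3><a href="https://smile.example">smile.example</a>'
             '<span>Family dentistry.</span></div><a class="next" href="page2">Next</a>',
    "page2": '<div class="listing"><h3>Midtown Dentists</h3><a href="https://midtown.example">midtown.example</a>'
             '<span>Open late.</span></div>',
}


class PageParser(HTMLParser):
    """Extracts Name, Url and Description from each listing, plus the next-page link."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.next_page = None
        self.in_listing = False
        self.field = None  # field the current text belongs to, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "listing":
            self.in_listing = True
            self.rows.append({"Name": "", "Url": "", "Description": ""})
        elif tag == "a" and attrs.get("class") == "next":
            self.next_page = attrs.get("href")
        elif self.in_listing and tag == "h3":
            self.field = "Name"
        elif self.in_listing and tag == "a":
            self.rows[-1]["Url"] = attrs.get("href", "")
        elif self.in_listing and tag == "span":
            self.field = "Description"

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_listing = False  # assumes listings contain no nested <div>s
        elif tag in ("h3", "span"):
            self.field = None

    def handle_data(self, data):
        if self.in_listing and self.field:
            self.rows[-1][self.field] += data.strip()


def scrape(start, max_pages):
    """Collect rows from up to max_pages pages, following the Next link."""
    rows, page = [], start
    for _ in range(max_pages):
        if page is None:
            break  # no Next link: the last page has been reached
        parser = PageParser()
        parser.feed(PAGES[page])  # a real scraper would download the page here
        rows.extend(parser.rows)
        page = parser.next_page
    return rows


rows = scrape("page1", max_pages=5)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Name", "Url", "Description"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Stopping either at the page limit or when no Next link exists mirrors why the Number of Pages setting is a maximum rather than a guarantee: a search may run out of results first.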