Back in the 1980s, a hot topic was "screen scraping": quick-and-dirty applications that
emulated 3270 or VT100 terminals in order to talk to legacy systems. These applications
were not reliable, but they were cheap, and they delivered a lot of bang for the buck
compared with doing it the proper way. Web farming puts us in a similar situation with
web scrapers, applications that process the HTML of a web page to extract meaningful data.
Again, web scrapers are not reliable, but they are cheap and useful.

Let's examine a simple application that I have been operating for several months. It is
written in Microsoft Visual Basic version 6. The application retrieves the web page on
Amazon.com that describes my Web Farming book. From that page, it extracts the sales rank
of the book and writes that value to a file.

The calculation of the sales rank is based on
Amazon.com sales and is updated regularly. The top 10,000 best sellers are updated each
hour to reflect sales over the previous 24 hours. The next 100,000 are updated once a day.
The rest of the list is updated monthly, based on various factors. The lower the number,
the higher the sales for that particular title. When compared with more than two million
books, this ranking is the closest indicator to a stock price for a book!
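One practical consequence of these rules is that the refresh interval depends on the rank itself, so probing a title more often than its tier is updated gains nothing. A minimal sketch in Python (not the Visual Basic used for the application in this article) that maps a rank to its tier; the name update_interval is hypothetical, and the boundaries assume that "the next 100,000" means ranks 10,001 through 110,000:

```python
def update_interval(rank: int) -> str:
    """Return how often a title at this sales rank is refreshed,
    following the three tiers described in the text."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    if rank <= 10_000:       # top 10,000 best sellers
        return "hourly"
    if rank <= 110_000:      # assumed reading of "the next 100,000"
        return "daily"
    return "monthly"         # the rest of the list


print(update_interval(4518))
```

For a rank such as 4,518, which falls in the top tier, an hourly probe matches the hourly update.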
If you examine the HTML source for this page, you will find the following fragment amid
lots of garbage:
<font face=verdana,arial,helvetica size=-1>
<b>Amazon.com Sales Rank: </b>
4,518
</font><br>
The task is to retrieve the HTML as a text
string, search for an initial pre-pattern, search for a post-pattern, extract the text
between the two patterns, convert the text to a numeric value, and write it to a file.
Sounds simple? Actually, it is! Here are the steps:
- Open Visual Studio for VB and create a new Standard EXE project.
- Under the Project menu, click on Components to open the Components dialog box. Check the
box for Microsoft Internet Transfer Control 6 (which will load MSINET.OCX). Be sure the
box is checked before you click OK. This is the secret ingredient!
- You should now see a new icon in the toolbox at the left (a world with a terminal in
front). Click on this icon, then drag a square on the form. The new control is named
Inet1 by default, which is the name the code relies on.
- Also add a label with caption 'Ranking'. Add a textbox and a command button with caption
'Probe for Amazon Sales Ranking'. Your form should look like this. Not pretty, but simple.

- Now for the fun stuff! Double click on the button to open the code window. For the
routine Command1_Click, copy and paste the following. Watch for extra line breaks.
Private Sub Command1_Click()
    ' each variable needs its own As String; otherwise it defaults to Variant
    Dim strPage As String, strISBN As String, strURL As String
    On Error Resume Next
    ' set the proper URL to Amazon.com asking for specific book
    strISBN = "1558605037" ' ISBN for the WF book
    strURL = "http://www.amazon.com/exec/obidos/ASIN/" _
        & strISBN & "/"
    ' get the web page content using the Inet control
    strPage = Inet1.OpenURL(strURL, icString)
    ' put the ranking value into the textbox
    Text1.Text = GetRank(strPage, "Sales Rank: </b>", "</font>")
End Sub
All the work is performed by the OpenURL method of the Inet control. If there is an
error, the string strPage remains empty because of the On Error Resume Next statement,
and the text box is simply left blank.
- One more piece of code is required. The GetRank routine must parse all of that messy
HTML and extract the sales ranking. Note that the arguments are the HTML buffer, the
pattern prior to the ranking, and the pattern after the ranking. Here is the code to copy
and paste after the previous routine. Trust me; it works!
Private Function GetRank(ByVal strPage As String, ByVal strPrePat As String, _
        ByVal strPostPat As String) As String
    ' Long rather than Integer: positions in a large page can exceed 32,767
    Dim iStart As Long, iEnd As Long
    Dim strIn As String, strOut As String
    GetRank = ""
    iStart = InStr(1, strPage, strPrePat) ' find first pattern
    If iStart <> 0 Then
        iStart = iStart + Len(strPrePat)
        iEnd = InStr(iStart, strPage, strPostPat) ' find second pattern
        If iEnd <> 0 Then
            strIn = Mid(strPage, iStart, iEnd - iStart)
            strOut = ""
            For iStart = 1 To Len(strIn) ' strip out control chars
                If Mid(strIn, iStart, 1) < " " Then
                    strOut = strOut & " " ' add a blank instead
                Else
                    strOut = strOut & Mid(strIn, iStart, 1)
                End If
            Next iStart
            GetRank = Trim(strOut) ' return extracted value
        End If
    End If
End Function
- Now save and run. Be sure that your connection to
the Internet is active. After clicking the button, there will be a pause and a number
should appear in the text box. Hopefully, the number will be a low one for such an
excellent book!
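GetRank returns the ranking as text, such as "4,518" with its thousands separator. The remaining step from the task description, converting that text to a numeric value, can be sketched as follows. This is an illustration in Python rather than the Visual Basic of the listings above; the name get_rank and the SAMPLE string (copied from the HTML fragment shown earlier) are assumptions made for the example, not part of the original application:

```python
from typing import Optional


def get_rank(page: str, pre_pat: str, post_pat: str) -> Optional[int]:
    """Return the number found between pre_pat and post_pat, or None
    if either pattern is missing or the text between is not a number."""
    start = page.find(pre_pat)
    if start == -1:
        return None
    start += len(pre_pat)
    end = page.find(post_pat, start)
    if end == -1:
        return None
    # Normalize line breaks and other whitespace, similar in spirit to
    # the control-character stripping in GetRank.
    text = " ".join(page[start:end].split())
    digits = text.replace(",", "")  # drop the thousands separator
    if digits.isascii() and digits.isdigit():
        return int(digits)
    return None


SAMPLE = (
    "<font face=verdana,arial,helvetica size=-1>\n"
    "<b>Amazon.com Sales Rank: </b>\n"
    "4,518\n"
    "</font><br>\n"
)

print(get_rank(SAMPLE, "Sales Rank: </b>", "</font>"))
```

Returning None instead of raising an error mirrors the forgiving behavior of the original: a scraper should expect the page layout to change without notice.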
In the version that I use, I added a timer so that the Amazon web page is probed every
hour. Whenever the value changes, it is written to a comma-delimited text file for import
into Excel for charting. So far, I have reliably recorded the sales ranking every hour
for over four months.
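The write-only-on-change logging might look something like the sketch below. It is written in Python, not the Visual Basic of my application, and the function name log_if_changed and the two-column layout (timestamp, rank) are assumptions chosen for the example:

```python
import csv
from datetime import datetime
from typing import Optional


def log_if_changed(path: str, rank: int, last_rank: Optional[int],
                   when: datetime) -> int:
    """Append a 'timestamp,rank' row to the CSV file at path, but only
    when rank differs from last_rank. Returns rank so the caller can
    keep it as the new last_rank."""
    if rank != last_rank:
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([when.isoformat(timespec="seconds"), rank])
    return rank
```

An hourly timer would call this once per probe, passing the value returned by the previous call as last_rank (and None on the first call), so that repeated identical readings add no rows to the file.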
I would like to hear about your experiences and enhancements.
- Richard Hackathorn
dick@webfarming.com