This article was written in 2001.
I recently discovered a cool ASP component from Server Objects called AspHTTP. This component lets an ASP page GET documents via HTTP and POST data to a remote web page. Why would an ASP page need this? One use is parsing data off a web page and presenting it in your own format. AspHTTP pulls the remote web page into your code as a string, and from there you can use string parsing to extract the data.
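To give a feel for the string parsing involved, here is a minimal sketch that pulls out the text sitting between two known markers. The GetBetween function and the sample markup are made up for illustration; they are not part of AspHTTP, and the hard-coded string simply stands in for a page the component would return.

```vbscript
<%
' Hypothetical helper (not part of AspHTTP): returns the text found
' between strStart and strEnd, or an empty string if either is missing.
Function GetBetween(strSource, strStart, strEnd)
    Dim lngStart, lngEnd
    GetBetween = ""

    ' locate the opening marker, ignoring case
    lngStart = InStr(1, strSource, strStart, vbTextCompare)
    If lngStart = 0 Then Exit Function

    ' skip past the opening marker, then locate the closing marker
    lngStart = lngStart + Len(strStart)
    lngEnd = InStr(lngStart, strSource, strEnd, vbTextCompare)
    If lngEnd = 0 Then Exit Function

    GetBetween = Mid(strSource, lngStart, lngEnd - lngStart)
End Function

' made-up markup standing in for a fetched page
Dim strPage
strPage = "<html><head><title>Sample Page</title></head><body></body></html>"

' writes: Sample Page
Response.Write Server.HTMLEncode(GetBetween(strPage, "<title>", "</title>"))
%>
```

Returning an empty string when a marker is missing keeps the caller simple, at the cost of not distinguishing "not found" from "found but empty".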
Copyright Issues
Copyright is outside the scope of this article. Just be aware that snagging someone else's data may make their legal department unhappy. If you choose to use someone else's data, getting their permission first is a wise decision. Use this tool for good, not for evil.
Sample Code
<%
Dim HttpObj, strResult, intLength

Set HttpObj = Server.CreateObject("AspHTTP.Conn")
HttpObj.Url = "http://www.example.com/" ' placeholder: enter the URL to fetch here

' fetch the HTML page into the strResult variable
strResult = HttpObj.GetURL

' check for component error
If Len(HttpObj.Error) > 0 Then
    Response.Write "ERROR: " & HttpObj.Error
Else
    ' did we retrieve a document?
    intLength = Len(strResult)
    If intLength > 0 Then
        ' proceed with scrape
    End If
End If

' release the component
Set HttpObj = Nothing
%>
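What goes in the "proceed with scrape" step depends entirely on the page. As one possible sketch, the built-in VBScript RegExp object can collect every table cell from the fetched string. The markup below is made up and assigned to strResult by hand so the snippet runs on its own; the pattern assumes simple cells with no nested tags, and SubMatches assumes VBScript 5.5 or later.

```vbscript
<%
Dim strResult, objRegExp, colMatches, objMatch

' made-up markup standing in for the page returned by GetURL
strResult = "<table><tr><td>Widget</td><td>9.95</td></tr></table>"

Set objRegExp = New RegExp
' capture the text of each <td> cell; assumes no tags nested inside a cell
objRegExp.Pattern = "<td[^>]*>([^<]*)</td>"
objRegExp.IgnoreCase = True
objRegExp.Global = True   ' find all matches, not just the first

Set colMatches = objRegExp.Execute(strResult)
For Each objMatch In colMatches
    ' SubMatches(0) holds the text captured by the parentheses
    Response.Write Server.HTMLEncode(objMatch.SubMatches(0)) & "<br>"
Next

Set colMatches = Nothing
Set objRegExp = Nothing
%>
```

For the sample string this writes Widget and 9.95 on separate lines. Real pages are rarely this tidy, so expect to tune the pattern to the markup you are scraping.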
Last Words
Data scraping can be an inexact science. If the web site changes its markup, you can find yourself recoding your scrape. Look for an API before resorting to a data scrape.
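Since markup changes are the usual way a scrape breaks, it can help to make the failure obvious instead of silently writing out garbage. The sketch below checks that the expected markers are still present before extracting anything. The marker strings and the sample markup are hypothetical, chosen to show the failure path.

```vbscript
<%
' hypothetical markers that the scrape depends on
Const START_MARKER = "<div id=""price"">"
Const END_MARKER = "</div>"

Dim strResult, lngStart, lngEnd

' made-up markup standing in for a page whose layout has changed
strResult = "<span class=""price"">9.95</span>"

lngStart = InStr(1, strResult, START_MARKER, vbTextCompare)
lngEnd = 0
If lngStart > 0 Then
    ' only look for the closing marker after the opening one
    lngEnd = InStr(lngStart + Len(START_MARKER), strResult, END_MARKER, vbTextCompare)
End If

If lngStart = 0 Or lngEnd = 0 Then
    ' the page no longer looks the way the scrape expects
    Response.Write "Scrape failed: expected markers not found. Has the markup changed?"
Else
    lngStart = lngStart + Len(START_MARKER)
    Response.Write Server.HTMLEncode(Mid(strResult, lngStart, lngEnd - lngStart))
End If
%>
```

With the sample string above the markers are missing, so the failure message is written. In a real page you might log the problem or send yourself an email rather than showing the message to visitors.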
Labels: ASP, COM, VBScript