Digital Colony!

Using AspHTTP to Scrape Data in Classic ASP

The article was written in 2001.

I recently discovered a cool ASP component from Server Objects called AspHTTP. This component gives an ASP page the ability to GET documents via HTTP. It can also POST data to a remote web page. Why would an ASP page need this? One reason is to pull data off a web page and present it in your own format. The AspHTTP component lets you pull the remote web page into your code as a string. From there you can use string parsing to extract the data.

Copyright Issues

Copyright law is outside the scope of this article. Just be aware that snagging someone else's data may make their legal department unhappy. If you choose to use someone else's data, getting their permission first is a wise decision. Use this tool for good, not for evil.

Sample Code

<%
Dim HTTPObj, strResult, intLength
Set HTTPObj = Server.CreateObject("AspHTTP.Conn")
' enter some URL here (the address below is only a placeholder)
HTTPObj.Url = "http://www.example.com/"
' fetch the HTML page into the strResult variable
strResult = HTTPObj.GetURL
' check for component error
If Len(HTTPObj.Error) > 0 Then
  Response.Write "ERROR: " & Server.HTMLEncode(HTTPObj.Error)
Else
  ' did we retrieve a document?
  intLength = Len(strResult)
  If intLength > 0 Then
    ' proceed with scrape
  End If
End If
' release the component
Set HTTPObj = Nothing
%>
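
The "proceed with scrape" step usually comes down to finding a known marker in the page and cutting out the text that follows it. Here is a minimal sketch of that idea using only built-in VBScript string functions. The ExtractBetween helper, the strSample variable, and the title markers are hypothetical names chosen for illustration; they are not part of AspHTTP. In a real page you would pass in the strResult string from the sample above.

```vbscript
<%
' Hypothetical helper (not part of AspHTTP): returns the text found between
' two marker strings, or an empty string when either marker is missing.
Function ExtractBetween(strSource, strStart, strEnd)
  Dim intStart, intEnd
  ExtractBetween = ""
  ' case-insensitive search for the opening marker
  intStart = InStr(1, strSource, strStart, vbTextCompare)
  If intStart = 0 Then Exit Function
  ' move just past the opening marker
  intStart = intStart + Len(strStart)
  ' look for the closing marker after the opening one
  intEnd = InStr(intStart, strSource, strEnd, vbTextCompare)
  If intEnd = 0 Then Exit Function
  ExtractBetween = Mid(strSource, intStart, intEnd - intStart)
End Function

' Sample input standing in for the strResult returned by GetURL
Dim strSample
strSample = "<html><head><title>Example Page</title></head></html>"
' writes the text between the title tags: Example Page
Response.Write Server.HTMLEncode(ExtractBetween(strSample, "<title>", "</title>"))
%>
```

Returning an empty string when a marker is missing gives the calling code a simple way to notice that the remote markup has changed, rather than displaying a mangled result.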

Last Words

Data scraping can be an inexact science. If the web site changes its markup, you may find yourself recoding your scrape. Look for an API first before resorting to a data scrape.


Digital Colony Copyright © 1999-2008
Try...Catch Disclaimer: For brevity many examples do not include error handling. That is your responsibility.