Thread: Parsing text from retrieved HTML page

    pmisiowiec

    Parsing text from retrieved HTML page

    I installed the ServerObjects AspHTTP component on a test server. I've figured out how to pull out certain <a href>'s using the GetHREFs method and some basic InStr and Mid functions.

    However, let's say I want to create my own subroutine to go through and pull out the hrefs and the text immediately following each one? In other words, I have the string strResult = HttpObj.GetURL, but I want to go through and stick every href and the first 100 characters following the <a href="whatever"> into a two-dimensional array. I suck at arrays and advanced parsing, so can anyone give me some suggestions?

    Thanks,

    Philip

    Stephen Fisher

    RE: Parsing text from retrieved HTML page

    I assume that the string you get back is the whole page (call it strHTML).

    Once you have a URL (call it strURL), use InStr(strHTML, strURL) to get the position of that URL.

    Then trim off all the characters in front of strURL and take only the next 100.

    You will need to use
    Left()
    Right()
    Mid()
    Len()

    Something like
    Mid(strHTML, InStr(strHTML, strURL), 100)

    That returns 100 characters starting at the URL itself; add Len(strURL) to the start position if you want the 100 characters after it.

    When parsing strings, I always use Response.Write to write out everything while debugging.

    Hope this helps
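
    A minimal sketch of how the steps above could be turned into a loop that fills a two-dimensional array. ExtractLinks and its variable names are hypothetical, not part of AspHTTP. It assumes every anchor starts with "<a " followed by a space, that the href value is quoted, and that no ">" appears inside the tag's attributes. Real-world HTML often breaks those assumptions, so treat it as a starting point only.

    ```vbscript
    ' Hypothetical helper: returns a 2-D array where
    '   result(0, i) = href value of the i-th link
    '   result(1, i) = up to intChars characters following that <a ...> tag
    ' Returns Empty when no links are found, so check IsArray() first.
    Function ExtractLinks(strHTML, intChars)
        Dim arrLinks(), intCount, intPos, intTagEnd
        Dim intHref, intQuoteEnd, strQuote

        intCount = 0
        intPos = InStr(1, strHTML, "<a ", vbTextCompare)

        Do While intPos > 0
            ' Find the ">" that closes this opening tag
            intTagEnd = InStr(intPos, strHTML, ">")
            If intTagEnd = 0 Then Exit Do   ' unterminated tag: stop

            ' Look for href= inside this tag only
            intHref = InStr(intPos, strHTML, "href=", vbTextCompare)
            If intHref > 0 And intHref < intTagEnd Then
                strQuote = Mid(strHTML, intHref + 5, 1)
                If strQuote = """" Or strQuote = "'" Then
                    intQuoteEnd = InStr(intHref + 6, strHTML, strQuote)
                    If intQuoteEnd > 0 And intQuoteEnd < intTagEnd Then
                        ' Grow the array; only the last dimension can be resized
                        If intCount = 0 Then
                            ReDim arrLinks(1, 0)
                        Else
                            ReDim Preserve arrLinks(1, intCount)
                        End If
                        arrLinks(0, intCount) = Mid(strHTML, intHref + 6, intQuoteEnd - intHref - 6)
                        arrLinks(1, intCount) = Mid(strHTML, intTagEnd + 1, intChars)
                        intCount = intCount + 1
                    End If
                End If
            End If

            ' Continue searching after the tag just handled
            intPos = InStr(intTagEnd + 1, strHTML, "<a ", vbTextCompare)
        Loop

        If intCount > 0 Then ExtractLinks = arrLinks
    End Function

    ' Usage, assuming strResult already holds the page from HttpObj.GetURL
    Dim arrResult, i
    arrResult = ExtractLinks(strResult, 100)
    If IsArray(arrResult) Then
        For i = 0 To UBound(arrResult, 2)
            Response.Write Server.HTMLEncode(arrResult(0, i)) & " : " & _
                           Server.HTMLEncode(arrResult(1, i)) & "<br>"
        Next
    End If
    ```

    The links go in the second dimension because ReDim Preserve can only resize the last dimension of an array. Note that the 100 characters are raw HTML, so they may contain the link text, the closing </a>, and other tags.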

