HREF Link problem - Regular Expressions


  1. #1
    Greg Moss Guest

    HREF Link problem - Regular Expressions

    I posted this problem yesterday but didn't really understand how complex it was. Now I'm reposting it with the new requirements I've uncovered.

    My goal is to extract links from HTML pages to store for validation and cataloging. The situation is complicated by the fact that links to pages in the same domain don't have an http:// prefix with the proper domain name.

    For example:

    domain is

    so a couple of example links might be

    <a href=addlink.asp>Add your link</a>
    <a href="news.asp?type=religious">Religious News</a>
    <a href="">Affiliate Programs</a>

    for these three sample links I would want to get back:

    As you can see, the local links complicate my problem. To make it more challenging, I really need to know whether a particular link is local to the domain or external. At times I may want to ignore one or the other when I am processing.

    One more tricky aspect I'm wrestling with is that the page of the domain I'm on could be or it could be or even

    Obviously this would change how the local pages would need to be added to the root.

    I appreciate any help.
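The two requirements in this post (turning local links into full URLs, and telling local links from external ones) can be sketched as follows. This is only an illustration in Python rather than the poster's own environment. The domain `www.example.com`, the page paths, and the function name `extract_links` are placeholders assumed for the example, not values from the thread. The sketch relies on the standard library's `urljoin`, which resolves a relative link against whichever page it was found on, so the same code should cope with the page being at the site root or in a subdirectory.

```python
import re
from urllib.parse import urljoin, urlparse

# Matches href values that are double-quoted, single-quoted, or unquoted.
HREF_RE = re.compile(
    r"""href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s<>"']+))""",
    re.IGNORECASE,
)


def extract_links(html, page_url):
    """Return (absolute_url, is_local) pairs for every href found in html.

    page_url is the address of the page the HTML came from (hypothetical
    input); relative links are resolved against it.
    """
    page_host = urlparse(page_url).netloc.lower()
    results = []
    for match in HREF_RE.finditer(html):
        # Exactly one of the three groups takes part in each match.
        raw = next(g for g in match.groups() if g is not None)
        absolute = urljoin(page_url, raw)
        is_local = urlparse(absolute).netloc.lower() == page_host
        results.append((absolute, is_local))
    return results


if __name__ == "__main__":
    sample = (
        '<a href=addlink.asp>Add your link</a>'
        '<a href="news.asp?type=religious">Religious News</a>'
        '<a href="http://other.example/affiliates.asp">Affiliate Programs</a>'
    )
    for url, local in extract_links(sample, "http://www.example.com/links/index.asp"):
        print("local   " if local else "external", url)
```

Filtering on the boolean in each pair is one way to ignore either the local or the external links during processing.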

  2. #2
    Join Date
    Dec 1969

    RE: HREF Link problem - Regular Expressions

    I think all you need is to use the Left, Right, or Mid functions of VBScript, along with some If statements.

  3. #3
    Greg Moss Guest

    RE: HREF Link problem - Regular Expressions

    I've got some code I've written using left, right, and strtran functions, but it was getting super messy. There are simply too many conditions for the matching. Regular expressions are far superior for this type of application, but I'm very new to them.

    Currently I'm using the pattern:

    href=[^ <>']+

    to find the hrefs.

    It works pretty well, but it doesn't solve my replace problems. In fact, I've sort of forced it to work by using the left, right, and strtran functions. I'm sure, however, that somebody who really knows regular expressions could help me out here.

    Another pattern I found that I can't get to work properly looks like this:

    href\s*=\s*(?:\"(?<1>[^\"]*)\"|(?<1>\S+))

    It errors when I try to execute the match. This one is way too complicated for me to understand and fix.

    Here is one other pattern that I was unsuccessful in using:

    http://(.*?)"">s*((
    |.)+?)

    Thanks a lot for the help.
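A possible reason the quoted-or-unquoted pattern in this post errors: its `(?<1>...)` groups look like .NET-style named groups, and its `\"` escapes look carried over from a C-style string literal. Engines without that group syntax (the VBScript RegExp object is one, as far as I know) would be expected to reject it. The sketch below, in Python purely for illustration, rewrites the same idea with plain numbered groups. It also shows that the original `\S+` for unquoted values runs past the closing `>`. The names `PORTABLE`, `GREEDY`, and `href_values` are made up for this example.

```python
import re

# Rewrite of the quoted-or-unquoted pattern using plain numbered groups:
# group 1 holds a double-quoted value, group 2 an unquoted one.
PORTABLE = re.compile(r'href\s*=\s*(?:"([^"]*)"|([^\s>]+))', re.IGNORECASE)

# Same shape, but keeping the original \S+ for the unquoted case.
GREEDY = re.compile(r'href\s*=\s*(?:"([^"]*)"|(\S+))', re.IGNORECASE)


def href_values(pattern, html):
    """Return the href values found by a two-group pattern."""
    values = []
    for m in pattern.finditer(html):
        values.append(m.group(1) if m.group(1) is not None else m.group(2))
    return values


sample = (
    '<a href=addlink.asp>Add your link</a> '
    '<a href="news.asp?type=religious">Religious News</a>'
)
print(href_values(PORTABLE, sample))  # ['addlink.asp', 'news.asp?type=religious']
print(href_values(GREEDY, sample))    # ['addlink.asp>Add', 'news.asp?type=religious']
```

Stopping the unquoted branch at whitespace or `>` is what keeps `>Add` out of the first result. Single-quoted values are not handled by either pattern here.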
