URL Parsing



  1. #1

    URL Parsing

    I have a problem parsing URLs out of a string.

    The code looks like this:

        strURL = "www.yahoo.com"

        Set objReg = New RegExp
        objReg.IgnoreCase = True
        objReg.Global = True
        objReg.Pattern = "(www.)([^(<| )]+)"
        strURL = objReg.Replace(strURL, "$2")
        Set objReg = Nothing

    This RegExp works great if strURL is exactly "www.yahoo.com". But if any HTML tags follow the URL, the script doesn't do what I want: "www.yahoo.com<BR>" returns "yahoo.com<BR>", and "www.yahoo.com <BR>" returns "yahoo.com <BR>".

    How can I prevent these tags and whitespace from coming through along with the URL?
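The behaviour described in the question looks like a property of replace rather than of the pattern: a regex replace rewrites only the span that matched and leaves the rest of the string alone, so the trailing tag survives. Below is a minimal sketch of that, written in Python's `re` module as a stand-in for VBScript's RegExp (an assumption made so the example can be run and checked). The pattern is the one from the post, unchanged; the function names `strip_www_by_replace` and `extract_host` are hypothetical.

```python
import re

# Pattern from the post, unchanged. Inside [...] the characters "(", "<",
# "|", " " and ")" are literal, so the class stops at any of them.
# Note that the "." after "www" is unescaped and matches any character.
PATTERN = re.compile(r"(www.)([^(<| )]+)", re.IGNORECASE)


def strip_www_by_replace(text):
    """Mimic Replace(text, "$2"): only the matched span is rewritten."""
    return PATTERN.sub(r"\2", text)


def extract_host(text):
    """Return only the second capture group, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group(2) if match else None
```

With these definitions, `strip_www_by_replace("www.yahoo.com<BR>")` gives `"yahoo.com<BR>"`, the result reported in the question, while `extract_host("www.yahoo.com <BR>")` gives `"yahoo.com"`. In VBScript the analogous route would presumably be `objReg.Execute` together with the match's `SubMatches` collection instead of `Replace`, though that is untested here.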

  2. #2

    RE: URL Parsing

    I've come up with the following pattern:

        (\w|-)+(\.(\w|-)+)*\.\w{2,3}(/.*)?

    It's not perfect, so beware. The problem is that a hyphen is permitted in a URL name, but not as the first or last character. However, it seems that multiple hyphens in a row are allowed between words and numbers (according to register.com).

    If you do not wish to allow hyphens at all, use:

        (\w)+(\.(\w)+)*\.\w{2,3}(/.*)?

    -----------------------------------

    -hello.com -> NO
    hello-.com -> NO
    hello.-1-.world.com -> NO
    hello-world.com -> YES
    hello--world.com -> YES
    hello-1--2---3-world.com -> YES
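The YES/NO list above reads as the rule the reply is aiming for, and the first pattern does not enforce it, which is presumably what "not perfect" refers to. The sketch below checks both. It uses Python's `re` module rather than VBScript (an assumption made for testability). `REPLY_PATTERN` is the reply's pattern unchanged. `STRICT_PATTERN` and the helper names are hypothetical additions, not something proposed in the thread. It keeps the reply's 2 to 3 character ending, but narrows `\w` to ASCII letters and digits, so underscores are no longer accepted.

```python
import re

# Pattern from the reply, unchanged.
REPLY_PATTERN = re.compile(r"(\w|-)+(\.(\w|-)+)*\.\w{2,3}(/.*)?")

# Hypothetical stricter variant: every dot-separated label must start and
# end with a letter or digit; any run of hyphens is allowed in between.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
STRICT_PATTERN = re.compile(rf"(?:{LABEL}\.)+[A-Za-z]{{2,3}}(?:/.*)?")

# The names and verdicts listed in the reply.
EXAMPLES = [
    ("-hello.com", False),
    ("hello-.com", False),
    ("hello.-1-.world.com", False),
    ("hello-world.com", True),
    ("hello--world.com", True),
    ("hello-1--2---3-world.com", True),
]


def reply_accepts(name):
    """True if the reply's pattern matches the whole string."""
    return REPLY_PATTERN.fullmatch(name) is not None


def strict_accepts(name):
    """True if the stricter variant matches the whole string."""
    return STRICT_PATTERN.fullmatch(name) is not None
```

Under these assumptions `reply_accepts` returns True for all six names, including the three marked NO, while `strict_accepts` agrees with the list. The stricter pattern uses only character classes and non-capturing groups, which VBScript 5.5's RegExp should also accept, but that has not been verified here.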
