The "not" operator



  1. #1
    David Bourque Guest

    The "not" operator

    Hi, I would like to write a regexp that selects all the sentences that contain the word "apple" but not the word "banana". For example:

        "I am eating an apple and a banana"  <-- FALSE
        "I am eating an apple and a cake"    <-- TRUE

    .*apple.*[^"banana"]* does not work (you can't put a string in the [], only separate characters).
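    A quick way to see why that attempt fails: a character class matches a single character, so [^"banana"] means "one character other than ", b, a or n", and the trailing * also lets it match nothing at all. The sketch below (plain JavaScript; the variable names are illustrative, not from the thread) shows that the pattern accepts both example sentences.

```javascript
// The attempt from the question. [^"banana"] is a character class: it matches
// ONE character that is not ", b, a or n. It is not a "does not contain" test.
const attempt = /.*apple.*[^"banana"]*/;

const withBanana = "I am eating an apple and a banana";
const withCake = "I am eating an apple and a cake";

// Both sentences match: the second ".*" swallows the rest of the line and the
// starred class is then allowed to match zero characters.
console.log(attempt.test(withBanana)); // → true
console.log(attempt.test(withCake));   // → true

// Repeated characters in a class are redundant: [^"banana"] is the same set as [^"abn].
console.log(/^[^"banana"]$/.test("c")); // → true  ("c" is outside the set)
console.log(/^[^"banana"]$/.test("n")); // → false ("n" is inside the set)
```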

  2. #2
    Join Date
    Dec 1969

    Here's a couple of ways!

    I'll write this out in JavaScript since I don't use VBScript, but the translation should be easy. The variable "str" contains the input string to be searched.

    One way is to first extract all the sentences:

        var regX = /[A-Z][^.!?]*[.!?]/g;
        var ar = str.match(regX) || [];  // match() returns null when nothing matches

    You may need more complex sentence delimiters depending on the input you want to parse, but this regex may be good enough.

    Then just check which sentences contain "apple" but not "banana":

        for (var i = 0; i < ar.length; i++) {
          if (/[aA]pple/.test(ar[i]) && !/[bB]anana/.test(ar[i])) alert(ar[i]);
        }

    The second way uses a regular expression alone. The one I'll construct has some limitations: it won't capture sentences that begin with "Apple" or exclude those that begin with "Banana", but you could modify it to do that if required.

    For a "not" operator, use negative lookahead if your scripting engine supports it. Here is a regex that uses positive lookahead for "apple" and negative lookahead for "banana":

        var regX = /[A-Z](?=.*apple)(?!.*banana)[^.!?]*[.!?]/g;

    This may not be the most efficient regex for the job, and even if there are better ones, the first approach may work better anyway. If speed is a concern, you'll have to test both empirically on likely input strings.

    Hope that helps.
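    Both approaches can be wrapped up as a small, self-contained sketch. The function name sentencesWithAppleNotBanana and the sample strings are hypothetical, and console.log stands in for alert so the code also runs under Node. The `|| []` guard is there because String.prototype.match returns null, not an empty array, when nothing matches.

```javascript
// Approach 1: extract the sentences, then filter them with two simple tests.
// A "sentence" here is a capital letter followed by everything up to the
// first ".", "!" or "?" (the simple delimiter rule from the post).
function sentencesWithAppleNotBanana(str) {
  const sentenceRe = /[A-Z][^.!?]*[.!?]/g;
  // String.prototype.match returns null, not [], when nothing matches.
  const sentences = str.match(sentenceRe) || [];
  return sentences.filter((s) => /[aA]pple/.test(s) && !/[bB]anana/.test(s));
}

// Approach 2: one regex. (?=.*apple) requires "apple" somewhere ahead and
// (?!.*banana) rejects the match if "banana" appears anywhere ahead.
const appleNotBananaRe = /[A-Z](?=.*apple)(?!.*banana)[^.!?]*[.!?]/g;

const sample =
  "I am eating an apple and a banana. I am eating an apple and a cake.";
console.log(sentencesWithAppleNotBanana(sample)); // → ["I am eating an apple and a cake."]
console.log(sample.match(appleNotBananaRe));      // → ["I am eating an apple and a cake."]

// ".*" in the lookaheads can run past the end of the current sentence,
// so the "banana" in the second sentence also rejects the first one.
const tricky = "I ate an apple. You ate a banana.";
console.log(sentencesWithAppleNotBanana(tricky)); // → ["I ate an apple."]
console.log(tricky.match(appleNotBananaRe));      // → null
```

    The second sample shows a limitation of the single-regex version as written: the .* inside each lookahead is not confined to the current sentence, so a "banana" in a later sentence rejects an earlier sentence that should be kept. The split-then-filter approach does not have this problem.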

  3. #3
    Join Date
    Dec 1969

    Some Improvements.

    In my previous post, the second "regex only" method has a drawback: it consumes the first character of each sentence, so the lookaheads never see it. A sentence like "Banana, apple and lemon." is therefore not filtered out. Wrapping the leading capital letter in a positive lookahead avoids this problem. I've also changed some dot-stars to a negated character class, both for efficiency and so the pattern works on input strings that span multiple lines:

        var regX = /(?=[A-Z])(?=[^.!?]*[aA]pple)(?![^.!?]*[bB]anana)[^.!?]*[.!?]/g;

    An alternative uses the pattern (?:(?![bB]anana)[^.!?])* to express "not [bB]anana":

        var regX = /(?=[A-Z])(?=[^.!?]*[aA]pple)(?:(?![bB]anana)[^.!?])*[.!?]/g;

    Both work, but I don't know which is quicker.
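    A rough side-by-side check of the two improved patterns against the pattern from the previous post, as a JavaScript sketch (the sample sentences and variable names are made up for illustration). The second pattern's (?:(?!...)[^.!?])* construction is often called a tempered greedy token.

```javascript
// The two improved patterns. (?=[A-Z]) checks for the capital letter without
// consuming it, so the lookaheads see the whole sentence, and [^.!?]* keeps
// every check inside the current sentence.
const withNegativeLookahead =
  /(?=[A-Z])(?=[^.!?]*[aA]pple)(?![^.!?]*[bB]anana)[^.!?]*[.!?]/g;
const withTemperedToken =
  /(?=[A-Z])(?=[^.!?]*[aA]pple)(?:(?![bB]anana)[^.!?])*[.!?]/g;

// The pattern from the previous post, for comparison.
const previous = /[A-Z](?=.*apple)(?!.*banana)[^.!?]*[.!?]/g;

// The old pattern consumes the "B", so its lookaheads only see "anana, ...".
const startsWithBanana = "Banana, apple and lemon.";
console.log(startsWithBanana.match(previous));              // → ["Banana, apple and lemon."]
console.log(startsWithBanana.match(withNegativeLookahead)); // → null
console.log(startsWithBanana.match(withTemperedToken));     // → null

const mixed =
  "Banana, apple and lemon. Apple pie is nice. I ate an apple. You ate a banana.";
console.log(mixed.match(withNegativeLookahead)); // → ["Apple pie is nice.", "I ate an apple."]
console.log(mixed.match(withTemperedToken));     // → ["Apple pie is nice.", "I ate an apple."]

// Remaining limitation: nothing anchors the match to a sentence start, so after
// the "B" is rejected the engine retries at the later capital "A".
console.log("Bananas go with Apples.".match(withNegativeLookahead)); // → ["Apples."]
```

    In these samples both improved patterns give the same results, and both reject "Banana, apple and lemon.", which the previous pattern accepted. The last line shows a remaining limitation: because the patterns are not anchored to the start of a sentence, a rejected sentence containing a later capital letter can still yield a fragment, here `Apples.`.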
