Odd behavior -- any explaining?

  1. #1
    SPG Guest

    Odd behavior -- any explaining?

    I'm just wondering if this is another example of half-baked regular expressions, or if this really is the way it's supposed to be.

    I have an HTTP header stream which looks a bit like

    ...
    Content-Disposition: form-data; name="field"; filename="C:foo.txt"
    Content-Type: text/plain

    This is the foo.txt file's contents
    ...

    Now, shouldn't a non-global, multiline, case-insensitive replace of ".*?Content-Type: (.*?)/plain.*" with $1 remove -Everything- except the "text"? In reality, it removes everything on the same line as "text" (except "text"), yet leaves the rest of the content obnoxiously untouched.

    Does anybody know why, or have a workaround? It doesn't absolutely have to be a Replace; it just has to be a more efficient means of getting $1 than stepping through the header without regular expressions.
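
One way to check whether greediness is really the culprit is to reproduce the replace outside Microsoft's engine. The sketch below uses Python's `re` module as a stand-in, assuming its rules match those of the RegExp object used above: `.` never matches a line break, and the multiline flag only changes `^` and `$`. The header string is a hypothetical sample built from the excerpt. In Python, where `.*?` is definitely lazy, the same replace still changes only the Content-Type line, which suggests the dot stopping at the line break may explain the result. Enabling `re.DOTALL` (which lets `.` cross line breaks) leaves only "text". The last step is a possible workaround that reads the captured group instead of replacing.

```python
import re

# Hypothetical sample modeled on the header excerpt in the question.
HEADER = (
    'Content-Disposition: form-data; name="field"; filename="C:foo.txt"\n'
    "Content-Type: text/plain\n"
    "\n"
    "This is the foo.txt file's contents\n"
)

# The pattern from the question.
PATTERN = r".*?Content-Type: (.*?)/plain.*"

# Non-global (count=1), multiline, case-insensitive replace.
# MULTILINE only affects ^ and $; "." still stops at "\n", so the leading
# .*? cannot reach back across earlier lines and the trailing .* stops at
# the end of the Content-Type line.
replaced = re.sub(PATTERN, r"\1", HEADER, count=1,
                  flags=re.IGNORECASE | re.MULTILINE)

# With DOTALL, "." also matches "\n", so the match spans the whole
# header and only the captured "text" survives.
dotall = re.sub(PATTERN, r"\1", HEADER, count=1,
                flags=re.IGNORECASE | re.DOTALL)

# Possible workaround: read the captured group instead of replacing.
match = re.search(r"^Content-Type:\s*([^/\n]+)/", HEADER,
                  re.IGNORECASE | re.MULTILINE)
media_type = match.group(1) if match else None

print(repr(replaced))
print(repr(dotall))   # 'text'
print(media_type)     # text
```

If the engine in use has no dot-matches-newline option, writing `[\s\S]` in place of `.` is a common way to get the same effect; treat that as an untested suggestion for Microsoft's engine.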

  2. #2
    SPG Guest

    Found on Microsoft... .*? doesn't work

    Evidently, Microsoft introduced a new "feature" into its regular expressions, whereby they're always greedy (.*? doesn't work). Note that MS didn't say this -- it was a user post -- but it explains a lot.

  3. #3

    RE: Found on Microsoft... .*? doesn't work

    Can you share a URL that explains this? Thanks!

  4. #4
    SPG Guest

    The Greedy * (hopefully) explained; URL included

    In _Learning Perl_ (www.ora.com), they discuss regular expressions in a little bit of detail. .* matches any character (.) any number of times (*). Now, how does the engine know when it's done? That's the greedy part -- the * takes everything it can.

    When you're searching for a single character, you can exclude that character from your search like this:

    [^{character}]*{character}

    as seen in the Tag Stripping syntax. But if you're looking for "Foo", this clearly won't work, since "Foo" is a whole word to match, not a single character. Hence, we have to somehow stop an all-character match, which is made possible by .*? (literally, "any character, any number of times, but as few as possible").

    A pattern of "foo.*?bar" should/will match both "foobar" and "foo dinner was lovely, wasn't it? bar".

    However! According to somebody's notes at http://msdn.microsoft.com/workshop/languages/clinic/scripting121399.asp -- which could easily explain my above-stated confusion, though I haven't specifically tested for greed since finding this -- Microsoft's regular expressions are always greedy. This means that the engine will give the "*" rather than the "bar" the benefit of the doubt when matching.

    In short, the * is too high in the order of operations: it claims characters before the rest of the pattern gets a chance.
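
A short sketch contrasting greedy and lazy quantifiers, using the "foo.*?bar" example above. Python's `re` module is a stand-in here, not Microsoft's engine. The sample strings are made up, and `<[^>]*>` is an assumed example of the single-character `[^{character}]*{character}` idiom applied to tag stripping.

```python
import re

text = "foo dinner was lovely, wasn't it? bar, and another bar"

# Greedy: .* grabs as much as it can, then backtracks only as far as
# needed, so the match ends at the LAST "bar".
greedy = re.search(r"foo.*bar", text).group(0)

# Lazy: .*? grabs as little as it can, so the match ends at the FIRST "bar".
lazy = re.search(r"foo.*?bar", text).group(0)

# "foobar" also matches, because .*? may match zero characters.
empty_middle = re.search(r"foo.*?bar", "foobar").group(0)

# Single-character terminator: [^>]* cannot run past ">", so no lazy
# quantifier is needed.
html = "<b>bold</b> and <i>italic</i>"
stripped = re.sub(r"<[^>]*>", "", html)

# Greedy .* on the same input spans from the first "<" to the last ">",
# so the whole string is removed.
over_stripped = re.sub(r"<.*>", "", html)

print(lazy)           # foo dinner was lovely, wasn't it? bar
print(stripped)       # bold and italic
print(repr(over_stripped))  # ''
```

If an engine really ignored the `?` after `*`, `lazy` would come out equal to `greedy`, which gives a quick way to test for the behavior described in the MSDN notes.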
