Regex Negative Lookahead, so powerful!


Posted by Luis Majano
May 11, 2010 12:21:39 UTC

I am a big fan of regular expressions.  They are so mysterious and elegant; I love them and hate them, cherish them and despise them.  In other words, they are both beautiful and nasty, but they can really get the job done.  I was intrigued today when a colleague of mine asked how to match a string unless it starts with a specific sequence.  In my regex coolness I said “Sure, that’s easy man!”, whipped out QuickREx in Eclipse, wrote down a quick jolt of brilliance and BAMM!! I hit the brick wall at 90! Nothing I tried would work; individual characters were matched, not the word:

   ^([^search].*)dev\.domain

Basically, match any incoming domain that ends with dev.domain, unless it starts with the word search.  I really thought I nailed it with that regex, but NOOO: [^search] is a negated character class, so it matches any single character other than s, e, a, r, c or h, not the full word.  So I had to swallow my ego, go back to the books and voila: Positive and Negative LookAhead!
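To see why the first attempt misbehaves, here is a minimal sketch in Python (its `re` module treats this pattern the same way Java does). The hostnames are made up purely for illustration and are not from the original problem.

```python
import re

# The first attempt from the post. [^search] is a negated character class:
# it consumes exactly one character that is not s, e, a, r, c or h.
FIRST_ATTEMPT = re.compile(r"^([^search].*)dev\.domain")

# Hypothetical hostnames, chosen only to illustrate the behaviour.
hosts = [
    "www.dev.domain",
    "search.dev.domain",
    "api.dev.domain",
    "staging.dev.domain",
]

results = {host: bool(FIRST_ATTEMPT.search(host)) for host in hosts}

for host, matched in results.items():
    print(f"{host:20} {'match' if matched else 'no match'}")
```

Only www.dev.domain matches. search.dev.domain is rejected as intended, but api.dev.domain and staging.dev.domain are rejected too, simply because their first letter happens to be one of the letters in "search".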

Granted, some regex engines do not support look behinds, but thankfully java does.  Here is a cool definition:

(?!regex)


Zero-width negative lookahead. Identical to positive lookahead, except that the overall match will only succeed if the regex inside the lookahead fails to match.
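A small sketch of what "zero-width" means in practice, again in Python and with throwaway sample strings ("foobar", "barfoo") that are assumptions for illustration only. The lookahead checks what comes next but consumes nothing, so the rest of the pattern still starts matching at the same position.

```python
import re

# Positive lookahead: succeeds only if "foo" comes next, but consumes nothing.
positive = re.match(r"(?=foo)\w+", "foobar")

# Negative lookahead: succeeds only if "foo" does NOT come next.
negative_blocked = re.match(r"(?!foo)\w+", "foobar")
negative_allowed = re.match(r"(?!foo)\w+", "barfoo")

print(positive.group())          # "foo" is still part of the match
print(negative_blocked)          # no match object at all
print(negative_allowed.group())  # "foo" later in the string is fine
```

Note that the positive case returns the whole word, including the "foo" that the lookahead inspected; the assertion only tests, it never eats characters.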

And finally, an AHA!! moment.  I can use the lookahead:

   ^(?!search|training).*dev\.domain

AND BUYAAA!!!  I think this can help somebody out there. It helped me!
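For anyone who wants to try the final pattern, here is a minimal sketch in Python. The helper name `is_allowed`, the sample hostnames and the stricter `STRICT_HOST` variant are assumptions added for illustration; only the first pattern comes from the post.

```python
import re

# The pattern from the post: reject anything that starts with
# "search" or "training", accept everything else containing dev.domain.
DEV_HOST = re.compile(r"^(?!search|training).*dev\.domain")

# Hypothetical stricter variant (not from the post): only reject when
# "search" or "training" is the complete first label, i.e. followed by a dot.
STRICT_HOST = re.compile(r"^(?!(?:search|training)\.).*dev\.domain")


def is_allowed(host, pattern=DEV_HOST):
    """Return True if the host passes the given lookahead pattern."""
    return pattern.search(host) is not None


# Made-up hostnames for illustration.
for host in [
    "www.dev.domain",
    "search.dev.domain",
    "training.dev.domain",
    "research.dev.domain",
    "searchbox.dev.domain",
]:
    print(f"{host:22} post={is_allowed(host)} strict={is_allowed(host, STRICT_HOST)}")
```

The lookahead only inspects the start of the string, so research.dev.domain is accepted, while searchbox.dev.domain is rejected by the post's pattern because it begins with "search". If that is not what you want, the stricter variant requires a dot right after the word.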


Michiel

Great post with a perfect example. I'm pretty sure everyone hates and loves RegEx', as long as they know how to use them (up to a point where it just becomes scary). I do plan to study the wonderful world of RegEx' more but just lack the time.

Reading articles like this can always put a smile on my face, thanks! :)

Peter Boughton

>> Granted, some regex engines do not support look behinds, but thankfully java does. <<

Just to avoid potential confusion, what you used is a lookahead, not a lookbehind, and supported by most engines - including java, cf (apache oro), js, and others.

Lookbehinds (both positive and negative) are less widely supported (not available in cf or js), and look like (?<=this) and (?<!this) for positive and negative respectively.

Sean Schricker

Hey. I was going to email you directly, but you don't have any contact info easily findable on your site.

The grey shadow lowers the contrast of your code from its background, making it hard to read, so please override the "body{text-shadow: 0 0 4px #444444}" of your theme with something like this: pre{text-shadow:none;}

Oh, and I only have love for regular expressions. A well-written one is always beautiful in its own way!