Whenever people compare Adblock Plus for Firefox with Opera’s built-in content blocker, they point to Opera’s lack of support for regular expressions as a deficiency. It isn’t one. While regular expressions are powerful tools that give great flexibility in defining filters, they should rarely be used in Adblock Plus. Tips that applied to the older Adblock project, such as using regular expressions to improve speed and cramming rules into as few filters as possible, do not apply to the newer Adblock Plus project. In fact, those tips now do more harm than good.

Reasons to avoid regular expressions

There are two reasons to avoid regular expression filters. One, they are slower than simple filters. This wasn’t always the case. The older Adblock used a trivial algorithm for filter lookups, so a long list of filters hurt performance, and folding many rules into one regular expression kept the list short. In Adblock Plus, a lot of work went into optimizing filter matching against a large list of simple filters: jump tables narrow each request down to a handful of candidate filters. Regular expression filters can’t take advantage of those jump tables, so every request has to be tested against every regular expression filter. Wladimir Palant, the author of Adblock Plus, has an excellent blog post that explains the details with graphs.
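To make the difference concrete, here is a minimal sketch of the idea in Python. It is not the Adblock Plus implementation: the class name, the 8-character window, and the treatment of simple filters as plain substrings (no wildcards) are all assumptions made for illustration. Simple filters are indexed by a fixed-length piece of their text, so a request only costs one dictionary lookup per URL position, while regular expression filters have to be tried one by one.

```python
import re

# Hypothetical window size, chosen only for this illustration.
SHORTCUT_LEN = 8


class FilterIndex:
    """Toy filter list: indexed simple filters plus a linear list of regexes."""

    def __init__(self):
        self.by_shortcut = {}    # first SHORTCUT_LEN chars -> simple filters
        self.unindexed = []      # simple filters too short to index
        self.regex_filters = []  # compiled regular expression filters

    def add_simple(self, text):
        # Assumption: a simple filter is a plain substring, no wildcards.
        if len(text) < SHORTCUT_LEN:
            self.unindexed.append(text)
            return
        self.by_shortcut.setdefault(text[:SHORTCUT_LEN], []).append(text)

    def add_regex(self, pattern):
        self.regex_filters.append(re.compile(pattern))

    def match(self, url):
        """Return the text of the first matching filter, or None."""
        # Indexed simple filters: one dictionary lookup per URL position,
        # no matter how many simple filters the list contains.
        for i in range(len(url) - SHORTCUT_LEN + 1):
            for text in self.by_shortcut.get(url[i:i + SHORTCUT_LEN], ()):
                if url.startswith(text, i):
                    return text
        # Short simple filters fall back to a linear scan.
        for text in self.unindexed:
            if text in url:
                return text
        # Regular expressions cannot be indexed this way: every one is tried.
        for rx in self.regex_filters:
            if rx.search(url):
                return rx.pattern
        return None


idx = FilterIndex()
idx.add_simple("/banners/")
idx.add_simple("ads.example.com/")
idx.add_regex(r"/ad[sv]?\d*\.gif")
print(idx.match("http://ads.example.com/x.png"))
```

In this sketch, adding another thousand simple filters leaves the number of dictionary lookups per request unchanged, whereas every additional regular expression adds one more search to every request that nothing else has matched.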

Two, the expressive nature of regular expressions easily leads to complex, unmaintainable filters. Because the older Adblock encouraged cramming as many rules as possible into a single filter, filters were long and unreadable. Here’s an example of a particularly gnarly filter from Filterset.G:

/[^a-z\d=+%@](?!\d{5,})(\w*\d+x\d)?\d*(show)?(\w{3,}%20|alligator|avs|barter|blog|box|central|context|crystal|d?html|exchange|external|forum|front|fuse|gen|get|house|hover|http|i?frame|inline|instant|live|main|mspace|net|partner|php|popin|primary|provider|realtext|redir\W.*\W|rotated?|secure|side|smart|sponsor|story|text|view|web)?_?ads?(v?((ition|meta|tology3|versal)\.com|(marketplace|rom)\.net|action\.se|bot|brite|broker|bureau|butler|cent(er|ric)|click|client|content|coun(cil|t(er)?)|creative|cycle|data(id)?|engage|entry|er(tis\w+|t(pro)?|ve?r?)|farm|feelgood|force|form|frame(generator)?|gardener|gen|gif|groupid|head|ima?ge?|index|info|js|juggler|layer|legend|link|log|man(ager)?|max|mentor(serve)?|mosaic|net|new||optimi[sz]er|parser|peeps|pic|player|po(ol|pup|sition)|proof|q\.nextag|re(dire?c?t?|mote|volver)|rotator|sale|script|search|sdk|sfac|size|so(lution|nar|urce)|stream|space|srv|stat.*\.asp|sys|(tag)?track|trix|type|view|vt|x\.nu|zone))?s?\d*(status)?\d*(?!\.org)[\W_](?!\w+\.(ac\.|edu)|astra|aware|adurl=|block|login|nl/|sears/|.*(&sbc|\.(wmv|rm)))/

Nobody can tell at a glance what that filter actually does. Perhaps more importantly, it makes debugging false positives an absolute nightmare. Of course it’s possible to write clean, sane regular expressions, but their expressive power gives you more rope to hang yourself with. A single regular expression can almost always be replaced with multiple simple filters. In addition to the speed gain, pairing a single rule with a single filter gives the user fine-grained control: individual rules can be disabled with the green dot.
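As a small illustration of that replacement, the Python sketch below splits one regular expression into the simple filters it stands for. The pattern and the URLs are made up for this example and do not come from any real filter list, and simple filters are again treated as plain substrings.

```python
import re

# One combined regular expression filter (hypothetical pattern).
combined = re.compile(r"(banner|sponsor)_?ad")

# The same rules written as one simple substring filter each.
simple_filters = ["bannerad", "banner_ad", "sponsorad", "sponsor_ad"]


def matching_filters(url):
    """Return every simple filter that matches, so the culprit is obvious."""
    return [f for f in simple_filters if f in url]


samples = [
    "http://example.com/img/banner_ad.gif",
    "http://example.com/sponsorads/1.js",
    "http://example.com/about.html",
]

for url in samples:
    # Both forms block exactly the same sample URLs.
    assert bool(combined.search(url)) == bool(matching_filters(url))
    print(url, matching_filters(url))
```

With the split form, a false positive points at one short filter that can be disabled on its own, instead of at an alternation buried somewhere inside a long pattern.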

The real world

Some people may argue that the need for regular expression filters, no matter how few in number, is inevitable. The success and effectiveness of EasyList, which doesn’t use regular expressions at all, shows this isn’t the case. Fanboy’s AdBlock List shows that lists compatible with both Opera and Adblock Plus are possible.

For further information, see how Adblock Plus processes filters and hints for improving speed.