Ad Extinguisher V1.0 Blocking List Creation Guide
Overview
Ad Extinguisher removes ads by recognizing their URLs. The source code for a typical banner ad might look like this:
<a href="http://www.somecontentsite.com/M=18391.925383.6102/D=junk/S=8392838:N/A=95609/?http://www.someadvertiser.com/asp/sellmorejunk.asp" target="_top">
<img width=468 height=60 src="http://banners.somecontentsite.com/ads/annoying/extremely/ad_stuffyoudontneed2.gif" border=0><br>
<center>Click Here! Buy more stuff you don't need!</center></a>
It is made up of a link (a href), a banner (img src), a text comment appearing under the ad, and a closing link tag (/a). Notice that the link points to the content site, not the advertiser's site. If you were so foolish as to click on the banner, your browser would request that URL from the content site. The content site would log the clickthrough and issue a redirect to the advertiser's site. This redirect serves two purposes: it allows the advertiser to determine where the clickthrough came from (and therefore which ads work), and it allows the content site to bill the advertiser per clickthrough.
There are several ad warehouse sites that place banners on other sites. In this case both the link and the img source would point to the ad warehouse site. In addition to sparing the content site the burden of running its own ads, this allows the ad warehouse to plant a cookie and track users across all sites that carry its ads. Of course, it also makes it much easier to block all those ads with one blocking rule.
In the example ad above, there are two URLs: one in the link href, and one in the img source. Ad Extinguisher will extract these URLs and compare them with the regular expressions in its blocking list. If either URL matches, Ad Extinguisher wipes out the whole ad, replacing it with: <img src="http://ae.loc/b" >
This is a reference to the transparent GIF whose image is hard-coded into Ad Extinguisher, or to ae.loc/i, the lightning bolt gif. The extra spaces before the > pad the new tag to the same length as the group of tags that it replaced. Ad Extinguisher considers a group of tags to be anything between <a href> and </a>, or between <form> and </form> tags. An <img src> outside a tag group is treated independently. This generally has the desired effect of completely killing an ad without breaking the page formatting.
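The grouping and padding rules above can be sketched in Python. This is a hypothetical illustration, not Ad Extinguisher's own code: Python's re module stands in for the GNU regexp library, tag groups are found with simple regular expressions rather than a real HTML parser, only href and src attributes are examined, and the names extinguish and padded_replacement are made up for the example.

```python
import re

# Hypothetical sketch of the behaviour described above, not Ad Extinguisher's
# own code. Python's re module stands in for the GNU regexp library.
REPLACEMENT_HEAD = '<img src="http://ae.loc/b"'

# A tag group is <a href ...> ... </a> or <form ...> ... </form>;
# an <img src> outside any group is handled on its own.
GROUP_RE = re.compile(
    r"<a\s[^>]*href[^>]*>.*?</a>|<form\b.*?</form>|<img\s[^>]*src[^>]*>",
    re.IGNORECASE | re.DOTALL,
)
# URLs in href= or src= attributes, quoted or unquoted (an assumption).
URL_RE = re.compile(r'\b(?:href|src)\s*=\s*"?([^"\s>]+)', re.IGNORECASE)


def padded_replacement(length: int) -> str:
    """Transparent-GIF tag, padded with spaces before '>' up to `length` chars."""
    pad = max(1, length - len(REPLACEMENT_HEAD) - 1)
    return REPLACEMENT_HEAD + " " * pad + ">"


def extinguish(html: str, rules: list[str]) -> str:
    """Replace every tag group whose URLs match any rule (case-insensitive)."""
    patterns = [re.compile(rule, re.IGNORECASE) for rule in rules]

    def replace_group(match: re.Match) -> str:
        group = match.group(0)
        urls = URL_RE.findall(group)
        if any(p.search(url) for p in patterns for url in urls):
            return padded_replacement(len(group))
        return group  # no rule matched: leave the group untouched

    return GROUP_RE.sub(replace_group, html)


ad = (
    '<a href="http://www.somecontentsite.com/D=junk/?http://www.someadvertiser.com/x.asp" '
    'target="_top"><img width=468 height=60 '
    'src="http://banners.somecontentsite.com/ads/ad.gif" border=0><br>'
    "<center>Click Here!</center></a>"
)
page = "<p>News</p>" + ad + "<p>More news</p>"
cleaned = extinguish(page, [r"somecontentsite\.com"])
print(len(cleaned) == len(page))  # True: the padding preserves the length
```

Padding to the original length is what lets the replacement drop in without shifting the rest of the page; a group shorter than the bare replacement tag simply gets the unpadded tag.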
An ad blocking list is a set of regular expressions, one per line, stored in a text file, with begin and end tags. Publishing a list is just a matter of putting it up on a web server.
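Reading such a file can be sketched as follows. The tag names and the $$ comment rule come from the Blocking list files section of this guide; everything else (the function and exception names, stripping whitespace, skipping blank lines) is an assumption, and Python's re stands in for GNU regexp.

```python
import re

BEGIN = "$BEGIN_BLOCKING_LIST"
END = "$END_BLOCKING_LIST"
COMMENT = "$$"


class BadListError(ValueError):
    """Hypothetical error for a list that is incomplete or badly formatted."""


def parse_blocking_list(text: str) -> list[re.Pattern]:
    """Return the compiled rules, or raise if the begin/end tags are missing."""
    lines = [line.strip() for line in text.splitlines()]
    while lines and not lines[-1]:
        lines.pop()  # assumption: trailing blank lines are harmless
    if not lines or lines[0] != BEGIN or lines[-1] != END:
        # A missing end tag suggests a truncated download: keep the old copy.
        raise BadListError("list is incomplete or badly formatted")
    rules = []
    for line in lines[1:-1]:
        # Skip $$ comments, and blank lines too (assumption): an empty
        # rule would match every URL and block everything.
        if not line or line.startswith(COMMENT):
            continue
        rules.append(re.compile(line, re.IGNORECASE))  # re.error if invalid
    return rules


sample = "\n".join([
    BEGIN,
    "$$ Big ad networks",
    r"ad\.doubleclick\.net",
    r"valueclick\.com/",
    END,
])
rules = parse_blocking_list(sample)
print(len(rules))  # 2
```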
Regular expressions
Regular expressions are patterns that match substrings of the text you want to find, using a rich set of wildcard characters. They are similar to the ? and * wildcards used for filename matching in DOS, but much more powerful. Ad Extinguisher uses the GNU regexp library. Here I will only cover the basics.
Alphanumeric characters represent themselves. Search is case insensitive.
Pattern: clickme matches:
http://www.annoy.com/clickme/please
http://www.distract.com/Clickme_now
but not
http://www.pester.com/click_me_now
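The matching rule above (a pattern can occur anywhere in the URL, and case is ignored) can be approximated with Python's re.search and the IGNORECASE flag, which is handy for trying out rules. This is only an approximation: Python's regex dialect is not identical to GNU regexp, and rule_matches is a helper name made up for this sketch.

```python
import re


def rule_matches(rule: str, url: str) -> bool:
    """True if `rule` matches anywhere in `url`, ignoring case."""
    return re.search(rule, url, re.IGNORECASE) is not None


for url in [
    "http://www.annoy.com/clickme/please",
    "http://www.distract.com/Clickme_now",
    "http://www.pester.com/click_me_now",
]:
    print(rule_matches("clickme", url), url)
# True http://www.annoy.com/clickme/please
# True http://www.distract.com/Clickme_now
# False http://www.pester.com/click_me_now
```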
Most punctuation has a special meaning and must be escaped with a backslash (\) to represent itself. For example, a period matches any single character, while a backslashed period matches only a period. Pattern: a.b matches a.b and acb, but not acdb or ab.
Pattern: a\.b matches only a.b
Other useful patterns: .* matches anything, including nothing at all. Do not put it at the beginning or end of your expressions.
pester\.com.*advert  Good: matches pester.com, then anything, then advert.
.*pester\.com.*  Bad: just use pester\.com instead. A rule can match anywhere in the URL, so a leading or trailing .* does not change what matches, and beta testing showed that it slows down pattern matching drastically.
[0-9]+ matches one or more digits. This is useful for the numbers in ad URLs, for example: advert[0-9]+
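The patterns above can be checked quickly with Python's re module, again only as a stand-in for GNU regexp; the URLs are made-up examples in the style of this guide.

```python
import re

FLAGS = re.IGNORECASE  # rules are case-insensitive; search is unanchored

# A bare period matches any one character; \. matches only a period.
print(bool(re.search(r"a.b", "acb", FLAGS)))   # True
print(bool(re.search(r"a\.b", "acb", FLAGS)))  # False
print(bool(re.search(r"a.b", "acdb", FLAGS)))  # False

# .* between two literals: pester.com, then anything, then advert.
print(bool(re.search(r"pester\.com.*advert", "http://www.pester.com/img/advert7.gif", FLAGS)))  # True

# [0-9]+ matches one or more digits.
print(bool(re.search(r"advert[0-9]+", "http://www.pester.com/advert42.gif", FLAGS)))  # True
print(bool(re.search(r"advert[0-9]+", "http://www.pester.com/advert.gif", FLAGS)))    # False

# Because the search is unanchored, a leading or trailing .* never changes
# the outcome; it can only add work for the matcher.
for url in ["http://www.pester.com/ad.gif", "http://www.annoy.com/ad.gif"]:
    assert bool(re.search(r".*pester\.com.*", url, FLAGS)) == bool(re.search(r"pester\.com", url, FLAGS))
```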
Blocking list files
A blocking list is a text file. You can edit it with Notepad or the DOS EDIT command. It should have a .txt extension. A blocking list looks like:
$BEGIN_BLOCKING_LIST
valueclick\.com/
/adstream\.cgi/
ad\.doubleclick\.net
focalink\.com
adforce\.imgis\.com
ad\.preferences\.com
ads\.smartclicks\.com
www\.eads\.com
$END_BLOCKING_LIST
Each line is a regular expression. The $BEGIN_BLOCKING_LIST and $END_BLOCKING_LIST tags must be entered exactly as shown, or Ad Extinguisher will return an error when it attempts to load the list. These tags are used to verify that a list was loaded completely before the in-registry copy is updated.
Because $ means end-of-line in a regular expression, no useful rule begins with it, so Ad Extinguisher can safely use a $ at the beginning of a line as an escape character. Any line beginning with $$ is a comment and is ignored.
Create a text file, enter your regular expressions, and save the file. Go into the Ad Extinguisher Control Panel, enter the full path (for example C:\TEMP\BLOCK.TXT) into the Add New field of the List Subscriptions box, and click Save Changes. The new list should show up with the current time as its load time. If it shows Never and an error, the file is either not readable or badly formatted.
If you change the file, click Reload Now and then Save Changes. The time last loaded will update, and the changes will take effect.
To make it easier to create blocking lists, click Show All Options. This form has a Test Rule box. Enter a regular expression there, click Save Changes, then immediately switch to another browser window and reload the page with the ad. If the ad goes away, the rule worked. Copy the rule (CTRL-C) and paste it (CTRL-V) into the blocking list.
If rules seem to have no effect, make sure the browser is actually reloading the page. A good test is to put a single . in the Test Rule field: it matches every URL, so all images and links on every page will be blocked.
Publishing lists
Once your list is ready, you can publish it so others can subscribe. Just upload the text file to your web server. You can now subscribe by typing its URL instead of a filename into the Add New field. However, you should also provide an auto-subscribe link. Suppose your list is at http://www.mysite.com/block.txt. Write an HTML page for your blocking list that explains what the list does, and put a link in it like:
This list blocks ads in (some class of) sites.
<a href="http://ae.loc/config?sub=http://www.mysite.com/block.txt" target="blank"> Click here</a> to subscribe, then click Save Changes when your Control Panel appears. You must have Ad Extinguisher installed to click this link.
When the user clicks, a new window appears and displays the Control Panel, with the URL given after sub= preloaded into the Add New field. The user just clicks Save Changes and closes the Control Panel window.
Now that your list is online, send me a link to it (Contact Me).
Relocating your blocking list
Suppose you have many subscribers to your blocking list, and you want to change ISPs or otherwise move your web space. You can move your list without any action on the part of your subscribers. First upload the list to the new site, then replace the contents of the list at the old site with:
$RELOCATE_BLOCKING_LIST
http://www.mynewsite.com/block.txt
$END_BLOCKING_LIST
This is all the old list should contain; do not put any rules in it. The next time your subscribers update, Ad Extinguisher will attempt to read the list at the new site. If it succeeds, it will change the address of your blocking list in the registry and use the new one from then on. If it does not, it will keep retrying the old site, so entering a bad URL will not lose you all your subscribers. After a few weeks you can take down the old site, because by then all active users should have been redirected.
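The update rule for a relocated list can be sketched as follows. This is a hypothetical illustration, not Ad Extinguisher's code: fetch stands in for whatever downloads a list (returning None on failure), the returned address stands in for the copy kept in the registry, and a list is assumed to count as complete only if it starts with $BEGIN_BLOCKING_LIST and ends with $END_BLOCKING_LIST.

```python
from typing import Callable, Optional

RELOCATE = "$RELOCATE_BLOCKING_LIST"
BEGIN = "$BEGIN_BLOCKING_LIST"
END = "$END_BLOCKING_LIST"


def relocation_target(text: str) -> Optional[str]:
    """Return the new URL if `text` is a relocation notice, else None."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) == 3 and lines[0] == RELOCATE and lines[2] == END:
        return lines[1]
    return None


def looks_complete(text: Optional[str]) -> bool:
    """True if a downloaded ordinary list has both begin and end tags."""
    if text is None:
        return False
    stripped = text.strip()
    return stripped.startswith(BEGIN) and stripped.endswith(END)


def update_subscription(
    address: str, fetch: Callable[[str], Optional[str]]
) -> tuple[str, Optional[str]]:
    """One update attempt; returns (address to store, list text or None).

    `fetch` is a hypothetical downloader returning the file text, or None.
    """
    text = fetch(address)
    new_address = relocation_target(text) if text is not None else None
    if new_address is None:
        # Ordinary list (or failed download): keep the address as it is.
        return address, (text if looks_complete(text) else None)
    new_text = fetch(new_address)
    if looks_complete(new_text):
        return new_address, new_text  # success: remember the new address
    return address, None  # new site unreadable: keep retrying the old one


sites = {
    "http://www.mysite.com/block.txt":
        f"{RELOCATE}\nhttp://www.mynewsite.com/block.txt\n{END}\n",
    "http://www.mynewsite.com/block.txt": f"{BEGIN}\nfocalink\\.com\n{END}\n",
}
print(update_subscription("http://www.mysite.com/block.txt", sites.get)[0])
# http://www.mynewsite.com/block.txt
```

Keeping the old address whenever the new site cannot be read is what protects you from losing subscribers through a mistyped relocation URL.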