
Trying To Limit Trackback Spam

Due to the amount of spam trackbacks, I've made a little change to the Perl script that handles the pings.

Now, when a trackback is received, I fetch the URL supplied in the trackback and look to see if my website is mentioned on that page. If it is, the trackback is accepted. If not, an error message is returned, but the ping is still logged in the database for future reference.
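As a rough illustration of the accept or reject step, here is a minimal sketch of building the XML reply that a TrackBack ping expects: an error value of 0 for success, or 1 plus a message for a rejection. The helper name trackback_response is made up for this example and is not taken from my actual script.

```perl
use strict;
use warnings;

# Hypothetical helper: build the XML body sent back to the pinging site.
# Call with no argument for success, or with an error message to reject.
sub trackback_response {
    my ($error) = @_;
    my $body = qq{<?xml version="1.0" encoding="utf-8"?>\n<response>\n};
    if (defined $error) {
        # Escape the characters that are special in XML text.
        my %escape = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
        $error =~ s/([&<>])/$escape{$1}/g;
        $body .= "<error>1</error>\n<message>$error</message>\n";
    }
    else {
        $body .= "<error>0</error>\n";
    }
    return $body . "</response>\n";
}

# Example: reject a ping whose page does not mention this site.
print trackback_response('Your page does not mention this site');
```

The ping can still be written to the database before the rejection is returned, so nothing is lost if the check turns out to be too strict.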

I was originally planning on using Jay Allen's MT-Blacklist plugin for Movable Type and ripping out the monster regex that looks for bad sites, but decided it was less effort to just scan the referring page directly.

This method isn't foolproof, but I'm hoping the number of casinos and nubile young ladies wishing to trackback to Symbian barcode applications may fall now.

If you are having trouble trackbacking to me because of this spam check, contact me and let me know, and I'll see if the script needs changing. If you are one of the nubile young ladies or casinos, still get in contact, but with details of a free account and plenty of credit included in your email.

It really seems that trackback spam is on the rise, and it's a real shame. At least I'm not like Loic, who was getting one spam trackback every ten minutes.

For those interested, the snippet of code doing the dirty work looks like this:

    use LWP::Simple;

    ## some other code...

    ## get() returns undef if the page could not be fetched
    my $trackbacked_page = get($trackback_url);
    if (defined $trackbacked_page && $trackbacked_page =~ /robertprice\.co\.uk/) {
        ## we're mentioned!
    } else {
        ## we're not, or the page could not be fetched
    }

As you can see, it only looks to see if this site is mentioned anywhere on the page. A better way would be to use something like HTML::Parser and extract the links on the page, making sure at least one points to the domain robertprice.co.uk. But hey, the world isn't perfect, and this is just a quick hack. :-)
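For the curious, here is a minimal sketch of what that stricter check might look like. It uses HTML::LinkExtor, which ships in the same distribution as HTML::Parser, along with the URI module. The helper name page_links_to is invented for this example, and it makes a few assumptions: only absolute links in anchor tags count, and subdomains such as www are accepted. This is not what the live script does.

```perl
use strict;
use warnings;
use HTML::LinkExtor;   # part of the HTML-Parser distribution
use URI;

# Hypothetical helper: true if any <a href="..."> in $html points at
# $domain or one of its subdomains.
sub page_links_to {
    my ($html, $domain) = @_;
    return 0 unless defined $html;   # the page could not be fetched

    my $extractor = HTML::LinkExtor->new;
    $extractor->parse($html);
    $extractor->eof;

    for my $link ($extractor->links) {
        # Each entry looks like [ tagname, attribute => url, ... ]
        my ($tag, %attrs) = @$link;
        next unless $tag eq 'a' && defined $attrs{href};

        my $uri = URI->new($attrs{href});
        next unless $uri->can('host');   # skips relative and mailto: links

        my $host = lc($uri->host // '');
        return 1 if $host eq $domain || $host =~ /\.\Q$domain\E\z/;
    }
    return 0;
}

# Example with a made-up page body.
my $html = '<p>See <a href="http://www.robertprice.co.uk/">Rob</a></p>';
print page_links_to($html, 'robertprice.co.uk') ? "linked\n" : "not linked\n";
```

Unlike the regex, a page that merely contains the text robertprice.co.uk somewhere, without actually linking to it, would be rejected.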

Entered: 2005-08-12 11:01:01
Modified: 2007-06-08 16:35:23
TRACKBACK - http://www.robertprice.co.uk/cgi-bin/robblog/trackback.pl?id=585
