2007-02-22 23:27
Blogging status
After setting up the anti-spam plugins for my blog, I said that they had saved me from having to block the trackback and comment features that WordPress provides. It is only now, months later, however, that one can get a sense of the long-term effectiveness of these defensive measures, and also appreciate their hidden cost. The cost I refer to is false negatives: spam ending up in the moderation queue (instead of being deleted outright), in particular trackback spam accepted as being from a legitimate blog. While the situation has certainly improved, over the course of months the extra effort of catching these false negatives could prove sufficient to make one abandon the idea of an open blog. I will thus analyse the true situation so far, and also point out a piece of good news about the popularity of my blog.
Before installing the plugins, I must have been getting several spam trackbacks a day, which would obviously have been impractical to delete manually, but where do I draw the line between impractical and practical? Well, in the whole of 2006-12 there were at least 22 (I may have deleted some at the beginning), in 2007-01 there were just 5 (I don’t remember deleting any) and in 2007-02 so far there have been 22.
Also note that last week I added a string common to the domain names in many of the spams to the built-in blacklist feature of WordPress. I am confident this string would never appear in a legitimate trackback, so there is little cost in my doing this, but it can only be a short-term solution, as spammers can register new domains as fast as I can add them. A slightly better strategy would be to add the names of the products they are advertising, but obviously this would lead to them obfuscating the words by using numbers in place of letters, as we have seen in email spam.
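To make the obfuscation problem concrete, here is a minimal Python sketch of a blacklist check that undoes simple digit-for-letter substitutions before matching. It is not a description of what the built-in WordPress blacklist does. The substitution table, the function names and the example blacklist entry are all assumptions made up for illustration.

```python
from typing import Iterable

# Hypothetical substitution table: digits and symbols commonly swapped in for letters.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)


def normalise(text: str) -> str:
    """Lower-case the text and undo simple character substitutions."""
    return text.lower().translate(LEET_MAP)


def is_blacklisted(text: str, blacklist: Iterable[str]) -> bool:
    """Return True if any blacklist entry appears in the normalised text."""
    cleaned = normalise(text)
    return any(normalise(entry) in cleaned for entry in blacklist)


if __name__ == "__main__":
    blacklist = ["cheap-pills"]  # made-up entry, not the string from my real blacklist
    print(is_blacklisted("http://ch34p-p1lls.example/offer", blacklist))
    print(is_blacklisted("http://legitimate-blog.example/post", blacklist))
```

The catch is that the same normalisation is applied to legitimate text, so every character added to the table makes the match more aggressive and a false positive a little more likely.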
But, as we have also seen in email spam, spammers have had to resort to using images to conceal their messages, because text filters are too good, and blog comments do not allow the insertion of arbitrary images. Trackbacks also add the extra difficulty of not allowing spoofing, in the sense that anyone can send an email pretending to be billgates@microsoft.com, but a trackback has to have a real blog at the end of it to get past my spam filter.
Or at least that’s the theory, because of course I have checked the links included in the spam trackbacks that get into my moderation queue, and I cannot find any unique token which they could use to tell a server-side script to generate a page containing a link to my blog. Nor, indeed, did I find a page crammed full of links to everyone they spammed. Now, there may be some clever tricks going on, including user agent sniffing or short-lived pages, such that a casual visitor or spam sucker checking the spammer’s site sees a very different page from the one checked by the plugin. This sort of system, though, would not be very resilient to the plugin spoofing its user agent, or waiting for a timed delay before checking the website for legitimacy. A slightly more complicated scheme: if the spammer’s page checked the IP address of the browser or blog making the request, it could serve up a page based on that, having previously recorded the IP while leaving the spam. But this could be worked around completely by using a proxy service, or even by pairing your blog with a friend’s so that they act as proxies for each other.
Of course this is all speculation, and what is really needed is for the plugin to make it clear on what basis it is accepting the spam. I propose a debug mode, and intend to contribute a patch to the plugin at some point, but that requires a bit of research into the use of Snoopy, which is what actually does the downloading of pages.
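As a rough illustration of what such a debug mode could report, the following Python sketch returns a reason alongside every verdict instead of a bare yes or no. It is not the plugin's code, which is PHP and downloads pages with Snoopy. The class and function names, the wording of the reasons and the example URLs are hypothetical, and the page is passed in as a string so that nothing here touches the network.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


@dataclass
class Verdict:
    accepted: bool
    reason: str  # the line a debug mode would write to its log


def check_trackback(page_html: str, my_blog_url: str) -> Verdict:
    """Decide whether the source page links back to the blog, and say why."""
    collector = LinkCollector()
    collector.feed(page_html)
    for href in collector.links:
        # Assumption: a link counts if it starts with the blog's base URL.
        if href.startswith(my_blog_url):
            return Verdict(True, f"found link to {href}")
    return Verdict(
        False, f"no link to {my_blog_url} among {len(collector.links)} links"
    )


if __name__ == "__main__":
    page = '<p>See <a href="http://myblog.example/2007/02/post">this post</a>.</p>'
    print(check_trackback(page, "http://myblog.example/"))
    print(check_trackback("<p>Buy now!</p>", "http://myblog.example/"))
```

With the matching link recorded for every accepted trackback, it would at least be possible to compare what the validator saw with what a browser sees at the same address.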
One thing I did find when looking through simple-trackback-validation.php is that it doesn’t observe the standard engineering practice of “fail to safety”. That is, upon encountering an error in its own behaviour, it responds by taking the most drastic action, potentially deleting a legitimate comment and not even informing the user that this error has taken place. Presumably the assumption was that only a spam trackback would trigger errors, but I think that is unrealistic. I have therefore, for now, made the following change at line 118:

    /////////////////////////////////////////
    // Loading snoopy and create snoopy object. In case of failure it is being considered as spam, just in case.
    /////////////////////////////////////////
    if (!$stbvIsSpam && !stbv_loadSnoopy() ) {
        // Loading snoopy failed
        $stbvIsSpam = true;
    } else {
        // Create new Snoopy object
        $stbvSnoopy = new Snoopy;
    }

to

    /////////////////////////////////////////
    // Loading snoopy and create snoopy object. In case of failure it is being considered as spam, just in case.
    /////////////////////////////////////////
    if (!$stbvIsSpam && !stbv_loadSnoopy() ) {
        // Loading snoopy failed
        //$stbvIsSpam = true;
        $badAssumption = true;
    } else {
        // Create new Snoopy object
        $stbvSnoopy = new Snoopy;
    }

There are more elegant ways of making this change, which I will use before making a diff and informing upstream.
As the saying goes, “You can’t have pudding if you don’t eat your meat”, which not only means that I’ve saved the exciting (to me) news until the end of this post, but also reflects my vindication and the rewarding of my patience. At the time of writing, Technorati state:
Drab as a fool, aloof as a bard
Rank: 899,844 (6 links from 4 blogs)
That’s right, I have reached my goal of being in the top 1 000 000 blogs, and by a fair margin.
Now, firstly, I must admit that I noticed earlier in the week that I had passed the critical threshold, but I have devoted more time to getting this blog post out today because this rank is even better than the one I saw earlier and I’m not sure that my rank will stay like this for long. That brings me on to my second, more significant admission: I’m not quite sure who those “4 blogs” are. I can only find 3 blogs from the Technorati pages, but in my defence I have recently entered into negotiations with someone to get our blogs linked together, which could become that 4th link (or possibly even a 5th). Also, I don’t feel particularly bound to work out the “correct” Technorati rank, as their algorithm is not fully explained and for a long time they did not recognise the link from the 3rd blog that genuinely points to mine. I will concede, though, that this is only one possible measure of blog ranking, and a proprietary one at that, but it is the best I know about and, more importantly, it was the challenge I set myself, and I succeeded without creating fake blogs or searching for ways to beat the system.
Have I now lost my driving force and the purpose for writing on this blog? No, of course not. Does living the dream mean you can only have nightmares when you sleep? It depends on whether you are happy to have dreams you will never achieve. Should I measure the success of this blog by the amount of software I publish and contribute to the community through it?