

Don't follow that link!
January 19, 2005 1:16 AM   Subscribe

Google, others announce attempt to fight comment spam. Take away the PageRank bonus and they'll stop, right? Right?
posted by ubernostrum (76 comments total)

 
More info here (Movable Type is doing it), here (Robert Scoble rounds up links), here (Dave Winer claims credit), here (Shelley Powers points out this isn't new ground) and generally everywhere else in the blogosphere for the next few days too, so don't worry about missing out on the coverage.
posted by ubernostrum at 1:19 AM on January 19, 2005


I wonder if this will put a dent in comment spam bot use. It seems like it'll take a while for anyone in the seo/spam community to notice, and in the meantime it's still a useful way to spam your URL onto zillions of unpatched sites.

And even if someone does update their software to include the nofollow attribute, spammers might keep doing it just to get humans to follow their lameass links.
posted by mathowie at 1:26 AM on January 19, 2005


How long until we see the inevitable joke where one of us puts a comment on here in parody of the spammers?

Place your bets.
posted by TwelveTwo at 1:30 AM on January 19, 2005


mathowie: I don't think it's going to make a dent in the comment spam problem at all; if anything there'll be more of it now, for precisely the reasons you've given.
posted by ubernostrum at 1:38 AM on January 19, 2005


If by "put a dent in" you mean comment spam will only double instead of triple every several weeks, well yeah, sure, I definitely think things are slightly less bad now that we have a nofollow attribute.
posted by bobo123 at 1:56 AM on January 19, 2005


Short term, it might lead to a blitz to find sites not using it. Long term, spammers are going to notice that their page rankings have gone to Hell and will have to step back and think about what the point is.

It's not an all-encompassing solution, but no one reasonable has claimed that it is. It's just another tool in the box.

There's all sorts of weirdass backtalk about this development. My favorite so far is this one, where the author simultaneously decries a "Balkanized web" and comes out in support of captchas -- which Balkanize the web because they exclude, for example, screen readers for the blind.
posted by theonetruebix at 1:58 AM on January 19, 2005


Unfortunately, most of the 200 or so comment spams I get a day come in from completely different machines. I suspect that these are zombie machines, and I don't see the people who have hacked these machines carefully going round and removing the bots which churn out links to Texas Holdem sites. I don't think the problem will get any worse, but I have a suspicion that it's not going to get any better.
posted by seanyboy at 2:00 AM on January 19, 2005


While on the face of it, it seems like a wonderful idea, I'm surprised that all those brilliant minds didn't see such a gaping hole in this method.

Comment spammers don't restrict themselves to leaving their URL just in the URL field. They leave HTML [A] tags in the comment itself. How will you automatically tag those with this new attribute? You'll have to turn off all HTML in comments, which I don't think many people will want to do.

I would have liked to see a way to mark an entire block as a "don't follow these links" section. Perhaps any text between <!-- nofollow --> tags would not have the links followed?

One last thing: Won't this break the validation on pages?
posted by madman at 2:16 AM on January 19, 2005


One last thing: Won't this break the validation on pages?

Actually, nofollow isn't an attribute, it's just a value (as in rel="nofollow"), so things should validate fine.
posted by bobo123 at 2:25 AM on January 19, 2005


Actually, nofollow isn't an attribute, it's just a value (as in rel="nofollow"), so things should validate fine.


Yes, I shot my mouth off without looking at it more closely. That will teach me. :)

It's an attribute, not a tag.
posted by madman at 2:34 AM on January 19, 2005


theonetruebix, this is a poor solution that will lead to a Balkanized web. Some search engines, like Google, will implement this and others won't. Others will ignore it (they want to reward spammers). Some will probably come up with their own solution. These kinds of hack-and-patch solutions just don't scale to the size of the web.
posted by nixerman at 2:46 AM on January 19, 2005


madman: The proposed support for it in weblogging tools applies the attribute to both the URL left with the comment, and to any 'a' tags posted as part of the comment. Basically, the idea is to never trust a user's input (which I'm surprised it's taken this long to realize). What'd be nice is for most services to offer a way to turn it off on a per-comment basis...

nixerman: Google, Yahoo and MSN search are all on board; that should pretty well cover it.
posted by ubernostrum at 3:03 AM on January 19, 2005


At least Google and the blogging services are recognizing the problem and starting to do something about it - instead of letting their customers suffer.

A start ... at least ...
posted by homodigitalis at 3:04 AM on January 19, 2005


Comment spammers don't restrict themselves to leaving their URL just in the URL field. They leave HTML [A] tags in the comment itself. How will you automatically tag those with this new attribute? You'll have to turn off all HTML in comments, which I don't think many people will want to do.

Is this a joke? It's hard to imagine a programmer with the sophistication to write a decent comment-posting system who doesn't know how to change random HTML elements within the comments.
posted by grouse at 3:04 AM on January 19, 2005


While on the face of it, it seems like a wonderful idea, I'm surprised that all those brilliant minds didn't see such a gaping hole in this method.

I certainly hope you're joking or trolling, because the point is to rewrite HTML tags in the comment field.

How will you automatically tag those with this new attribute?

LOL
posted by aerify at 3:14 AM on January 19, 2005


Okay, maybe I shouldn't laugh. But madman should maybe learn a thing or two before spouting off garbage.

How would they do it? I suppose there are many ways, but the easiest is just a simple match-and-replace with a regex. madman, did you know that when you use a text editor there's a wonderful function called "replace"? Say you wrote a letter to your friend that has the word "cat" in it a hundred times. Did you know that you can change all those to "dog" with one mouse click? Yes, you can! Aren't computers amazing? Well, it's the same thing, but with some regular expressions and wildcards (I would assume).

Yes, I shot my mouth off without looking at it more closely.

You most certainly did.
posted by aerify at 3:24 AM on January 19, 2005


Only noticed the second time in, but "Dave Winer claims credit".
So, was it really Winer's idea, or is he just bullshitting here?
posted by seanyboy at 3:29 AM on January 19, 2005


If you want to get knee-deep in the stuff, I'd point to either Ian Hickson or Lachlan Hunt as the "inventor" here. Or possibly someone in Phil Ringnalda's comments.
posted by ubernostrum at 3:36 AM on January 19, 2005


madman, did you know that when you use a text editor there's a wonderful function called "replace"?

Yes, you're so great at sarcasm! I bow to you! (But were two posts really required to put me down?)

I initially thought this would only be applied to the hyperlinks to the URL of the person commenting. I see that the plugins will rewrite the HTML in the comments themselves. See my earlier comment about looking at it more closely.

(Of course, this will screw some pagerank goodness for legitimate URLs too.)
posted by madman at 3:38 AM on January 19, 2005


There seems to be lots of hand-waving about how the search engines and the bloggers are going to be cutting off their noses. But get a grip, we're talking about comments put in by blog visitors. ubernostrum nails it - "never trust user input".


The same system could stop the clog of trackbacks, but Google's already taped a patch on the engine to reduce the issue, and I can well imagine that some bloggers won't want to reduce the level of "importance" it lends them.
posted by NinjaPirate at 3:50 AM on January 19, 2005


I've long been an advocate of stripping PageRank from links in comments, so I'm really hopeful about this new development - although I'm concerned that spammers will just up the spam and hope for idiots to click links instead (see e-mail spam). As for removing PageRank from legitimate URLs, I honestly think that's a tiny price to pay for removing the main economic incentive for comment spam.
posted by simonw at 3:53 AM on January 19, 2005


Comment spammers aren't that bright, they spam my contact form once a day, which only emails their links to cheap viagra and texas holdem to me and ends up in my 'spam' folder. But hey, anything to make it stop... anything!
posted by dabitch at 3:57 AM on January 19, 2005


ubernostrum nails it - "never trust user input".

Seconded. Basic rule for secure coding practice.
posted by nofundy at 4:38 AM on January 19, 2005


I don't have anything against the "let's all hate Dave Winer" interweb phenomenon, but reading his post, I don't see how Dave "claims credit" anymore than Ben (from Six Apart) claimed credit in what he wrote. They both merely claim credit for the implementation, and clearly point to Google developers as the driving force behind this.
posted by Plutor at 4:53 AM on January 19, 2005


I think it's when he said... "I got an email from Matt Cutts at Google asking if I wanted to help them work out a way to clue Google's crawler in on what is and isn't comment spam..." instead of I got an email from Matt Cutts at Google asking if I wanted to implement a system they had developed for stopping comment spam.
posted by seanyboy at 4:59 AM on January 19, 2005


I don't think this is going to make much of a difference. Spammers depend upon volume, and they'll simply up their output, searching for weblogs that haven't implemented the nofollow attribute in links, or hoping that the increased volume will add to their chances of someone actually clicking on the link. Google doesn't (not yet, at least) add PageRank to my email, but that doesn't seem to have done anything to stem the deluge.

Like email it seems to me that the only viable solution is server based (block comments from open proxies, use a regularly updated blacklist, use mod_security, SpamAssassin or similar tools, etc.). Alternatively, the powers that be could acknowledge that the space I lease or own on a web server is real property and what these spammers are doing is akin to trespassing and is criminal.
posted by cedar at 5:33 AM on January 19, 2005


I think that for it to make significant difference, we'd have to be dealing with reasonable, legitimate businessmen. I really don't see spammers saying "Oh, they're trying to discourage us. What say you, we pack up and go find a real job?"

But it does make google feel good for trying and is yet another little shoe-in, of sorts--assuring the technorati that they "get it", are "with it".
posted by ThePrawn at 5:59 AM on January 19, 2005


On my own sites, I too never trusted user input. That's why I always forced people commenting on my sites to use a dumbed-down alternative to HTML. The problem for places like Metafilter & other large-scale blogs / forums is that instead of having 5-10 different people using the site daily, there are thousands of users visiting these sites (so forcing them all to adopt a more secure format language isn't really practical).

As long as I still get comment-page links from Google when I search for "honda cb750 carb choke," I'll be fine. I don't care if Google tosses out the rankings for the links contained within those comment pages. Hopefully this attribute will put a dent in spam, but I fear that they'll just figure something else out in a month or two.
posted by password at 6:13 AM on January 19, 2005


HTML is fine if you parse it, filter it, and transform it to make it properly conform to a schema. I wrote a web forum that took HTML, converted it to XML, and then used XSLT to do some rudimentary filtering. Users were limited to a number of safe tags and couldn't make any URL links that weren't considered safe (i.e. http, ftp).
posted by drscroogemcduck at 6:39 AM on January 19, 2005
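The whitelist approach drscroogemcduck describes doesn't require XSLT; here is a rough sketch of the same idea using Python's standard-library HTML parser. The tag and scheme lists are invented for illustration, and this is not the original forum's implementation:

```python
from html import escape
from html.parser import HTMLParser

SAFE_TAGS = {"b", "i", "em", "strong", "p", "br", "blockquote"}
SAFE_SCHEMES = ("http://", "ftp://")   # only these link schemes survive

class CommentFilter(HTMLParser):
    """Whitelist filter: keep a few safe tags, force rel="nofollow" onto
    safe links, and escape everything else to plain text."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_a = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(SAFE_SCHEMES):
                self.out.append(
                    '<a href="%s" rel="nofollow">' % escape(href, quote=True))
                self.open_a = True
        elif tag in SAFE_TAGS:
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag == "a":
            if self.open_a:          # only close links we actually emitted
                self.out.append("</a>")
                self.open_a = False
        elif tag in SAFE_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data))

def sanitize(comment_html):
    f = CommentFilter()
    f.feed(comment_html)
    f.close()
    return "".join(f.out)
```

Unsafe schemes (javascript:, data:, and so on) simply never make it into the output, which is the point of whitelisting rather than blacklisting.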


Yeah, I'm gonna have to learn voodoo too. Do.
posted by NinjaPirate at 6:45 AM on January 19, 2005


ubernostrum nails it - "never trust user input".

Which goes hand-in-hand with "Never trust user output."
posted by Ayn Marx at 6:46 AM on January 19, 2005


I'm actually happy about this news. On my site, I had not only disabled the right to use HTML but also required that people log in to post comments. This ensured that no spam would get in but had the downside effect of lowering the potential number of comments on the site. I'm now looking at the following solution:

1. People can post if they are not logged in, but have no right to use HTML if they are anonymous.
2. People who have posted comments without HTML in the past and not ended up on the list of spammers can post later comments in HTML without logging in, but get a nofollow on their links.
3. People who log in to the system are given the capability to add links that do not have rel="nofollow" on them.

I'm still trying to work that out (i.e. what happens when someone who is logged in posts spam) but I think that this effort is a step in the right direction.
posted by TNLNYC at 6:55 AM on January 19, 2005
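TNLNYC's three-tier scheme boils down to a small decision function. A sketch under that reading (all names here are illustrative, not from any real blogging package):

```python
def comment_link_policy(logged_in, past_clean_comments, on_spam_list):
    """Return (allow_html, use_nofollow) for a would-be commenter,
    following the three tiers described above."""
    if on_spam_list:
        return (False, True)    # known spammers: plain text only
    if logged_in:
        return (True, False)    # tier 3: trusted, links pass PageRank
    if past_clean_comments > 0:
        return (True, True)     # tier 2: proven but anonymous
    return (False, True)        # tier 1: brand-new anonymous poster
```

The open question TNLNYC raises (a logged-in user who spams) would be handled by moving that account onto the spam list, demoting them back to tier 1.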


Those of us lacking regex-fu could probably appreciate a brief tutorial on how to hack <a href="foo">bar</a> into <a href="foo" rel="nofollow">bar</a>. My MT-2.6x-based sites are in PHP and I could implement this instantly if I knew what to do. (I suppose I could limp my way through intro-to-regex material . . . )

I second dabitch's observations: online poker sites routinely spam my link-submission form on The Map Room, with the same results. (They put the URL in all the fields; now, when that happens, a Javascript alert tells them to kill themselves immediately. Futile, but implementing it felt good.)
posted by mcwetboy at 7:10 AM on January 19, 2005
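For what it's worth, the substitution mcwetboy asks about is close to a one-liner in most regex flavors. Here's a sketch in Python (the same pattern works in PHP's preg_replace_callback); note that regex-based HTML rewriting is fragile, and a real HTML parser is the safer route:

```python
import re

def add_nofollow(html):
    """Turn <a href="foo">bar</a> into <a href="foo" rel="nofollow">bar</a>,
    leaving tags that already carry a rel attribute untouched."""
    def fix(match):
        tag = match.group(0)
        if re.search(r'\brel\s*=', tag, re.IGNORECASE):
            return tag                        # don't clobber an existing rel
        return tag[:-1] + ' rel="nofollow">'  # insert before the closing >
    return re.sub(r'<a\b[^>]*>', fix, html, flags=re.IGNORECASE)
```

In an MT/PHP setup this would run over the comment body just before it's written to the page, so every anchor a commenter supplies comes out carrying the attribute.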


I disabled comments entirely on my site because the crap flooding was overwhelming the server entirely. I never left any comments up for more than a day, and most of them were blocked before posting with blacklist.

So, if they're out there, flooding my server with spam to the point of bringing it to its knees, for no good reason since their spam won't be seen by anybody but me, and barely even that, I seriously doubt that this will have any impact at all.
posted by willnot at 7:13 AM on January 19, 2005


My MT-2.6x-based sites are in PHP and I could implement this instantly if I knew what to do.

What to do: download and install the plugin linked to here. It works with 2.661 according to the post.
posted by mw at 7:23 AM on January 19, 2005


I just turned off the html in my comments a while back and it did not stop comment spam at all.
posted by nathanrudy at 7:25 AM on January 19, 2005


I don't see rel="nofollow" doing anything to deter blog spammers from continuing to hit stupidly untended and unmaintained weblogs and guestbooks like these, though. What's to be done about that?
posted by brownpau at 7:36 AM on January 19, 2005


I just turned off the html in my comments a while back and it did not stop comment spam at all.

And my referrer log is not public, but it gets spammed mercilessly. The reason is that so many people have HTML in comments turned on, and so many people have public referrer logs. Our piddly efforts don't affect the spammers' return on investment enough to matter to them.

The idea behind this initiative is that in one fell swoop a huge swath of the blogosphere will become sterile ground for spam. Whether it will be enough to make comment and referrer spamming unattractive enough for it to diminish appreciably is an open question, but this is still a much bigger deal than anything that's been tried before.
posted by mw at 7:50 AM on January 19, 2005


Thanks and done, mw.
posted by mcwetboy at 8:09 AM on January 19, 2005


I don't think anyone is claiming this is a panacea - and those who do are misinformed. The reason people spam comments is because "it works." If it stops working, fewer people will do it.

This is a good way to stop comment spam from working while keeping comments open, so you should do your part and install whatever plugin works for your tool. Other solutions like authentication and CAPTCHAs work for individual sites; this addresses the problem in the blogosphere as a whole.

I think that putting this into the default tool configurations and adding it to the hosted services will be the biggest boon, but I doubt we'll start to see less comment spam until 2006 at the earliest.
posted by revgeorge at 8:15 AM on January 19, 2005


Some search engines, like Google, will implement this and others won't.

There are no search engines of consequence besides Google. Nobody cares about their ranking in AltaVista or MSN Search.
posted by kindall at 8:25 AM on January 19, 2005


There seems to be a misunderstanding - this is not a solution that's designed for individual weblog users to implement or not. It's designed for blog software makers to put in the next version of their software, and likely enable by default. Since the majority of un-tech-savvy blog users use one of the more popular software solutions (MT, for example), when MT puts this fix in, all of them are automatically onboard and participating in the rel="nofollow" solution.

So, in the coming months, spammers would probably not see any kind of drop in their pagerank results (do they really track this?) but in a year or so, when all blogging software is participating in this, their rankings should fall dramatically, or just flatline.
posted by odinsdream at 8:35 AM on January 19, 2005


mcwetboy - I love that suggestion you put in! hehehe.
Oh and like mw, my referrer log is spammed all the time, daily, since 2002. MeFi thread back then. It pisses me off to no end; nobody can see my logs but me, so it's utterly useless for them to do, but it skews my stats. Wankas.
posted by dabitch at 8:36 AM on January 19, 2005


With MT, at least, I've had pretty good luck using comment-throttling and forced previews (which are not as simple as removing the "post" button)--I get about one comment-spam per month, and I'm pretty sure those are from hapless humans manually pasting in the spam.

Jacques Distler shows you how.
posted by adamrice at 9:05 AM on January 19, 2005
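The comment-throttling adamrice mentions is essentially a sliding-window rate limit per IP. A minimal sketch (the window size and limit here are arbitrary choices, not MT's actual settings):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60     # illustrative window
MAX_PER_WINDOW = 3      # illustrative limit

_recent = defaultdict(deque)   # ip -> timestamps of recent comment posts

def allow_comment(ip, now=None):
    """Refuse a comment if this IP has already posted MAX_PER_WINDOW
    comments within the last WINDOW_SECONDS."""
    if now is None:
        now = time.time()
    stamps = _recent[ip]
    while stamps and now - stamps[0] > WINDOW_SECONDS:
        stamps.popleft()                  # forget posts outside the window
    if len(stamps) >= MAX_PER_WINDOW:
        return False
    stamps.append(now)
    return True
```

This is cheap to check on every submission, and it blunts exactly the flood pattern spam bots produce while leaving human commenters unaffected.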


If a spammer doesn't know whether a blog has the nofollow enabled, aren't they going to spam the page anyway? And how would they know?
posted by smackfu at 9:05 AM on January 19, 2005


I think this is going to hurt small bloggers. I comment on blogs, on-topic, and I often include links to articles on my blog. If you're a blog owner and I comment on your blog in a way you don't like, remove the comment.

Some have said you should be able to selectively enable this on a per-comment basis, but that totally defeats the point. If you're reading the individual comments, you can just delete the spam anyway. The web is about sharing - if you don't want to share the pagerank you've built up from inbound links, why are you letting people put in outbound links at all?

What this does is wholesale-remove the link karma effect for small bloggers while not substantially cutting down on the comment spam, because the spammers still want actual people to look at and click on their links, which cost them nothing to propagate.

Why is that a good thing?
posted by Caviar at 9:08 AM on January 19, 2005


I love the person or company that registered relnofollow.com the day before the announcement.

I see how this will help make it less useful for spammers, but I doubt it will make a change in the amount of comment spam.

The spammers have their scripts. There is no sense in having it check if a site uses rel=nofollow.

As a blogger this won't help me much. I'd have to delete the spam anyway since I don't want to see "buy v!ag-rA" in the middle of my comments.

But that doesn't mean it will hurt.

Madman: You were on the right track. It's true that "It's an attribute, not a tag." But it still needs a little extra to follow the W3C recommendation for link types. A well-designed site should use a profile to identify nofollow.

(on preview: Caviar: exactly!)
posted by ?! at 9:22 AM on January 19, 2005


Caviar, what's your personal experience with comment spam? I disabled comments on my MT-based site after receiving - get this - around 400 spam comments every day. I don't even have a slightly popular site. The problem is not manageable once your site is on whatever spamming list there is. It doesn't go away, and even with spam-fighting tools, it's a hassle to process the comments without accidentally removing useful messages.

That was a few months ago, so things have hopefully improved in the spam-fighting-tool arena, but I can't say much about them. I've given up on comments, personally.

How will this hurt small bloggers, though? Do small bloggers live on the revenue generated by views of their page? Do they get the majority (or even a slightly significant amount) of their visitors from Google? It seems most small bloggers have a dedicated audience that may fluctuate in size as word-of-mouth or links from other blogs (not comments, but other blog bodies) get passed around. I don't have any idea what my site's PageRank value is, and I rarely get any visitors from Google. I don't sell anything, or have ads on my site, so I don't stand to gain or lose anything from the placement of my site in Google's index.

Big-time bloggers who do care about their PageRank already have a readership, and I doubt they count on the links in other blogs' comment sections to increase that readership. Remember, this solution does not impact legitimate links in the body of a blog, only the comments. So, if I link to a big-time blog in my entry for today, big-time's PageRank will increase, just as expected; but if big-time's owner comes to my page and posts a self-link to big-time's page as a comment, their rank stays the same. Where's the disadvantage, here?
posted by odinsdream at 9:25 AM on January 19, 2005


will become sterile ground for spam.
Which will stop spammers seeding new zombies, but it'll not stop the zombies that are already out there flicking out comment spam. This problem will be with us for years.
posted by seanyboy at 9:35 AM on January 19, 2005


Since 9:00pm last night, I've logged exactly 311 attempted spam comments on my site (thank you MT-Blacklist). I shudder at the thought of not using any means at our disposal so I welcome this deterrent.
posted by KevinSkomsvold at 9:48 AM on January 19, 2005


I get some comment spam, and I moderate comments. Aging off articles (anything older than X days automatically has comments turned off) helped a LOT. Renaming the comment processing page helped somewhat on top of that.

Not everything on the web is about money. I want people to read my blog because I think I have good ideas about some things that I think will get a little better simply by having more people talk about them, increasing the likelihood that there will be a strong public opinion in one direction or another.

I'm not convinced that this new attribute is necessarily bad (I'm also not sure it helps in any meaningful way) - I just don't think they've thought through all of the ramifications.
posted by Caviar at 10:06 AM on January 19, 2005


What ramifications? I'd do everything I could, as it is, to prevent anyone from increasing their own pagerank by posting comments on my site; this is just implementing the idea on a larger scale.

If you came to my site, and commented "hey, I have a cool idea, here is a link to it on my own page," should that increase your pagerank? Of course not.

If I read the comment, decide it really is cool, and then in my next entry, I post "hey, this link is to a cool thing," well, then your pagerank should increase, since the recommendation is actually valid.

I don't know anyone (besides spammers) who relies on blog comments as a mechanism for spreading information about their site. As said above, a lot of people commenting on blogs have blogs themselves. If they read about something truly worthwhile, they'll post it on their own page, and the recommendation will be respected. What are the ramifications?
posted by odinsdream at 10:39 AM on January 19, 2005


That's certainly an incorrect reading of how I look at the comments.

On my own blog, why would I dedicate a whole other post to something that someone else wrote on their blog that's related to a post I already made? Even if I really like it - I just made a post about it. The comments are an integral part of the conversation. I don't expect that other people will think differently. It's all part of the thread.

The recommendation is valid by nature of not being deleted from a page that's ultimately under my control.

I have a small blog, and while the infrequent post links from other blogs get a lot more traffic per mention than the links I've put in comments, I get more traffic overall from comment links I've put in on other blogs.
posted by Caviar at 10:53 AM on January 19, 2005


Google isn't implementing this system to stop or hinder spam, they are doing it to improve the quality of their search results. I know a lot of you have a personal interest in the amount of comment spam on your sites, but other than stealing PageRank, I can't think of any reason why Google should care.

I think odinsdream makes a good point. And Caviar, I see your point too. Maybe the solution is to set comment links as "nofollow" by default, and give the blogger the power to change them to "follow" at his discretion.

What really interests me about this whole subject is that Google is outright asking web developers to alter their code to improve their results. And, by the looks of it, we're going to do it! Now if only they'd advertise a PageRank reward for standards-compliance...
posted by hartsell at 11:14 AM on January 19, 2005


I've added a simple one-line patch to Drupal's filter module that will automatically add rel=nofollow to any URL a user enters.
posted by mike3k at 11:23 AM on January 19, 2005


The best way of controlling comment spam: write your own comment script. Even when I was getting spam I got far, far less than those who were using MT's built-in comment script. Still, I was getting a few each day -- some days, as many spams as real comments. I found ways to flag most spam so I could manually approve them before they appeared on my site, but this was also catching some legit messages, and it was wearying in any case.

So I took the step of adding a hidden field that contains an MD5 hash based on the current date, the requester's IP address, the comment thread number, and of course a site-specific "salt" so that spammers can't just hash those items and get the right digest. Basically, this cryptographically guarantees that a request for the comment page (containing the form) must precede the posting of the comment. (Actually, I didn't use a hidden field, I changed the name of one of the fields to a name derived from the hash, but same difference.)

This got rid of a lot of my spam, but it seems that at least a portion of comment spammers do their spamming by using a JavaScript bookmarklet that fills in the fields on the target comment form. The script seems a little indiscriminate as to how it fills in the fields and I was able to find a way to detect and discard these messages. (I'm not going to explain how, exactly, though you can probably figure some of it out by examining the source on my site.) Result: the only blogspam I get is the occasional message by the poor saps who are forced to enter it manually. I can handle those, I get one every couple weeks.

If they figure that out, I have other tools in my arsenal... but you can't do any of this stuff if you have to wait for Six Apart to write the code. (Blacklisting is NOT a scalable solution, sorry. It takes too much CPU, and it requires too much maintenance.)
posted by kindall at 11:37 AM on January 19, 2005
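kindall's hashed-field trick takes only a few lines. A sketch assuming the digest covers the date, requester IP, thread number, and a site salt, as described (the function and salt names here are invented for illustration):

```python
import hashlib

SITE_SALT = "change-me-per-site"   # secret, site-specific; illustrative value

def comment_token(date, ip, thread_id):
    """Digest the server embeds in the comment form. A bot that never
    fetched the form (and doesn't know the salt) can't forge it."""
    material = "%s:%s:%s:%s" % (SITE_SALT, date, ip, thread_id)
    return hashlib.md5(material.encode()).hexdigest()

def token_valid(submitted, date, ip, thread_id):
    """On submission, recompute the digest and compare."""
    return submitted == comment_token(date, ip, thread_id)
```

Because the date is part of the material, tokens expire on their own; a stricter version might also compare against yesterday's date to tolerate forms filled in near midnight.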


Caviar, people post comments to engage in conversation on a post topic, not to drive traffic to their own weblogs. There are people who are really interested in the discussion, and then there are those who are just trying to self-promote.
posted by brownpau at 11:41 AM on January 19, 2005


Exactly my point, brownpau. This change doesn't affect Caviar's position at all. Human readers are still going to visit his site, still going to visit the links left by his readers, and still going to engage in meaningful discussion. The only difference will be that Caviar's site might not be the first result someone gets by searching Google for whatever it is that Caviar's site happens to be discussing at that moment. If you write a weblog in order to deliver content, not boost traffic, you shouldn't be affected by this change at all. You can still go tell people about your site, but people will actually have to take an interest in personally visiting it.
posted by odinsdream at 11:53 AM on January 19, 2005


Caviar, you can keep putting your blog URL in comments, and if I was hosting a blog you commented on and liked your blog, I might add a link to it in my blog roll or as a new blog post, giving you the full Google pagerank fu.

What's wrong with that? I don't believe comments should earn any pagerank influence when the website owner didn't write them themselves.

Don't trust user input is a great way to sum this up. Just because you have the best guestbook/blog post/referrer list in the world according to google doesn't mean people making random additions to the list should also be considered as authoritative and share the merit on the originating site.
posted by mathowie at 11:56 AM on January 19, 2005


kindall: If they figure that out, I have other tools in my arsenal...

I think this is where nofollow could become really useful in fighting comment spam: you still have to do something to block the mindless robots that are already spamming you, but once you do, spammers have no reason to figure out a way to defeat it. It makes your existing tools more effective, because it removes the incentive to fight them.
posted by moss at 1:04 PM on January 19, 2005


As far as I'm concerned, deliver content = boost traffic. I do both. I write what I write because I enjoy writing, but I also want people to read it, and sometimes comment on it. If I didn't care if people read what I have to write, I'd a) save all my stuff in text files and not share them or b) limit my discussion to comments on other blogs. I participate in online discussions both because I want the conversation and also because I want people to read what I have to say. Maybe I'm in the minority, but I really don't see the two as mutually exclusive.

Usually, the comment text is a subset of what I actually have to say on the topic, so the link has more information, and also an indirect link to other stuff I've written (yes, I have an egotistical assumption that people who are interested in any given thing I have to say might like more of it).

Actually, I think the fact that we're having this debate at all reveals a pretty glaring hole in using link popularity as a measure of importance. I still maintain that opening up your blog to public comments is an implicit endorsement of the validity of the content thereon, user-supplied or not. I'll bite that maybe they're not as valid as an explicit link, but they're valid nonetheless. Pagerank, ostensibly, doesn't say "I think this link is valid because I created it" - it says "I think this link is valid".

On the subject of "don't trust user input" - is putting a link in a comment equivalent to hacking a buffer overflow and taking over the site? I tend to think no.

This is essentially the same point I raised in the question of spammers taking over del.icio.us. I think that's an interesting parallel, since it's really just one huge blog with links in all of the comments.
posted by Caviar at 1:04 PM on January 19, 2005


(Blacklisting is NOT a scalable solution, sorry. It takes too much CPU, and it requires too much maintenance.)

Well, I'm not a programmer by any stretch of the imagination, so it's scalable enough for me, CPU be damned.
posted by KevinSkomsvold at 2:07 PM on January 19, 2005


Some credit should go to Evan Martin, formerly of LiveJournal and now of Google.
posted by mote at 2:09 PM on January 19, 2005


showing up late here, as always, but there's a few points worth mentioning:

I don't see rel="nofollow" doing anything to deter blog spammers from continuing to hit stupidly untended and unmaintained weblogs and guestbooks like these, though. What's to be done about that?

Abandoned blogs are definitely an issue, and targeting them in general is grounds for another, equally wide-reaching effort. But they're not as bad a problem as it might seem, since abandoned blogs tend to have very low PageRank, due to the lack of updates and new content, so they don't actually help spammers that much.


There are no search engines of consequence besides Google. Nobody cares about their ranking in AltaVista or MSN Search.

Nope, SEOs frequently list each of these search engines in their reports of success. Many of their customers are either completely uninformed about the web, or don't care about the fact that the SEO they're paying is ruining the web. Either way, a search engine that's owned by Microsoft still seems valuable to them, since their default homepage in IE uses that engine.
posted by anildash at 3:58 PM on January 19, 2005


"It seems like it'll take a while for anyone in the seo/spam community to notice"
By the way, Matt, this was the funniest thing on the page. :-)
posted by fooljay at 5:39 PM on January 19, 2005


Okay, here's a serious question.

Matt - will you be enabling nofollow on links in comments on Metafilter posts?
posted by Caviar at 8:49 PM on January 19, 2005


I blame bush.
posted by Balisong at 10:17 PM on January 19, 2005


Personally, I think Metafilter, Kuro5hin, Slashdot, et al are perfect examples of where rel="nofollow" should not be implemented, because these are exactly the types of communities where the comments are truly more interesting (by far) than the original post, and hence represent genuinely collaborative authorship.

Well, that and the fact that Mefi requires a login to post and our author links go to the profile page... ;-)
posted by fooljay at 11:29 PM on January 19, 2005


Sorry, this is probably way too late to be of use to anyone.

I stopped blogging about 3 years ago, before comment spam even spawned, but it seems pretty easy to eradicate altogether, although at the cost of accessibility.

Instead of a traditional submit button, use an image wrapped in a link whose JavaScript onclick submits the form.
Anyone without JavaScript - hello Mr Zombie - won't get anywhere. Probably worth making the link go to an apology page explaining why anyone who got there... got there.

code goes:
<a href="javascriptapology.html" onclick="document.forms['*form name*'].submit(); return false;"><img src="leavecomment.gif"></a>

I may have missed something important, having been away from the scene, but that would seem to fix it.
posted by NinjaPirate at 1:39 AM on January 20, 2005


Also, Adam Mathes, he understands what this is about.

This doesn't stop comment spam; it stops search engines having to worry about it.
posted by NinjaPirate at 2:19 AM on January 20, 2005


NinjaPirate, you made me laugh today.
posted by Caviar at 7:19 AM on January 20, 2005


Caviar, metafilter is the type of community where spammers would be noticed and removed very quickly. Many thousands of readers skim these pages daily, and, most recently, paid (donated) $5 to do so. If I charged people $5 to comment on my site, that's most definitely a different animal altogether. I'd probably see a dramatic reduction in the spam.

NinjaPirate: I don't know why you think spam-generating robots don't understand javascript. They do.
posted by odinsdream at 9:01 AM on January 20, 2005


So, does anyone have any details on whether this is actually "don't follow the link" or "don't count the link in the pagerank for the destination"?

Is part of the problem that the spammers are getting traffic boosts from the googlebot?
posted by Caviar at 11:46 AM on January 20, 2005


The problem with automated solutions to stopping comment spam is that for any automated test, you can come up with a bot that can do the test. There's something to be gained for staying ahead, and there's a lot to be gained by using different tests than everyone else. But eventually, if the comment spammers want to crack them, they can and will.

Currently, there isn't a good general-case test that a human can pass very quickly, that a computer can't be programmed to pass quickly, and for which there aren't other technological bypasses (my favorite is the distributed free-porn captcha buster).
posted by Caviar at 1:06 PM on January 20, 2005
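Caviar's point about the value of using a different test than everyone else can be made concrete with one trick of the era, the honeypot field. This is a hypothetical sketch, not Caviar's method: the field name and function are invented. A decoy input is hidden from human visitors by CSS, so only a form-stuffing robot ever submits a value for it.

```javascript
// Hypothetical honeypot check (names invented for illustration): the
// "website2" input is hidden with CSS, so human visitors never fill it
// in, while a robot stuffing every field it finds will. A non-empty
// value marks the submission as probable spam.
function looksLikeRobot(formFields) {
  return Boolean(formFields.website2); // humans leave the hidden field empty
}

console.log(looksLikeRobot({ comment: 'hi', website2: 'http://spam.example' })); // true
console.log(looksLikeRobot({ comment: 'hi', website2: '' }));                    // false
```

As the comment above predicts, a test like this only works until it becomes popular enough for the bots to be taught about it, which is exactly why unusual tests outperform standard ones.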


rel="nofollow" means the bot doesn't even go there. The comment spammers can crack blacklists and captchas -- something you seem to know a lot about, eh, Caviar? ;) -- but it all comes to nought for them if their efforts are rewarded with zero pagerank advantage.
posted by brownpau at 6:44 AM on January 21, 2005


brownpau:

Yes, I know a lot about cracking. Your intimation to the contrary notwithstanding, my hats are all white as the virgin snow. I have a couple of degrees in CS, and distributed security as it relates to networked applications is a particular interest of mine.

I disagree with your particular statement. Pagerank isn't the only gain to be had from comment spam, and rel="nofollow" doesn't do anything to increase the cost of comment spam, which is marginal. Even with reduced pagerank benefits, the costs are still massively lower than the benefit.
posted by Caviar at 10:50 AM on January 21, 2005




This thread has been archived and is closed to new comments