Link Spammers
January 31, 2005 1:42 PM

Interview with a Link Spammer. [via] Get to know one of the scummy linkpimping bottomfeeders who abuse our referrer logs and weblog comments, then take measures to protect yourself. AnnElisabeth.com has much more (just keep scrolling), and of course, check your own weblog software for rel="nofollow" updates.
posted by brownpau (48 comments total)
 
Like this guy?
posted by Pretty_Generic at 1:44 PM on January 31, 2005


Seven-figure salary for GoogleBombing?
posted by Steve_at_Linnwood at 1:53 PM on January 31, 2005


Part of me wants to wager with law enforcement: how many LEOs will it take to pry my thumbs off his larynx before I crush it? I say four.

But another part of me has to shrug. While exploiting proxy server weaknesses is nasty, it is incumbent on sysadmins to prevent this sort of thing, isn't it? I mean, I'm not excusing the behavior at all, but it seems that it would be relatively straightforward (if not easy) to put a stop to this.

Or maybe I'm wrong.
posted by TeamBilly at 1:57 PM on January 31, 2005


THESE IS A GRATE LINKS!!1!!1!

Shrink yur p3n1s! With a low rate mortgage! while you work more productively!
posted by orthogonality at 2:01 PM on January 31, 2005


Just because I find it interesting to know who is and is not using this, I have my browser's custom CSS set to make nofollow links bright green. This allows me to see at a glance, for example, that the "scummy" link above is protected. Or that Wikipedia's external links are nofollow'ed. Etc.

To do this:


/* Make every link whose rel attribute includes "nofollow" stand out */
a[rel~="nofollow"] {
  text-decoration: blink !important;
  color: lime !important;
}
posted by John Kenneth Fisher at 2:06 PM on January 31, 2005


"So what does put a link spammer off? It's those trusty friends, captchas - test humans are meant to be able to do but computers can't, like reading distorted images of letters."

Seems like this would be the next must-have feature for open comments software.
posted by walrus at 2:10 PM on January 31, 2005


Yeah, there's a plugin for Movable Type that does this called Scode. It's quite easy to install, but the algorithm it uses by default to produce background noise is pretty weak (just plain crosshatch, no randomness). I used it on my blog, and presto, from 50 comment spams a day to none. I worry about when they start using people trying to find porn to solve captchas. gonna happen any day now. Maybe if you didn't allow hotlinking to the image....
posted by Freen at 2:17 PM on January 31, 2005


Oh, another problem with captchas is that they can limit accessibility, specifically to those net users who happen to be blind.
posted by Freen at 2:18 PM on January 31, 2005


Yeah, that's pretty much the only reason I don't use them on my (incredibly unpopular) weblog. I may not have any blind readers, but then again, I MIGHT, and they should be allowed to comment just like anyone else.
posted by John Kenneth Fisher at 2:19 PM on January 31, 2005


Maybe if you didn't allow hotlinking to the image....

The image can be downloaded by the spammer and then they can serve it to others for solutions.
posted by jperkins at 2:21 PM on January 31, 2005


"So what does put a link spammer off? It's those trusty friends, captchas - test humans are meant to be able to do but computers can't, like reading distorted images of letters."

I recently saw a lower-tech (and probably more accessible) alternative that would probably work for most cases - ask the user a question via text. The example I saw was a math question. With a wide enough variety of questions, it just might work.
posted by me & my monkey at 2:29 PM on January 31, 2005
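
A minimal sketch of the text-question idea above, in Python. The function names and the "What is a plus b?" wording are made up for illustration and are not taken from any particular weblog plugin. With only one question template this would be trivial to script around, which is why the comment stresses a wide variety of questions.

```python
import random


def make_challenge(rng):
    """Return a (question, expected_answer) pair for a simple addition puzzle."""
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    return f"What is {a} plus {b}?", str(a + b)


def check_answer(expected, submitted):
    """Accept the answer if it matches after trimming surrounding whitespace."""
    return submitted.strip() == expected


# Hypothetical usage: show `question` in the comment form and keep `expected`
# on the server (for example in the session) until the form comes back.
question, expected = make_challenge(random.Random())
```

The expected answer has to stay on the server; sending it along in a hidden form field would hand the solution to the bot.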


The problem with showing captchas to people interested in seeing porn is that 1) there's plenty of free porn out there, and 2) once you have the traffic you can just link to adult friend finder (nsfw) and make decent money off the traffic.

Now I suppose that there are plenty of wankers who don't know about 1 (the good porn on the net) and plenty of bottom feeders who don't know about AFF.

You know really, the worst thing about these idiots is that they really make it difficult to actually find the good porn out there, if you don't already know where to start.
posted by delmoi at 2:29 PM on January 31, 2005


Livejournal provides two captchas: one visual and one automated audio version. I don't know what the technology they use is, but if I were going to use captchas, I would only do it using that. Especially because captchas have gotten to the point that I can't read them half the time.
posted by JZig at 2:29 PM on January 31, 2005


I recently saw a lower-tech (and probably more accessible) alternative that would probably work for most cases - ask the user a question via text. The example I saw was a math question. With a wide enough variety of questions, it just might work.

That might be better for the blind, but it also might be worse if you don't speak English.

On the other hand it might be better to use these little tests rather than some huge Orwellian central authentication system.
posted by delmoi at 2:31 PM on January 31, 2005


My blog is very specific and community oriented. We use it to plan our Burning Man camp, and so I know just about everyone who posts and comments on it. I tried to get freephilter to work, but alack no love. Maybe I'll keep trying in case we get a blind camp member or something....

On preview: If I start getting comment spam again, I'll switch to something more like MetaFilter, or do something like randomly change where the image lives, or have a real-time captcha. Post a link and the captcha pops up with a limited time window for completion.
posted by Freen at 2:34 PM on January 31, 2005


Sam explains the important thing to remember is it's nothing personal.
posted by mrgrimm at 2:34 PM on January 31, 2005


It's a difficult problem technically to provide a solution which doesn't limit access. Perhaps there isn't one in the longer term, but it would be a pity to think that. On the subject of audio "captchas", speech recognition is way ahead of image recognition, so the current rarity of this solution is probably its only advantage. There are also ready algorithmical solutions for maths tests ... the question then becomes one of recognising the question.
posted by walrus at 2:36 PM on January 31, 2005


"have a real-time captcha. Post a link and the captcha pops up with a limited time window for completion."

Problem being, if you have an algorithmical solution then computers are faster than people ...
posted by walrus at 2:37 PM on January 31, 2005


Captchas aren't algorithmic. They require human interaction usually. So a potential spammer would have to take my image, serve it to a meatspace person elsewhere and have them solve it and bring the solution back in time. They probably could, but it might be enough to foil them for another round.
posted by Freen at 2:40 PM on January 31, 2005
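
One way the "limited time window" could be enforced, sketched in Python under stated assumptions: the server signs the issue time together with the expected answer, and rejects anything that comes back late. `SECRET`, `WINDOW_SECONDS`, and both function names are hypothetical, and this is not how Scode or any other plugin mentioned in the thread is known to work.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"   # hypothetical server-side secret, never sent to the client
WINDOW_SECONDS = 60     # hypothetical time limit for solving the captcha


def issue_token(answer, now=None):
    """Return 'timestamp:signature', binding the expected answer to its issue time."""
    issued = int(time.time() if now is None else now)
    sig = hmac.new(SECRET, f"{issued}:{answer}".encode(), hashlib.sha256).hexdigest()
    return f"{issued}:{sig}"


def verify(token, submitted, now=None):
    """Accept only a correct answer returned within WINDOW_SECONDS of issue."""
    current = time.time() if now is None else now
    try:
        issued_text, sig = token.split(":", 1)
        issued = int(issued_text)
    except ValueError:
        return False  # malformed token
    if current < issued or current - issued > WINDOW_SECONDS:
        return False  # token from the future, or window expired
    expected = hmac.new(SECRET, f"{issued}:{submitted}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig.encode(), expected.encode())
```

The token travels with the form, so the server does not need to remember each captcha it served. The sketch does not stop a solved token from being replayed inside the window, and a time limit only hurts attackers who are slower than real visitors.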


"Captchas aren't algorithmic."

Sorry, I'm thinking ahead too much. I can think of a few heuristics, specifically in the AI sector, which could do a reasonable job if they can get a significant sample to work with.
posted by walrus at 2:44 PM on January 31, 2005


That might be better for the blind, but it also might be worse if you don't speak English.

Well, that's pretty simple - use whatever language is used by the rest of the page/site!
posted by me & my monkey at 2:44 PM on January 31, 2005


(I'm presuming in my last comment that you have a limited set of images to serve)
posted by walrus at 2:46 PM on January 31, 2005


Also, does the name Captcha bother anyone else as much as it bothers me?
posted by JZig at 2:47 PM on January 31, 2005


WP-Gatekeeper does that math puzzle-type thing. The questions can also be like, "what is the weblog author's first name?" etc.

I don't think the language barrier is a real problem--why would you comment if you don't understand the language the weblog is written in? It would just be gibberish... if I 'kinda' understand a post, I can 'kinda' understand a simple challenge question too.

On the other hand, mental disabilities/other lack of certain skills *are* an issue.
posted by Firas at 2:48 PM on January 31, 2005 [1 favorite]


walrus, it should be fairly easy to take any open source captcha generator program and wire it up to be part of a genetic algorithm until it manages to produce the exact same image that the same open source captcha software produced on the actual website, and use that knowledge to answer it. It would be harder without source, but no harder reverse engineering than cracking your average game program.
posted by JZig at 2:49 PM on January 31, 2005


I despise comment spam more than I despise spam. Before I hacked a spam comment blocker into WordPress I was getting literally hundreds of comment spams advertising the usual. They got put into the moderation queue, but I'd still be deleting comments for 15 minutes every day. Now, thank god, I'm only getting 10 or 15 spammed links every week.
posted by seanyboy at 2:50 PM on January 31, 2005


Yes but then it becomes computationally expensive for comment spammers. Perhaps not enough to make it unprofitable, but at least more expensive.
posted by Freen at 2:50 PM on January 31, 2005


No, images are created on the spot by a random number generator, and with random noise in the background. Enough noise so that a person with fairly good eyesight can read it, but a computer doing OCR might have some torubles.
posted by Freen at 2:52 PM on January 31, 2005


err. troubles.
posted by Freen at 2:53 PM on January 31, 2005


But computation isn't very expensive any more. If there are a limited number of images, over the entire set of "captchas" (I hate that term too), then it becomes even cheaper once you've nailed them. Genetic algorithms are probably more expensive than heuristics (which sacrifice a certain amount of accuracy for speed).

Addressing the point of random noise, there are also workable heuristics to extract it ...
posted by walrus at 2:57 PM on January 31, 2005


If you want to check it out in action, walrus, the blog is linked in my userpage.
posted by Freen at 2:57 PM on January 31, 2005


I worry about when they start using people trying to find porn to solve captchas. gonna happen any day now.
Supposedly it's been happening for a year or so already.
posted by hattifattener at 2:57 PM on January 31, 2005


Yeah. I need something better to generate the backgrounds and maybe add more characters.... but basically, I'm just not the low hanging fruit anymore. This isn't a permanent solution. This is more security through diversity.
posted by Freen at 3:01 PM on January 31, 2005


I'm using browser-side hashes to defeat spam comments. This requires your browser to perform a small JavaScript computation in order to post a comment. It's invisible to users of IE, Mozilla, or Opera, but blocks spam bots because they (currently) don't support JavaScript. WordPress users should install Hashcash. I'm sure there's an MT equivalent.

I'm also using the wp-nofollow plugin. This feature will be built into the next version of WordPress.
posted by exhilaration at 3:03 PM on January 31, 2005
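
For the curious, a rough Python sketch of the general hashcash idea: the client must find a nonce whose hash, combined with a server-issued challenge, starts with a run of zero bits, and the server checks it with a single hash. This is a generic proof-of-work illustration, not the actual algorithm of the WordPress Hashcash plugin; `DIFFICULTY_BITS`, the function names, and the choice of SHA-256 are assumptions, and in the real setup the solving step would run as JavaScript in the browser.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 12  # hypothetical: number of leading zero bits required


def is_valid(challenge, nonce):
    """Server side: one hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    value = int.from_bytes(digest, "big")
    # The top DIFFICULTY_BITS bits of the 256-bit digest must all be zero.
    return value >> (256 - DIFFICULTY_BITS) == 0


def solve(challenge):
    """Client side: try nonces in order until one passes the check."""
    for nonce in count():
        if is_valid(challenge, nonce):
            return nonce


# Hypothetical usage: the challenge string would be generated per form by the server.
nonce = solve("comment-form-42")
```

Each extra bit of difficulty doubles the expected work for the poster while verification stays at one hash, so the cost falls on whoever submits comments in bulk.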


"This is more security through diversity."

That's why I spoke about it being a short-term solution. No doubt in the longer term there will be other measures for keeping ahead of the curve. It's the old evolution story, and there will always be winners and losers. Think I might try using what you're using myself for now though.
posted by walrus at 3:04 PM on January 31, 2005


Right on. Well best of luck. Email me if you have any questions, and I'll try to help you out.
posted by Freen at 3:10 PM on January 31, 2005


Security through diversity can work well here, especially if you do a custom solution. If someone WERE to attack the creation software instead of trying OCR, writing a system from scratch would totally protect you, at least until it became popular. Anything unusual is better than nothing.
posted by JZig at 3:15 PM on January 31, 2005


>> does the name Captcha bother anyone else as much as it bothers me?

I've always called them Turing tests...


posted by login at 3:42 PM on January 31, 2005


Nah, the only way to beat brute-force computer counter-security is by using things that only a human being would know. I'm gonna develop a captcha type system that asks users questions like "what does it feel like to fall in love?" or "describe friendship." No computer could ever answer that.
posted by Hildago at 3:56 PM on January 31, 2005


Yeah, you could say: "Describe in single words, only the good things that come in to your mind about your mother."
posted by neustile at 4:11 PM on January 31, 2005


Nice Blade Runner reference.
posted by walrus at 4:19 PM on January 31, 2005


I don't think calling them Turing tests is quite accurate - my understanding is that the Turing Test is about fooling a human into thinking it's talking to a human. Fooling a computer is different. Yes, I'm being a bit pedantic, but if the two were equivalent we wouldn't have this problem. Or maybe it'd be worse, whatever...
posted by freebird at 5:43 PM on January 31, 2005


>> does the name Captcha bother anyone else as much as it bothers me?

I've always called them Turing tests...


The word is an acronym for completely automated public Turing test to tell computers and humans apart. (Yes, technically it is a reverse Turing test; see the article.) It is, in fact, a trademark.
posted by dhartung at 6:19 PM on January 31, 2005


"Let me tell you about my mother."

Interesting article, though I'm disappointing there wasn't more relating to the spammer's justification, although I guess if the Reg pressed the issue, the interviewee could have just said "well if you want to be all high and mighty about it, goodbye." Then again, maybe there really isn't much more to it.
posted by DyRE at 9:09 PM on January 31, 2005


Ack!
"disappointed" not "disappointing"
posted by DyRE at 9:09 PM on January 31, 2005


No, images are created on the spot by a random number generator, and with random noise in the background. Enough noise so that a person with fairly good eyesight can read it, but a computer doing OCR might have some torubles.

*might* is the operative word there.

And by the way a feed-forward multilayer neural network would probably do a much better job than a genetic algorithm.
posted by delmoi at 9:19 PM on January 31, 2005


IIRC, (well-written) captchas are designed to insert exactly the kind of noise that experience has shown real-world OCR software to have trouble with.
posted by hattifattener at 10:53 PM on January 31, 2005


Re accessibility problems with captchas, my bank is now offering an alternative "click here to have this read out to you". It's not going to work well with Lynx but at least it's an option for the blind.

TeamBilly, open proxies may be a sysadmin's problem, but they're everyone else's as well. Also, many proxies are actually trojaned Windows boxes - no sysadmin in charge there.
posted by i_am_joe's_spleen at 11:49 PM on January 31, 2005



