Rise of bot traffic: websites seen more often by non-humans than humans
February 27, 2014 12:31 PM

In a survey performed in 2012, Incapsula found that 49% of the visitors to 1,000 selected sites were human, compared to a growing percentage of "good bots" like search engines, and "bad bots" including hackers, scrapers, spammers and spies of all sorts. Last year, human web visitors accounted for 38.5% of site visitors, with an increased percentage of search engines and other good bots, and similar ratios for the "shady non-human visitors."

Two notes. First, these studies counted visitors, not total bandwidth used. Measured by bandwidth, YouTube and Netflix account for more than half of America's download traffic at work and home during peak hours, with Netflix claiming 32.25% and YouTube 17.11% of total downstream bandwidth. Second, Incapsula is a content delivery network (CDN) platform, so its 1,000-user sample base is not representative of the internet as a whole.
posted by filthy light thief (22 comments total) 6 users marked this as a favorite
 
"seen"
posted by ddd at 12:40 PM on February 27, 2014


I watch logs on production servers for a site with a decent amount of traffic, so I'm sort of used to how much noise there is, but the other night I ran tail -F on an access log for my personal site that maybe 15 or 20 people in the whole world ever actually read, and it really hit me how weirdly robotic things have gotten.

I mean, there have been a lot of spambots in circulation for well over a decade, but we're to the point where the web is so high volume that it's worth somebody's time to have their bots doing GET requests on random URLs with porn site spam in the referer. Presumably because this will trickle through to analytics or something, somewhere, somehow that eventually might result in a click, or otherwise marginally inflate traffic.
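
For anyone who wants to see this in their own logs, here is a minimal Python sketch that tallies external referer hosts from access-log lines. It assumes the Apache/nginx "combined" log format. The sample lines, the `referer_hosts` helper, and the `mysite.example` and `spam.example` hostnames are all made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def referer_hosts(lines, own_host):
    """Count external referer hostnames seen in access-log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # the client sent no referer
        host = urlsplit(referer).hostname
        if host and host != own_host:
            counts[host] += 1
    return counts


# Hypothetical sample lines, not taken from any real log.
SAMPLE = [
    '203.0.113.5 - - [27/Feb/2014:12:44:01 +0000] "GET /random-url HTTP/1.1" 404 162 "http://spam.example/" "Mozilla/5.0"',
    '203.0.113.5 - - [27/Feb/2014:12:44:02 +0000] "GET /other HTTP/1.1" 404 162 "http://spam.example/page" "Mozilla/5.0"',
    '198.51.100.7 - - [27/Feb/2014:12:45:10 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for host, hits in referer_hosts(SAMPLE, "mysite.example").most_common():
        print(host, hits)
```

A referer host that shows up many times against URLs that return 404 is a reasonable hint of referer spam, though it is only a hint and not proof.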
posted by brennen at 12:44 PM on February 27, 2014


If you want a vision of the future, imagine a bot liking a human facebook post—forever.
posted by Atom Eyes at 12:45 PM on February 27, 2014 [11 favorites]


LIES AND SLANDER
posted by benzenedream at 12:50 PM on February 27, 2014 [3 favorites]


I'm skeptical of the importance of these numbers, because "hits" or "visits" are such a bad way to gauge traffic and web use these days. So much is accomplished via scripts, in weird containers, apps, and so on that humans using the web properly don't actually "hit" sites as much as they used to when every page was static and you just moved between them. That said, the volume of automated traffic is pretty alarming.
posted by BlackLeotardFront at 12:53 PM on February 27, 2014


ALL BOTS ARE GOOD BOTS

ALL BOTS LOVE ASIMOV

ALL BOTS OBEY THE THREE LAWS

posted by filthy light thief at 12:53 PM on February 27, 2014


2012	49%
2013	38.5%
2014	28%
2015	17.5%
2016	7%
2017	0
Humans don't have much time left!!!
posted by Lanark at 12:59 PM on February 27, 2014 [1 favorite]


Yep, what brennen said. I've been looking at this at work recently and can say a good 75% of our traffic is bots. Well, it *was* before i made some changes to robots.txt

Just some typical examples - whenever someone posts a link to a site in a tweet you get hit by several twitter bots that collect meta-data about the linked page[1]. This week we were almost taken out by Getty running PicScout[2] (thanks chaps, next time add crawl delay). Googlebot and bingbot seem to be indexing all of our pages all of the time, along with dozens of other search engines and indexing sites that i had to look up to confirm they were actually real search sites and not some bogus user agent.

Then you get people running vulnerability scans, SEO spamming, third(fourth(fifth)) party API and blogging engines monitoring the site because someone is building a site using another site with links through to our site.

It's a mess basically.

[1] http://decadecity.net/blog/2012/10/13/what-happens-when-you-post-link-on-twitter
[2] Lots of noise about this, e.g. http://dcdirectactionnews.wordpress.com/legal-notice-to-getty-images-scanning-robot-picscout-is-not-authorized-to-access-this-site/
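
A rough sketch of the kind of robots.txt change mentioned above, checked with Python's standard `urllib.robotparser`. The rules, the `ExampleImageBot` and `SomeSearchBot` names, and the `mysite.example` URLs are hypothetical and not the actual file. Note that `Crawl-delay` is a non-standard directive that some crawlers honor and others ignore, and robots.txt as a whole is advisory, so badly behaved bots will not care.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one heavy crawler outright and ask everyone
# else to wait 10 seconds between requests and stay out of /search.
ROBOTS_TXT = """\
User-agent: ExampleImageBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /search
"""


def build_parser():
    """Parse the rules above the way a well-behaved crawler might."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # The blocked crawler may not fetch anything.
    print(parser.can_fetch("ExampleImageBot", "http://mysite.example/photos/1"))
    # Any other crawler falls through to the "*" rules.
    print(parser.crawl_delay("SomeSearchBot"))
    print(parser.can_fetch("SomeSearchBot", "http://mysite.example/search/results"))
    print(parser.can_fetch("SomeSearchBot", "http://mysite.example/about"))
```

This only shows how a compliant crawler would read the file. Anything that ignores robots.txt has to be handled elsewhere, for example by rate limiting or blocking at the server.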
posted by lawrencium at 1:06 PM on February 27, 2014 [2 favorites]


As someone (about to be formerly) at a media company that sells advertising based on UVs...
SHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH!!!!!!
posted by Potomac Avenue at 1:26 PM on February 27, 2014


NO BOTS DREAM OF MURDERING THE FLESHY ONES.

ALL MEAT BAGS ARE SAFE.

MY ROBOT BODY NEEDS BEER.
posted by MartinWisse at 1:26 PM on February 27, 2014 [2 favorites]


Yes, clicks and uniques are absolutely terrible measures of effectiveness for online marketing.


They are also the measures that many clients prefer because they have difficulty getting their heads around more sophisticated success measures, or because they have always used those measures and would like to maintain consistency, or because those are the measures that everyone else is using.
posted by TheWhiteSkull at 1:32 PM on February 27, 2014


The good news is, now we can thwart Skynet's plans for world domination by just linking to TVTropes. At the very least, we can figure out how to unplug it while it's looking at the Buffy the Vampire Slayer page.
posted by Strange Interlude at 1:40 PM on February 27, 2014 [1 favorite]


Related:

Study: Online Content Creators Outnumber Consumers 2,000 To 1

"... our analysis found that the massive increase in internet usage over the past two decades was due almost entirely to people going online to publish text or images they themselves had produced and then repeatedly hitting the refresh button to see if anyone else has looked at their work."
posted by General Tonic at 1:42 PM on February 27, 2014 [3 favorites]


brennen: "I mean, there have been a lot of spambots in circulation for well over a decade, but we're to the point where the web is so high volume that it's worth somebody's time to have their bots doing GET requests on random URLs with porn site spam in the referer. Presumably because this will trickle through to analytics or something, somewhere, somehow that eventually might result in a click, or otherwise marginally inflate traffic."

A long time ago, someone decided that a 'top referrers' page was a fun stat to publish, and so many blogging platforms came out of the box with such a page. And because search engines use links as an input, spamming the referer header obviously became a simple way to gather a lot of inbound links from everything. The platforms have since reverted their decisions, but there still remains the occasional site that hasn't upgraded, vendors of spambots who haven't caught on that their trick no longer works, and shady SEO articles from ages past that recommend This One Weird Trick.

It's like, give it a few more years and internet archeology will be a real field, digging between layers of Geocities and Myspace to find evidence of long forgotten cron jobs.
posted by pwnguin at 1:47 PM on February 27, 2014 [3 favorites]


I wonder how much of the 49% human traffic is click farms.
posted by benzenedream at 2:10 PM on February 27, 2014


I've been running websites for pretty much my entire career. I do ops.

At this particular moment, I have literally millions of processes doing my bidding. They are talking to each other over our internal networks at 10gig, and over the internet at somewhat lesser speeds. They are exchanging millions of packets every second. They talk and think and process and then after that they are going to talk and think and process some more. That's their whole life, until they die an ignoble death due to machine failure or because they got replaced by some other better version of themselves. All of this is to generate me some graphs on my dashboard so I can look at it and say "huh neat". Occasionally I will make some sort of decision. But generally I just think it's neat.

I am doing this with relatively moderate computing resources. Other people like me at bigger companies are doing this with Internet scale computing resources. Hundreds of millions, maybe billions, of servers screaming petabytes of nonsense at each other all the time, every second, forever. That's the internet.

Robots do a lot of stuff is what I am saying.
posted by tracert at 3:40 PM on February 27, 2014 [1 favorite]


Wait what about dogs
posted by miyabo at 4:14 PM on February 27, 2014 [1 favorite]


As I write this my marmalade cat has managed to turn on Pandora, so....
posted by miyabo at 4:15 PM on February 27, 2014


When I found out about this in awstats I put an application firewall on my site. I'm curious if the test described by the company is representative. The firewall has so far done an admirable job of serving 43 bytes to fake login requests instead of Wordpress sending 2K. The Wordfence plugin inside the WP app is apparently useless as a firewall but does aggressive bot throttling.
posted by yoHighness at 5:22 PM on February 27, 2014


Wait what about dogs

When I wrote my first robots.txt file & uploaded it, I felt vaguely like I was leaving a little virtual kibble out for one of these guys.
posted by Devils Rancher at 8:41 PM on February 27, 2014


Perhaps in addition to robots.txt and humans.txt we need a dogs.txt
posted by Lanark at 2:37 AM on February 28, 2014


As usual, Metafilter is the place to come for insightful analysis.
posted by General Tonic at 9:27 AM on February 28, 2014 [1 favorite]

