Google: Web Authoring Statistics from 10^9 pages.
January 26, 2006 6:48 AM

Web Authoring Statistics from Google. An analysis of a sample of slightly over a billion documents, extracting information about popular class names, elements, attributes, and related metadata.
posted by signal (29 comments total)
 
Please send this to the Department of Homeland Security immediately.
posted by RichAromas at 6:56 AM on January 26, 2006


We need some way to verify the reliability of metadata in web pages and describe what its purpose is.
posted by GuyZero at 7:04 AM on January 26, 2006


Note: You will need a browser with SVG and CSS support to view the result graphs correctly. We recommend Firefox 1.5.

Woah, woah, does firefox 1.5 have native SVG support?
posted by delmoi at 7:16 AM on January 26, 2006


Yeah, but nothing else does, so using SVG graphics is rather pointless.
posted by smackfu at 7:19 AM on January 26, 2006


This is interesting. I had no idea they were coming out with a new version of HTML with new tags.

Also:

The rest of the top 20 classes are either presentational or otherwise meaningless (msonormal, for example, which is one of the classes that Microsoft Office uses in its "HTML" output). Of the top 20, the two classes that are used the most that are currently not covered by HTML5 are copyright and search.

I like how they put "HTML" in "Scare Quotes" :p
posted by delmoi at 7:22 AM on January 26, 2006


There are only twice as many text/plain documents out there than application/msword documents (and that doesn't take into account the fact that text/plain is the default MIME type of some servers while many application/msword documents will end up labelled as something else).

Wow.
posted by delmoi at 7:26 AM on January 26, 2006


I like how one of the top 20 classes is "msonormal". Thank you, Microsoft, for your ubiquitous Word->HTML conversion code (although, to be fair, they may use this by default in FrontPage).
posted by thanotopsis at 7:29 AM on January 26, 2006


I don't understand all the gibberish, but the pictures are pretty.
posted by OmieWise at 7:34 AM on January 26, 2006


I'm not sure why they used SVG. Since it's a report from December 2005, I assume the graphics are static images of the survey data, so they could've just used GIFs. Or, if they wanted to use a relatively newfangled graphics format that some browsers don't fully support, they could've used PNGs.
posted by kirkaracha at 8:47 AM on January 26, 2006


"The most-used attribute on html elements is xmlns, from misguided people using XHTML but sending it as text/html. They even (just) outnumber the people who specify the lang attribute!" [source]

I never expected so much attitude in their write-up; some of this reads like a standards zealot going off on FrontPage users.
posted by mathowie at 8:56 AM on January 26, 2006


Doesn't using SVG allow these images to be dynamically updated far easier than a dynamic data to PNG or GIF conversion on the server side?

And support for a format can only happen when it gets used. I'll be very happy when an open vector format is supported in all browsers. Now we just need an open colour space...
posted by juiceCake at 8:58 AM on January 26, 2006


Damn, I was hoping they'd have some interesting stats on CSS ID/class naming such as:

Interesting CSS ID/class names:
- asWideAsMyBigPenis
- shitHead_wrapper
- floatThisBitch
- boldenizeThisShitNigga
posted by afx114 at 9:30 AM on January 26, 2006


W3C is bunk. Only ISO is the true standard. You will be assimilated.
posted by Protocols of the Elders of Awesome at 9:47 AM on January 26, 2006


Doesn't using SVG allow these images to be dynamically updated far easier than a dynamic data to PNG or GIF conversion on the server side?

Sure, and that (or Flash) would make sense if the data was being updated. This data is from a December 2005 study, so I assume the data isn't being updated and there's no need for the images to be dynamically updated. This seems they used SVGs because Firefox 1.5 can display them, not because it's the best solution.
posted by kirkaracha at 9:50 AM on January 26, 2006


What would be interesting (and what I thought this was at first glance) is an aggregation of all the data they're getting from the Google Analytics service.
posted by signal at 10:03 AM on January 26, 2006


...This seems they used SVGs because Firefox 1.5 can display them, not because it's the best solution.
Which is sort of ironic when you consider some of the data...
posted by Thorzdad at 10:05 AM on January 26, 2006


The next most-frequently-featured element is meta, followed by br.

You know, sometimes, after a lot of meta, I like to have a br too.
posted by LondonYank at 10:28 AM on January 26, 2006


Sure, and that (or Flash) would make sense if the data was being updated. This data is from a December 2005 study, so I assume the data isn't being updated and there's no need for the images to be dynamically updated. This seems they used SVGs because Firefox 1.5 can display them, not because it's the best solution.

Just because they are only presenting Dec. 2005 data doesn't mean that's all they have. I feel certain that they've got this data for any range you please, though anything other than that is only available internally.

From that perspective, they choose between:

a. reusing the same image display method they've already got
b. creating a bunch of GIF/PNGs/whatever for this one instance
posted by camcgee at 10:37 AM on January 26, 2006


/me has used <div class="floatThisBitch"> IRL. I like to read people's source code for curious usage and try to return the favor since I can't be the only one out there.

Interesting commentary in the FPP and the parallels in usage with the emerging HTML 5 standard serve to confirm my skepticism as to its utility.
posted by Fezboy! at 10:54 AM on January 26, 2006


Damn, I was hoping they'd have some interesting stats on CSS ID/class naming such as:

Interesting CSS ID/class names:
- asWideAsMyBigPenis
- shitHead_wrapper
- floatThisBitch
- boldenizeThisShitNigga


Well, code up an electronic 'humor meter' and I'll get right on that.
posted by delmoi at 10:57 AM on January 26, 2006


some of this reads like a standards zealot going off on frontpage users

Not that there's anything wrong with that...
posted by gimonca at 2:06 PM on January 26, 2006


Sure, and that (or Flash) would make sense if the data was being updated. This data is from a December 2005 study, so I assume the data isn't being updated and there's no need for the images to be dynamically updated. This seems they used SVGs because Firefox 1.5 can display them, not because it's the best solution.


But now it could be easily updated. It's sort of like using CSS for layout. You're not going to change it every day, but a year down the road, presto, you could change your whole site.

I think they used it because there is an interest in getting the format in use, and an opportunity to try it out.
posted by juiceCake at 7:56 PM on January 26, 2006


re: SVG, I think that a) they're interested in promoting FF b) most people who have any interest at all in this data already have FF installed, at the very least for testing purposes c) 800 pound gorilla.
posted by signal at 11:11 PM on January 26, 2006


It appears that Firefox 1.5 on Mac can't display these graphs, and neither can Safari. So well done, Mr Google Standards Nerd, your open and interoperable standards-based graphs have locked out an entire platform.
posted by influx at 2:17 AM on January 27, 2006


influx: Firefox 1.5 on OS X most certainly can display those graphs. Are you perhaps using some sort of unofficial build (e.g., G4- or G5-optimized) that doesn't have SVG support enabled?
posted by bpt at 7:47 AM on January 27, 2006


So well done, Mr Google Standards Nerd, your open and interoperable standards-based graphs have locked out an entire platform.

Wouldn't that be Apple locking themselves out of standards? I guess web developers should make Safari-specific pages? Wouldn't that just be wonderful and very MS-like.

Thankfully Apple has started to integrate SVG support into WebCore so congratulations will be due to Apple soon for supporting web standards.

From the Firefox page on SVG:

"All platforms that Firefox ships for use the same rendering backend, cairo, so their performance characteristics will generally be similar. Performance on linux is the hardest to predict, as it will vary due to various X servers' implementation of the RENDER extension."

No mention of excluding SVG from the Mac version. And it appears to work for bpt as well.

Version Tracker lists SVG support for 1.5 too.
posted by juiceCake at 8:33 AM on January 27, 2006


I tried the Safari SVG support in WebCore. It's pretty buggy (labels aren't in the right place) and slow. OTOH, it does allow selection of the text in the diagrams, which is better than Firefox.

They should have used PNGs.
posted by smackfu at 8:47 AM on January 27, 2006


They should have used PNGs.


Why? Maybe the fact that Safari doesn't yet have good support will spur support on. Otherwise, we'll never get these companies to support emerging and existing standards.
posted by juiceCake at 10:49 AM on January 28, 2006


If they had used PNGs, we would be talking about the actual article more. Their message is getting lost because of their display choices. Good reason not to use PNG.
posted by smackfu at 11:30 AM on January 28, 2006



