Schema
June 2, 2011 11:26 PM

"Schema ...provides a collection of schemas, i.e., html tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages."

"Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.

A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters can use."
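
To make the "recover the original structured data" point concrete, here is a minimal Python sketch of how a consumer might pull properties back out of on-page markup. It is an illustration under assumptions, not how any search engine actually does it: the sample fragment, the `FlatMicrodataParser` class, and the `extract_item` helper are all hypothetical, and the parser only handles a single, non-nested item whose property values are plain text.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with schema.org-style microdata.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span itemprop="genre">Science fiction</span>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collects itemprop text for one non-nested item (illustration only)."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A valueless itemscope attribute still appears as a key here.
        if "itemscope" in attr_map:
            self.item_type = attr_map.get("itemtype")
        if "itemprop" in attr_map:
            self._current_prop = attr_map["itemprop"]
            self.properties[self._current_prop] = ""

    def handle_data(self, data):
        # Only text inside an element carrying itemprop is recorded.
        if self._current_prop is not None:
            self.properties[self._current_prop] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current_prop = None


def extract_item(html):
    """Return the item type and its text properties as a plain dict."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return {
        "type": parser.item_type,
        "properties": {k: v.strip() for k, v in parser.properties.items()},
    }
```

Calling `extract_item(SAMPLE)` gives back the item type along with the `name` and `genre` values. A real consumer would also have to handle nested items and values carried in attributes such as `href` or `content`.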
posted by 00dimitri00 (20 comments total) 19 users marked this as a favorite

 
Thanks for putting this up. I only caught it at the end of my workday, but this could potentially be a huge moment.

Schema.org doesn't really provide anything revolutionary in terms of specific technology or general theory, but I really do have high hopes that the 'Big Two' search engines coming together will help spur commercial and more tech-savvy webmasters to actively participate in the semantic web.

Even though these unofficial standards are clunky and too esoteric for 90% of web users, this change is definitely a step in the right direction.
posted by graphnerd at 11:32 PM on June 2, 2011


The itemscope attribute they are using is not XHTML compatible. Attributes without values went out of the HTML standard about 10 years ago. I can't even begin to contemplate using a standard that doesn't take into account proper HTML.
posted by Xoc at 11:52 PM on June 2, 2011 [2 favorites]


They tried this back in the '90s with meta tags and so on, and it quickly became totally useless due to spam. However, if they think they can build a good 'reputation' system for sites that are using schemas appropriately, then they might have a shot.
posted by delmoi at 12:06 AM on June 3, 2011


If you look at the type hierarchy it's mainly about products, or things/services associated with products. It's less about making the Web more discoverable, and more about making it discoverable if you have something to sell - or maybe advertise?

That said, this is interesting and I'd like to see how it pans out. Some immediate thoughts: Will this make everything marked up from now on more discoverable? How will it be gamed? Etc.
posted by carter at 12:50 AM on June 3, 2011 [1 favorite]


The itemscope attribute they are using is not XHTML compatible. Attributes without values went out of the HTML standard about 10 years ago. I can't even begin to contemplate using a standard that doesn't take into account proper HTML.

The schema.org recommendations adhere to the newer HTML5 standards (more), and they can easily be made backwards compatible with XML by using <div itemscope="itemscope">, which will work with both XHTML and HTML5 parsers.
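
A quick way to see why both spellings work: an HTML parser reports the attribute as present either way, and only the value differs. The sketch below uses Python's standard `html.parser`; the `ItemscopeDetector` class and `has_itemscope` helper are hypothetical names for illustration, and this checks only one parser, not every XHTML or HTML5 toolchain.

```python
from html.parser import HTMLParser


class ItemscopeDetector(HTMLParser):
    """Records whether any start tag carries an itemscope attribute."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a valueless attribute
        # arrives with value None, but its name is still listed.
        for name, _value in attrs:
            if name == "itemscope":
                self.found = True


def has_itemscope(fragment):
    """Return True if the fragment contains an element with itemscope."""
    detector = ItemscopeDetector()
    detector.feed(fragment)
    detector.close()
    return detector.found
```

Both `has_itemscope('<div itemscope></div>')` and `has_itemscope('<div itemscope="itemscope"></div>')` return True, so a consumer that only tests for the attribute's presence does not care which form the page author chose.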
posted by abstractdiode at 12:54 AM on June 3, 2011 [4 favorites]


Don't Microformats already provide this? With the advantage that they're in use on millions of pages already.

Attributes without values went out of the HTML standard about 10 years ago. I can't even begin to contemplate using a standard that doesn't take into account proper HTML.

It's proper HTML5 RDFa (more detail here). You can always do itemscope="itemscope" if you want to stick with XHTML style.
posted by jack_mo at 12:55 AM on June 3, 2011 [3 favorites]


I was expecting, y'know, some actual XSDs...
posted by Joe Chip at 1:00 AM on June 3, 2011 [2 favorites]


Microformats... interesting idea that nobody uses. If people see that Google and MS are actually behind this (is this actually implemented in any way in the search engines, or is it still just PR?) then they *might* actually start using these.
posted by thefool at 3:03 AM on June 3, 2011


(Also, an (optional) XML namespace would be nice, maybe? Any discussion on that?)
posted by thefool at 3:10 AM on June 3, 2011


The itemscope attribute they are using is not XHTML compatible.

That's because these are based on microdata, which is part of the HTML5 spec.

Don't Microformats already provide this?

This is the Microformats/Microdata debate. But here's the thing: If you're going to de-generify HTML (as Hixie has done with HTML5), then there really is no reason not to try and attack the Microformats problem from an attribute-specific angle.

Whether or not de-generifying HTML (i.e. moving away from classes) is good is open for debate. (I'm still skeptical it makes any sense.) But at least with schema.org we're seeing an evolution of microdata away from the insane initial proposal Hixie offered that used Java-style classes. These are more human-readable and human-codeable, just like microformats.

As for XML, there's RDFa. And RDFa is powerful juju, but it's not as easy to implement in a site as microformats are.

Microformats... interesting idea that nobody uses.

Nobody other than Google, Bing, and Yahoo. They wouldn't be launching Schema.org if microformats weren't a success.
posted by dw at 5:49 AM on June 3, 2011 [2 favorites]


So how long before SEO spammers make this good idea useless?
posted by thsmchnekllsfascists at 7:53 AM on June 3, 2011


I'd like this idea to succeed, but as we see in the discussion here the debate is entirely about minutiae of the markup. Seriously, who gives a fuck about the syntax? Anything will work, we just need to agree. I mean, at least it's not the atrocity of RDF.

The real problem is you can't rely on publishers to reliably mark up their semantics. People spam, people get it wrong, people abuse the schema. I have some mild hope that this latest effort might actually prove useful, given that the search engines are behind it. But we've got 17 years of history of failed semantic markup on the Web now. I'm not optimistic.
posted by Nelson at 7:57 AM on June 3, 2011


So how long before SEO spammers make this good idea useless?

We have a little time. I bet only a handful of them knew what a schema was before yesterday.
posted by dw at 8:11 AM on June 3, 2011 [1 favorite]


Schema is going to make a world of difference for businesses and sites around the world. At last, a fully comprehensible, regulated and accepted form of micro-formats across the three major search providers.
posted by derekmallard at 8:15 AM on June 3, 2011


So how long before SEO spammers make this good idea useless?

COB on Monday?
posted by Thorzdad at 8:53 AM on June 3, 2011


So how long before SEO spammers make this good idea useless?

Where is the SEO spam for RDFa or microformats? As dw pointed out, this is fundamentally the same system and those both have widespread use, both in publishing and in search engine reading. If those haven't seen much spam (and I don't believe they have), there's no reason to expect that to change when the format changes to microdata or the schema definitions become more centralized.

I suspect we haven't seen much spam in semantic markup because it actually works against spammers' interests. People can already lie about their content in text. Doing the same in markup doesn't make the lies any more believable, just more obvious.
posted by scottreynen at 10:28 AM on June 3, 2011


The RDF people have lashed back. "there are politics being played here and we will eventually get to the bottom of this".
posted by Nelson at 7:04 PM on June 3, 2011 [2 favorites]


This is the Microformats/Microdata debate

Seems I have some reading to do. I used to pay close attention to this stuff, but it's slipped off my radar. Ta for the overview, dw.

They wouldn't be launching Schema.org if microformats weren't a success.

Hence my puzzlement - microformats are already out there in the wild (and IIRC tended to be based on what folk were already doing), whereas this seems to be a top-down 'if you don't do this, the big three search engines will ignore you' type thing.

Whatever, the next site I have to build has lots of event listings and products to sell - the client won't give a shit whether I go with microformats or microdata, so I might as well try the fancy new thing.
posted by jack_mo at 8:26 PM on June 3, 2011


The RDF people have lashed back.

One thing to keep in mind is the RDF people have felt shut out of the last ten years of the web. XML really was to be the "language of the web," but in the end it wasn't, mainly because of XHTML being more HTML than XML in people's minds. And RDF isn't exactly sexy, so it's often forgotten -- thus RSS and Atom became the format of the feed and Microformats became the accepted way to do Plain Old Semantic HTML (POSH).

RDF was a background format that was hard for the average programmer or designer to grok; OTOH, re-doing your event calendar with hCal was pretty dead simple. (And I should know, given it took me about an hour to do it on my former employer's bespoke events calendar.)

So what you're seeing is a confluence of Hixie's Great Leap Forward and the RDF community's Rodney Dangerfield attitude. And while they do have a point that RDFa is getting unjustly shoved aside in the name of HTML5, it annoys the crap out of me that their main argument always comes down to some combination of "no respect" and "ours is so obviously better."

I wish Manu and Tantek (Celik) would explain WHY RDFa is better, not just insist it is.
posted by dw at 9:36 AM on June 4, 2011 [1 favorite]


the RDF people have felt shut out of the last ten years of the web

RDF is "shut out" of the web because it is useless wankery. Semantic markup just does not work in an important sense. The problem goes back as far as Aristotle, if not Plato. Magic XML pixie dust does not solve the problem of categorizing knowledge. Metaphysics is hard.

There's a narrower argument in the link I posted, which is that this new system is more or less equivalent to the old system so why do something new? I sympathize but their argument is going to go exactly nowhere. Google + Bing are the only important catalogs of human knowledge on the Internet. What they specify is going to be the markup that matters. It's a harsh reality and I don't understand why the big boys didn't just adopt what was already in use (by their own systems!). But that's the way it is.

None of this syntactic debate matters, though; one encoding is much the same as the next. What matters is whether anyone does anything meaningful with the data. 13 years of RDF failure suggests they won't.
posted by Nelson at 11:16 AM on June 4, 2011


This thread has been archived and is closed to new comments