August 7, 2009 9:46 AM

SubPubHubbub The real-time web, manifested by services like Twitter and Friendfeed, is all the rage these days. What would happen if everything online could be real-time? It can, thanks to Google PubSubHub and their ongoing effort to add it to their products.

A ton of digerati are paying attention to this lately, starting with Anil Dash's great post on how it works and can change the web, Gina Trapani's look at how Google is implementing it, and even Dave Winer loves it.

Run your own WordPress blog? You can implement your own SubPub service, at least to Friendfeed.
posted by griffey (16 comments total) 4 users marked this as a favorite

haven't read the links yet, but I just wanted to point out that I fucking love anil dash's site icon. I always think that's what it would look like if he were a character in either Final Fight, Final Fantasy, or Bionic Commando.
posted by shmegegge at 10:11 AM on August 7, 2009 [1 favorite]

The slide show makes it look like I'm just adding a middleman to the existing process.

posted by DU at 10:19 AM on August 7, 2009

The process is a bit different, if I understand things correctly. Subscribers get notified of changes by the hub; they don't have to poll your server to find out about them. It would be more efficient. Also, you could run the hub portion yourself.
posted by chunking express at 10:31 AM on August 7, 2009
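
For anyone trying to picture the push model described above, here is a minimal in-memory sketch. It is a toy, not the actual protocol: the `Hub` and `Reader` classes and the "blog-feed" topic name are made up for illustration, and plain function calls stand in for the HTTP requests a real hub would send to subscriber callback URLs.

```python
from collections import defaultdict


class Hub:
    """Toy in-memory hub: remembers callbacks per topic and pushes new entries."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, entry):
        # The publisher notifies the hub once; the hub fans out to every subscriber.
        for callback in self.subscribers[topic]:
            callback(entry)
        return len(self.subscribers[topic])


class Reader:
    """Toy subscriber that records whatever the hub pushes to it."""

    def __init__(self):
        self.inbox = []

    def on_update(self, entry):
        self.inbox.append(entry)


hub = Hub()
readers = [Reader() for _ in range(3)]
for reader in readers:
    hub.subscribe("blog-feed", reader.on_update)

# One publish call reaches all three readers; none of them had to ask.
delivered = hub.publish("blog-feed", "new post")
```

The point of the sketch is only the direction of the calls: subscribers register once, and after that the hub initiates every delivery.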

It's PubSub. And it doesn't look like many of the problems have been solved since the first time around. It's really weird Bob Wyman isn't mentioned in the Prior Art section, given this new project appropriated the methodologies and even the name of the old one.
posted by jayCampbell at 10:34 AM on August 7, 2009

The work required to run a hub for anything at scale (Twitter, Flickr, Youtube, etc) will be such that very, very few people will be able to do so. (see jayCampbell, above)

I continue to think that this is essentially Google's attempt at getting all the data on the "real-time web" pointed at them by hosting pubsub for everyone.
posted by bpm140 at 3:42 PM on August 7, 2009

This is the same old bullshit as the last time Dave Winer got excited about it, and the flaws haven't changed. It requires all clients to be reachable over incoming HTTP (which excludes all NATs etc). It requires the client to keep a port open constantly. It requires the server to keep track of each and every subscriber, for all time, since AFAIK there's no clearly defined timeout procedure. It requires the server to make a shit ton of HTTP connections every time a new article is posted. It requires the client to somehow psychically know when it needs to reregister with the server. And finally, it requires the client to trust that its registration with the server is actually active and going to work when or if a new post is finally made, and AFAIK there's nothing in the protocol that the client can use to test this.

It's just massively unworkable, consumer-unfriendly and goes against many of the ways the internet is currently architected. It's a non starter.

(polling, for all its inefficiencies, is actually a very good, robust way of accessing information in near-ish real time)
posted by cillit bang at 4:41 PM on August 7, 2009 [1 favorite]

(I see from the main link that the tech has been repositioned as server-to-server. That gets around some, but not all, of the problems I mentioned)
posted by cillit bang at 4:45 PM on August 7, 2009
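
The "keep track of every subscriber for all time" and "psychically know when to reregister" complaints above are the kind of thing a lease with an expiry time would address. The sketch below is hypothetical and is not taken from anything linked in this thread: the `LeaseTable` class, the `lease_seconds` parameter, and the example.com-style callback URLs are all assumptions made for illustration.

```python
class LeaseTable:
    """Hypothetical hub-side subscription table where every entry expires."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.expires_at = {}  # callback URL -> expiry timestamp

    def subscribe(self, callback_url, now):
        # Subscribing again simply renews the lease.
        expiry = now + self.lease_seconds
        self.expires_at[callback_url] = expiry
        return expiry  # tells the subscriber when it must re-register

    def active(self, now):
        # Drop expired entries so dead subscribers are not kept forever.
        self.expires_at = {url: t for url, t in self.expires_at.items() if t > now}
        return sorted(self.expires_at)


table = LeaseTable(lease_seconds=3600)
table.subscribe("http://a.example/callback", now=0)
table.subscribe("http://b.example/callback", now=0)
table.subscribe("http://a.example/callback", now=3000)  # a renews, b does not
still_active = table.active(now=4000)  # only a's lease is still valid
```

Returning the expiry from `subscribe` is the design choice that matters here: the subscriber is told up front when its registration lapses, and the hub can forget anyone who never renews.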

It's PubSub. And it doesn't look like many of the problems have been solved since the first time around. It's really weird Bob Wyman isn't mentioned in the Prior Art section, given this new project appropriated the methodologies and even the name of the old one.

The name "pubsub" is pretty common shorthand for the Publish/subscribe messaging paradigm. I've heard it used at just about every company I've worked at for the last 20 years.
posted by heathkit at 5:01 PM on August 7, 2009

I normally bow from the waist in the direction of anything bradfitz touches. I don't think this is his fault, but the hype is aimed at 'you', but we're not that you. Winer says "it allows you to receive updates of RSS feeds without polling." The jungleg link has it as "basically it will allow blogs and readers to communicate real time." But the 'you' looks like the services that serve you.

So google reader and friendfeed will get updates as a POST, and your laptop will have software that polls the aggregator/service rather than the content originator. I'll be following it as it grows, but there seems to be a lot of work to be done by everyone to make this better for 'you'. I guess big traffic slingers can try to control where the heavy polling happens, and that's not bad, but it seems that the gurus were so eager to get their entries in that we're missing something.

If a service ran in the manner of Amazon's Simple Queue Service then I think we'd be seeing something. Then anyone who wanted to could keep the heavy polling off of their cheap hosts and deflect it to the Big Guys. Maybe that's the authors' architectural WIN plan.
posted by drowsy at 8:12 PM on August 7, 2009

see also.
posted by feckless at 8:42 PM on August 7, 2009

Is this all just trying to solve a problem that doesn't actually exist?

I'm trying to think of things I actually care to receive in real-time. I don't even really care about actual world news in real time - I can check the news website once an hour or so, during the day, and that keeps me happy. I certainly don't see why I would need Flickr photos or blog posts right now.

I remember when Netscape was trying to do the whole "Push" thing a decade ago. It failed miserably. People didn't use it. Every effort at "Push" since then has been a complete failure.
posted by Jimbob at 2:35 AM on August 8, 2009

I've realised what the true stupidity of this architecture is. The "subscribing" element is functionally equivalent to opening a connection to the server, exactly as you would use TCP. So essentially they've invented a lame ass way of simulating TCP with HTTP, except with none of the robustness or predictability or pedigree. And that's why it sucks.
posted by cillit bang at 6:38 AM on August 8, 2009

HTTP requests are sent via TCP/IP. It's an application level protocol. Sending an HTTP request and getting a response is reliable, because TCP/IP is reliable.

I'm not convinced this is the best thing ever, but most of the criticism here seems a bit weak.

It requires the server to make a shit ton of HTTP connections every time a new article is posted.

Previously the client would initiate these connections. The difference now is that requests that would have returned no new data should be eliminated. The servers would still need to handle a shit ton of connections, the difference being that they would be started by the clients rather than by the server.
posted by chunking express at 8:16 AM on August 8, 2009

With polling, the connections are naturally randomly distributed over time and correspond exactly to the number of clients that are actually interested. With pubsub, the server has to make a million exactly simultaneously (if the real-time aspect is to work), many if not most will be to clients that have gone away or are unreachable, or just aren't interested anymore. And it has to do this for every post and maybe every edit, which gets silly for any site with any volume of posts.
posted by cillit bang at 8:42 AM on August 8, 2009
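
To put rough numbers on the back-and-forth above, here is a back-of-the-envelope sketch. Every figure in it (1000 subscribers, a 30-minute poll interval, 2 or 500 posts a day) is invented for illustration, and it only counts requests; it says nothing about when they happen, so the burstiness of push is not modeled.

```python
def polling_requests(subscribers, poll_interval_min, window_min):
    # Every subscriber asks once per interval, whether or not anything changed.
    return subscribers * (window_min // poll_interval_min)


def push_requests(subscribers, posts):
    # The server contacts every registered subscriber once per post.
    return subscribers * posts


subscribers = 1000  # assumed audience size
day = 24 * 60       # one day, in minutes

quiet_blog_poll = polling_requests(subscribers, poll_interval_min=30, window_min=day)
quiet_blog_push = push_requests(subscribers, posts=2)
busy_site_push = push_requests(subscribers, posts=500)
```

Under these made-up assumptions, push makes far fewer requests than polling for a feed that rarely changes, while a feed with hundreds of posts a day sends more requests by push than the same audience would generate by polling every half hour.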

For those complaining that this doesn't solve the problems that end users have with polling feeds for changes: you're right, it doesn't, because that's not the problem domain at all. PubSubHubbub is a protocol designed for use between the major content sources (news sites, YouTube, Flickr, Twitter, etc.) and aggregators (Google, Facebook, et al.) to allow them to more efficiently shove massive amounts of Atom-formatted data around quickly without the overhead of polling.

Limiting access to those endpoints that can serve as both client (to establish subscriptions) and server (to receive hook notifications) is a feature, not a bug -- it will help to keep the number of subscribers each hub has to notify down to something reasonable.

Polling is robust, adaptable, and easy to implement, but it is hellishly inefficient compared to a publish/subscribe model when all nodes have relatively high levels of availability and trust. For the big online players, this kind of optimization could literally save them tens of millions of dollars a year simply in reduced bandwidth and power consumption. End users, on the other hand, will not care or be directly impacted at all, which is fine.
posted by rcoder at 11:58 AM on August 8, 2009
