JSON Feeds
May 17, 2017 5:36 PM

Manton Reece and Brent Simmons noticed developers avoiding XML in favor of JSON for APIs. JSON is simpler to read and write, and it’s less prone to bugs. So [they] developed JSON Feed, a format similar to RSS and Atom but in JSON. It reflects the lessons learned from our years of work reading and publishing feeds. via daringfireball

See the spec. It’s at version 1, which may be the only version ever needed. If future versions are needed, version 1 feeds will still be valid feeds.
posted by cgc373 (45 comments total) 32 users marked this as a favorite
 
I'm sort of surprised this wasn't already a thing.
posted by empath at 5:52 PM on May 17, 2017 [8 favorites]


I read this and I was a little concerned about how underspec'd this was, especially regarding what is and isn't allowed in the content_html field. But then I realized that idiot developers were never actually going to read the spec and just put whatever the fuck they want in there anyway, so what the hell.
posted by 1970s Antihero at 6:05 PM on May 17, 2017 [15 favorites]


Ok, I've been out of the XML/JS loop for a decade now, but... the big problem that developers are facing with feeds is it's too hard to read XML? Hasn't XML-JSON conversion been a one liner for ages? It's 2017 and that's the hard part? What am I missing here?
posted by phooky at 6:25 PM on May 17, 2017 [9 favorites]


I have the JSON formatter (JSONView) loaded, so just clicking on the example link was almost as readable as some RSS readers. if a few of the blogging platforms add it as a default expect a zillion browser based feed readers, it would be almost trivial to build in javascript.
posted by sammyo at 6:25 PM on May 17, 2017


By tomorrow, technical recruiters will be seeking people with 7-10 years experience in this.
posted by mystyk at 6:27 PM on May 17, 2017 [56 favorites]


Pssht, I was writing my own API specs to avoid crappy XML formats over 15 years ago! Some are even in use still.

I exercise my 5th amendment rights and decline to name any of them.
posted by zeypher at 6:38 PM on May 17, 2017 [1 favorite]


Hasn't XML-JSON conversion been a one liner for ages? It's 2017 and that's the hard part? What am I missing here?

I'm totally with you here.

I could completely buy a claim that the point of the exercise was to structure the data in a way that benefited from all the hard lessons learned by previous standards; there's something to be said for making a clean break and a fresh start, after all. However, centering the rationale on JSON as a format (vs. this particular model) is fairly headscratching, because as you point out XML and JSON are trivially interchangeable.
posted by tocts at 6:39 PM on May 17, 2017 [2 favorites]


They're trivially convertible if you don't especially care what the resulting data structure looks like. But the auto converters will produce json that looks like

{ "employees": [
{ "person": {
"name": {"@value": "Alice"}
} },
{ "person": {
"name": {"@value": "Alice"}
} }
] }


which is quite a bit less ergonomic than how you would natively structure json to represent the same information. Plus, to even use a converter you need to have an XML library available in the first place, and XML libraries tend to be gigantic in comparison to json libraries. Every arduino-clone will probably have a json library, but an XML parser is far less certain.
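
To make the contrast concrete, here is a minimal sketch in Python using only the standard library. The sample document, the `naive_convert` helper and its "@value" convention are all made up for illustration and do not reproduce any particular converter; the point is only that a mechanical mapping mirrors the XML tree, while a hand-written mapping can say what the data means.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
XML_DOC = """
<employees>
  <person><name>Alice</name></person>
  <person><name>Bob</name></person>
</employees>
"""


def naive_convert(element):
    """Mirror an element mechanically, the way a generic converter might."""
    children = list(element)
    if not children:
        # Leaf element: wrap its text under an "@value" key.
        return {"@value": (element.text or "").strip()}
    tags = [child.tag for child in children]
    if len(set(tags)) < len(tags):
        # Repeated tags: emit a list of single-key wrapper objects.
        return [{child.tag: naive_convert(child)} for child in children]
    return {child.tag: naive_convert(child) for child in children}


root = ET.fromstring(XML_DOC)

# Mechanical conversion: every level of XML nesting survives.
mechanical = {root.tag: naive_convert(root)}

# Hand-written mapping: the shape you would probably design in JSON directly.
native = {"employees": [person.findtext("name") for person in root.iter("person")]}

print(json.dumps(mechanical, indent=2))
print(json.dumps(native, indent=2))
```

Both results carry the same names, but only the second is something you would want to index into by hand.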
posted by Pyry at 6:52 PM on May 17, 2017 [10 favorites]


Yeah seriously, unless I'm really missing something here, XML-JSON conversion is only "trivial" if you want trivial results. XML is a document markup format press-ganged into data structure serialization duty, and it really really really shows.

(Now if I got a vote, they'd skip JSON and go straight to YAML. )
posted by traveler_ at 6:58 PM on May 17, 2017 [12 favorites]


structure the data in a way that benefited from all the hard lessons learned by previous standards;

..so they went with one that isn't self-validating (like avro), doesn't allow comments, and has poor support for circular references. Interesting. I suppose re-implementing poor choices is a way to go.
posted by combinatorial explosion at 7:03 PM on May 17, 2017 [7 favorites]


which is quite a bit less ergonomic than how you would natively structure json to represent the same information. Plus, to even use a converter you need to have an XML library available in the first place, and XML libraries tend to be gigantic in comparison to json libraries.

This is why we use data-binding libraries like Jackson in the Java world. It makes the conversion trivial and clean (still need an XML parser, but there's not much to be done about that). And 'huge' is a matter of debate - pretty sure that the C++, C, Java, and C# libraries for doing XML parsing are relatively svelte compared to some of the transitive dependency glop that comes out of node nowadays - even those aren't really that large when you pare away what you actually need.
posted by combinatorial explosion at 7:07 PM on May 17, 2017


The part which may not be obvious is how bad everything in the XML world is from a usability standpoint: missing or unhelpful docs, unforgiving UI, major standard releases unimplemented many years after release (XPath 2!), etc.

A coworker ran into this on Monday. I'd sent him a simple command which used xmlstarlet to extract URLs from an RSS feed. Simple XPath (“//link”) worked great, so he tried it on an Atom feed and spent 15 minutes trying to figure out why it resolutely refused to match the tag structure he saw in the source. The answer was a common bit of pointless user hostility pervasive throughout the XML world: selectors are simple if you don't use namespaces. Once you get a document whose author chose to use that core feature in the proper and recommended manner, however, you are forced to prefix every tag with the namespace declaration (e.g. “//{http://www.w3.org/2005/Atom}content”). Only one namespace? Still required. Document declares a short name? Doesn't matter, you have to translate atom:tag to the full URL version every time. Hoping to handle documents from many sources? Better build your own normalization layer first or you'll get bug reports for missing data which the users can see is clearly present…

Obviously there are ways to make this less painful but it means everything you do goes from short, easy to understand code to blocks of boilerplate. Paying that frictional cost everywhere gets old when the alternative is using JSON and spending your time on the actual work you were trying to do.
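
The same trap is easy to reproduce outside xmlstarlet. Here is a small sketch with Python's standard `xml.etree.ElementTree`; the tiny Atom document and the example.org URL are invented for illustration, and ElementTree's path syntax is only a subset of XPath, so treat this as an analogy rather than a transcript of the original command.

```python
import xml.etree.ElementTree as ET

# Hypothetical Atom document using the default-namespace style feeds normally use.
ATOM_DOC = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry>
    <title>First post</title>
    <link href="https://example.org/first"/>
  </entry>
</feed>"""

ATOM_NS = "http://www.w3.org/2005/Atom"
root = ET.fromstring(ATOM_DOC)

# The selector that works on plain RSS silently matches nothing here,
# because every tag is really "{namespace}link".
unqualified = root.findall(".//link")

# Option 1: spell out the namespace URI on every step.
qualified = root.findall(".//{%s}link" % ATOM_NS)

# Option 2: pass a prefix map. The prefix is chosen by the caller;
# any short name the document itself declares is not consulted.
prefixed = root.findall(".//atom:link", {"atom": ATOM_NS})

hrefs = [link.get("href") for link in prefixed]
print(len(unqualified), len(qualified), hrefs)
```

Note that the failing case raises no error at all; it just returns an empty list, which is why it can eat fifteen minutes.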
posted by adamsc at 7:11 PM on May 17, 2017 [10 favorites]


i just wish we could all abandon the json acronym for 'jsdn'. all y'all (ok, me too) are just moving shit-tons of state around. not a fucking thing 'objecty' about it, unless you count dot notation after the eval.
posted by j_curiouser at 7:15 PM on May 17, 2017


Whats XML? Like, a new Vince McMahon sport, the Extreme Metafilter League?
my best friend growing up was named Jason. does this have anything to do with him? does he play XML now?
and whats a feed? is it what i do to the birds?
is javascript like the things i write on diner table mats by dipping my finger in my coffee?
someone help me please i lost the fucking manual
THANK YOU LATEX AND GERMELMERMS I'LL BE HERE ALL WEEK
posted by not_on_display at 7:19 PM on May 17, 2017 [5 favorites]


Loose typing always wins in the end.
posted by grumpybear69 at 7:23 PM on May 17, 2017 [5 favorites]


In the .NET world xml is easy-peasy. With System.Xml.Linq and Newtonsoft.Json the whole idea of "Format Blah, but this time with JSON" sounds like a lot of fuss over nothing. BUT

The answer was a common bit of pointless user hostility pervasive throughout the XML world: selectors are simple if you don't use namespaces.

fuck xml namespaces in the ear. I've never had a problem they've solved and whenever anyone uses them it's just another layer of inconvenience.
posted by a snickering nuthatch at 7:23 PM on May 17, 2017 [3 favorites]


Oh, and why I'm grumpy: none of the user-hostility in the XML world is technically necessary. It's just not something anyone cared enough to do anything about, like all of the broken examples or out of date docs floating around w3.org. The problem seems to be the by-and-for-specialists trap where people don't spend time on things they personally don't need.

I'm reminded of wondering whether RDF/A was worth implementing and concluding no after realizing that no version of the spec had matching, valid examples and the official validator had been broken for months and nobody'd noticed until I asked about it because the few people working on it all used their own systems.
posted by adamsc at 7:24 PM on May 17, 2017 [4 favorites]


I completely agree that XML and its associated constellation of inanity is fifty flavors of bullshit. But.

RSS has been around for twenty years. There is just a staggering installed base of software in every language under the sun that interacts with it. What this could have been is a very nice library that read ATOM/RSS feeds and produced their JSON objects, and vice versa. Instead it's a different format that doesn't quite interoperate with anything, and I don't really get why.

This feels like someone got frustrated while writing their own RSS parser, gave up, and wrote a protocol instead. It's not a bad idea to replace RSS; it just seems like a ridiculously huge task if you're not getting anything new out of it.
posted by phooky at 7:39 PM on May 17, 2017 [9 favorites]


I can understand wanting to json everything but for an rss type of functionality wouldn't using a data format that understands date be a better option?

I guess you could use something like BSON instead but I guess not everyone would want that.
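
For what it's worth, the JSON Feed spec defines `date_published` as an RFC 3339 string, so the date type lives one step past the JSON parser. A minimal sketch of that step, assuming Python 3.7 or later; the item and its timestamp are made up for illustration:

```python
import json
from datetime import datetime, timezone

# Hypothetical feed item; only the field names come from the JSON Feed spec.
raw = '{"id": "1", "date_published": "2017-05-17T17:36:00-07:00"}'

item = json.loads(raw)

# JSON has no date type, so the parser hands back a plain string.
assert isinstance(item["date_published"], str)

# The consumer has to opt in to treating it as a date.
published = datetime.fromisoformat(item["date_published"])
published_utc = published.astimezone(timezone.utc)

print(published_utc.isoformat())
```

One caveat: a trailing "Z" offset, which RFC 3339 allows, is as far as I know only accepted by `datetime.fromisoformat` from Python 3.11 onward, so a real reader would want a more tolerant parser.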
posted by vuron at 7:51 PM on May 17, 2017 [1 favorite]


Obligatory XKCD.
posted by blue_beetle at 8:39 PM on May 17, 2017 [9 favorites]


understands date
it's strings all the way down.

(except not)
posted by j_curiouser at 8:42 PM on May 17, 2017


Does Google Reader support this yet?
posted by slater at 8:46 PM on May 17, 2017 [11 favorites]


This feels like someone got frustrated while writing their own RSS parser, gave up, and wrote a protocol instead.

And yet, I think Brent Simmons is familiar with writing his own RSS parser, and even offers an Objective C library for doing it.
posted by wotsac at 9:10 PM on May 17, 2017 [2 favorites]


Please. JSON is so over; everyone's writing hand-crafted YAML now.
posted by cosmic.osmo at 9:22 PM on May 17, 2017


If they used protobuffers instead of JSON, it would increase the chance of google reader being resurrected.

Just sayin'.
posted by kaibutsu at 10:35 PM on May 17, 2017 [2 favorites]


I was advocating for this years ago, I called it Absurdly Simple Syndication to differentiate it from RSS (Really Simple Syndication).
posted by Hot Pastrami! at 10:42 PM on May 17, 2017 [3 favorites]


I used to have to convert documents from Excel into XML (and back) for work. Debugging why XML isn't right, given just a vague hint from the processor, is the kind of thing that makes people think that there might just be a better way, like a beleaguered person in an infomercial. "Somebody help me!"

I suspect going to JSON is partially familiarity: when you're reading API output all day with Postman or another tool, it seems way more intuitive to read that than trying to figure out how to query a massive XPath for a tiny bit of data.
posted by fifteen schnitzengruben is my limit at 10:53 PM on May 17, 2017


Working with small, simple chunks of data is simpler and faster in JSON, especially if you control both sides of the output/input.
Handling large sets of complicated data is awful in JSON.

As for "less bugs"? Fewer known and fixed bugs might be more accurate.

XML was used for many unsuitable tasks and became part of the awful "enterprise" software culture. Namespaces are a pain to use. But it's not a bad format.

Kids these days. Get off my parser!
posted by BinaryApe at 11:11 PM on May 17, 2017 [2 favorites]


As someone who has to work with JSON and RSS nearly every day at work, I really don't mind which format it is (as mentioned above, converting between formats is trivial). However, that's only because if somebody sends me malformed XML RSS, I tell them to sod off. Come to think of it, I guess that nobody ever sends me bad JSON. Hmm.
posted by destructive cactus at 11:37 PM on May 17, 2017


Come to think of it, I guess that nobody ever sends me bad JSON. Hmm.

Do you look at the incoming JSON, or does "bad JSON" just mean "the parser crashes in a way that I notice"?
posted by effbot at 11:48 PM on May 17, 2017 [1 favorite]


RSS has been around for twenty years. There is just a staggering installed base of software in every language under the sun that interacts with it

Well first of all they do say "Any publisher already publishing RSS and/or Atom should continue to do so. In fact, if you’re trying to decide which format (of RSS, Atom, and JSON Feed) to use, and you can do only one, pick RSS — it’s time-tested, popular, and good." So they basically agree with you.

What this could have been is a very nice library that read ATOM/RSS feeds and produced their JSON objects, and vice versa. Instead it's a different format that doesn't quite interoperate with anything, and I don't really get why.

But in addition to switching serialization formats, this also requires every entry to have a stable ID, which RSS does not. They also provide a bunch of extra stuff that RSS readers typically end up having to scrape for. So it's not possible to, in all cases, convert valid RSS into a valid JSON Feed.

If you want to write that converter, go ahead, and do your best, but I don't see what the benefit is. No feed aggregator is going to drop RSS support.

I imagine the hope is in a bunch of years most everyone will be publishing JSON Feeds, and general purpose reader software (who will never ever drop RSS support) will be able to at least relegate the scraping and missing ids and stuff to a less common corner case.
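
Here is a rough sketch of where a converter gets stuck, using Python's standard library. The sample RSS document, the `rss_item_to_json_feed_item` helper and its fall-back-to-link policy are all assumptions made for illustration, not anything the spec prescribes; the one thing taken from the spec is that every JSON Feed item must have an `id`.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document; <guid> is optional in RSS, so items may lack it.
RSS_DOC = """<rss version="2.0"><channel>
  <title>Example</title>
  <item>
    <title>Has a guid</title>
    <link>https://example.org/a</link>
    <guid>https://example.org/a</guid>
  </item>
  <item>
    <title>No guid</title>
    <link>https://example.org/b</link>
  </item>
  <item>
    <description>No guid, no link</description>
  </item>
</channel></rss>"""


def rss_item_to_json_feed_item(item):
    """Return a JSON Feed-style dict, or None when no stable id can be found."""
    guid = item.findtext("guid")
    link = item.findtext("link")
    # Falling back to the link is this sketch's own policy, not a spec rule.
    stable_id = guid or link
    if not stable_id:
        return None
    converted = {"id": stable_id}
    if link:
        converted["url"] = link
    title = item.findtext("title")
    if title:
        converted["title"] = title
    return converted


channel = ET.fromstring(RSS_DOC).find("channel")
results = [rss_item_to_json_feed_item(item) for item in channel.findall("item")]
print(results)
```

The third item is valid RSS but has nothing to use as an id, so the converter either drops it or has to invent one, which is the "do your best" part.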
posted by aubilenon at 12:51 AM on May 18, 2017 [1 favorite]


I was involved with Atom at the first specification stages. I still publish a hand-rolled feed on my Blosxom blog, I still use feed readers, I still dig out the RSS feed hidden in various sites so I have some sensible way to pull syndicated content out in time-sorted order. I love feed formats.

If we were designing a feed format today it would absolutely be in JSON. XML is stupid, and awkward for embedding HTML. JSON is cleaner. OTOH it's hard to see any value in changing formats right now. There is no large market for a standard feed syndication system. There should be, but that market has mostly failed. Instead we have proprietary things like Twitter and Facebook.
posted by Nelson at 1:41 AM on May 18, 2017 [3 favorites]


Yes. Let's fragment a dying market.
posted by schmod at 4:30 AM on May 18, 2017 [2 favorites]


Ultimately, this is not going to solve the problems that made RSS go away in the first place. Because there was never a technical challenge with RSS, only a "wait, they're not seeing our ads when they get our articles through their feed reader?!" challenge.

Plus, to even use a converter you need to have an XML library available in the first place, and XML libraries tend to be gigantic in comparison to json libraries.

*shudders at the memory of trying to get fucking Nokogiri installed across multiple environments*

(Now if I got a vote, they'd skip JSON and go straight to YAML. )

I could get behind that, in theory.

It's true that YAML doesn't do schemas, but I've never been all that bothered about those. But the ability to use comments and namespaces gives it a leg up over JSON, in my book.

However, I think what really holds YAML back for API use is whitespace sensitivity. It makes YAML more human-readable and human-writable, which is a huge benefit for config files and the like. But it's less than ideal in a data-serialisation format.
posted by tobascodagama at 6:00 AM on May 18, 2017


Also on Hacker News, and /r/RSS.

empath wrote, "I'm sort of surprised this wasn't already a thing."

Dave Winer took a leap at it a few years ago, I don't know what became of that effort though.

If people were going to start making JSON feeds anyway, of course it's helpful to have it be standardized. But this is not just a straight translation of RSS or Atom into JSON, it is different from both, so if this becomes popular we'll have three common ways of presenting feeds instead of only two. Brent Simmons is a smart guy with a long background with RSS, I hope there's an interview or something soon where others who know the background can ask the kinds of questions being raised here.

sammyo: "if a few of the blogging platforms add it as a default expect a zillion browser based feed readers, it would be almost trivial to build in javascript."

People might even make a bunch of new services/code so that anyone can easily stick a box of feeds on their site. It'll be like 2001 all over again (see change log).

Pyry: "Every arduino-clone will probably have a json library, but an XML parser is far less certain."

Really? Web pages are in XML, seems like something there would be libraries for everywhere.
posted by johnabbe at 6:08 AM on May 18, 2017


Complex XSLT is the devil's toolbox. That's all I have to add.
posted by blue_beetle at 6:52 AM on May 18, 2017


I'm allergic to formats that use indentation to denote structure, if only because I've seen too many files with a mish-mosh of tabs and spaces that would likely break something like YAML.

What I like about XML is that everything about an XML document is deliberate. You can't really just throw a bunch of data into a box and get useful XML out - the shape of it has to be pre-defined. That can be, of course, cumbersome, which is why REST and JSON are so popular. But if you want a true, enforceable standard for the shape of data that is going to be widely shared and processed by thousands of implementations, being able to validate that the shape is correct is pretty key. I see that there is an emerging JSON schema standard but since the schema itself is written in JSON, it seems like there is an intractable recursive validation problem there. Of course, looking at XML schema, the same problem exists. So.

But anyway, long-running, stable standards are a Good Thing. Just look at MIDI!
posted by grumpybear69 at 7:31 AM on May 18, 2017 [3 favorites]


Really? Web pages are in XML, seems like something there would be libraries for everywhere.

Not quite. HTML can be treated as a subset of XML, but it predates XML and it uses a very limited set of predefined tags. The whole point of XML is that valid tags and their semantic meanings are whatever the document schema says they are. Whereas the set of valid HTML tags and their semantic meaning is a single, well-documented standard... that we still can't implement with 100% consistency across all browsers.

Also, not for nothing, XML intends to be a generalised format for describing structured data. HTML is essentially a set of rendering instructions for web browsers. The modern web likes to treat HTML documents as structured data that gets turned into rendering instructions by a browser's JavaScript VM, but this is a big reason why the modern web sucks.

Ultimately, though, XML does have libraries everywhere. They're just really, really heavyweight libraries that usually carry with them a ton of dependencies and are slow as hell to run. This limits the places you can use them. JSON, precisely because it doesn't try to do most of what XML is doing, can be parsed and deserialised by fairly simple, lightweight libraries.
posted by tobascodagama at 8:51 AM on May 18, 2017 [2 favorites]


Just for the less technical people, here's the difference between parsing json and parsing rss in python:
>>> import feedparser
>>> feed=feedparser.parse('http://feeds.feedburner.com/Metafilter')
>>> feed.entries[0]
{'summary_detail': {'base': u'http://feeds.feedburner.com/Metafilter', ...}

>>> import requests
>>> feed=requests.get('https://jsonfeed.org/feed.json').json()
>>> feed['items'][0]
{u'url': u'https://jsonfeed.org/2017/05/17/announcing_json_feed',... }
Not really a big difference in terms of ease of use.
posted by empath at 9:44 AM on May 18, 2017 [2 favorites]


HTML can be treated as a subset of XML

Oh your optimism is delightful! The biggest problem with parsing HTML web pages is dealing with broken formats and tags. Even in the days when HTML was really supposed to be a lot like well formed XML (really, SGML) you couldn't actually parse it that way.

Same thing with RSS feeds, btw, you can't rely on them to be well formed XML. Most blog engines manage to fuck up quoting in either simple or complicated ways, so nothing is correct. empath's code sample up there is accurate, but only because feedparser is this amazing hand-written parser that has 100s of special cases for parsing broken RSS. (And an amazing test suite.) The JSON example by default is assuming the JSON is well formed. Which seems somewhat plausible, but we don't really know since there's not really any JSON feeds in the wild.
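
If JSON feeds do turn out to be sloppy in the wild, a reader would need something like the sketch below instead of a bare `.json()` call. It is standard-library Python; the `load_json_feed` helper and its error messages are hypothetical, and the checks cover only a few of the fields the version 1 spec marks as required (`version`, `items`, and an `id` on every item).

```python
import json


def load_json_feed(text):
    """Return (feed, problems); feed is None when the text is unusable."""
    try:
        feed = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, ["not valid JSON: %s" % exc.msg]
    if not isinstance(feed, dict):
        return None, ["top level is not an object"]

    problems = []
    version = str(feed.get("version", ""))
    if not version.startswith("https://jsonfeed.org/version/"):
        problems.append("missing or unexpected version")

    items = feed.get("items")
    if not isinstance(items, list):
        return None, problems + ["items is not a list"]

    # Every item needs an id; report the ones that lack it.
    for index, item in enumerate(items):
        if not isinstance(item, dict) or "id" not in item:
            problems.append("item %d has no id" % index)
    return feed, problems


good = '{"version": "https://jsonfeed.org/version/1", "title": "t", "items": [{"id": "1"}]}'
truncated = '{"version": "https://jsonfeed.org/version/1", "items": ['
sloppy = '{"items": [{}]}'

good_result = load_json_feed(good)
truncated_result = load_json_feed(truncated)
sloppy_result = load_json_feed(sloppy)
print(good_result[1], truncated_result[1], sloppy_result[1])
```

This is nowhere near the hundreds of special cases feedparser carries for RSS; it only shows where that kind of tolerance would have to start.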
posted by Nelson at 11:06 AM on May 18, 2017 [2 favorites]


However, I think what really holds YAML back for API use is whitespace sensitivity. It makes YAML more human-readable and human-writable, which is a huge benefit for config files and the like. But it's less than ideal in a data-serialisation format.

What holds YAML back for API use is that arbitrary code execution by default is baked into the spec. YAML is not a wire format. We should not be doing anything that makes people think that deserializing someone else's YAML is okay.
posted by invitapriore at 12:48 PM on May 18, 2017 [3 favorites]


Brent Simmons has written about this effort here. Before NetNewsWire, he worked on Dave Winer's Frontier from 1996-2002, so he really has seen RSS all the way through and understands what he's doing.

It looks like he's rebuilding Frontier which is pretty neat. I was a heavy user back around that same time and have a soft spot for it.
posted by idb at 2:10 PM on May 18, 2017 [1 favorite]


Do you look at the incoming JSON, or does "bad JSON" just mean "the parser crashes in a way that I notice"?

Yes, as I wrote the parser.
posted by destructive cactus at 2:18 PM on May 18, 2017


I was advocating for this years ago, I called it Absurdly Simple Syndication to differentiate it from RSS (Really Simple Syndication).

Wow, I did the same thing. But I called mine Astonishingly Simple Syndication with Holistic Object Lexical Expression. What a coincidence!
posted by jwest at 8:03 PM on May 18, 2017


I'm sort of surprised this wasn't already a thing.

Same. I started looking for JSON feed readers right after Google reader plummeted into darkness and so many apps already had many RSS feels to their aggregation.

*nothing* - for what, four years?
posted by filtergik at 6:58 AM on May 19, 2017



