Suggested Improvements to JSON

by Michael Bolin, April 6, 2011

Let me start out by saying that, in general, I think that JSON is great. (After a blog post I read on Hacker News last month, I'm trying to be more positive.) As compared to XML, JSON has the following advantages that make it a joy to use:

More bytes are dedicated to data than markup.
Fundamental data types are represented directly.
There are no namespaces.

I could go on and on about how XML fails in those respects, but again, I'm trying to be more positive.

Because I enjoy using JSON so much, I have started to try to use it for more than rudimentary data transfer between client and server. Unfortunately, there are aspects of JSON that make it harder to use in certain situations where one might expect JSON to be an ideal solution. I believe that the following two principles, which are the source of my problems, are baked into the design of JSON:

JSON is meant to communicate information only between machines, not between humans.
JSON should be ES3 compliant.

Let's explore some of the drawbacks of JSON that are consistent with these assumptions, and how they could be addressed.

No Official Support for Comments

Because JSON is more concise than XML, JSON is often a better format for data files that are maintained by hand. Examples include configuration files, as well as blobs of test data for web applications. For files such as these, it is convenient to be able to temporarily comment out bits of information (such as a configuration option, or an old test value in lieu of a new one). Further, if the file is to be maintained by humans, it is desirable to be able to include comments so that maintainers may communicate with one another without interfering with the data in the file. Obviously, none of these use cases applies when the only clients of JSON are computers, but why not make a small change to better accommodate human users by adding official support for comments?

For example, if your JSON data file is being parsed by a strict JSON parser (and you care about 80-column line widths), then your only option is to do something like this:

{
  "comment": [
    "Because // was dropped from the JSON specification, this is the best way ",
    "to comment a JSON config file while maintaining 80 column lines."
  ]
}

This adds a lot of extra work for a human without providing any real value. Further, this is only an option because the top-level structure in this JSON is a map: if it were an array of strings, then there would not be a place to inject a comment that could easily be distinguished from actual data.

This raises the question: why aren't comments officially supported in JSON? Interestingly, when Douglas Crockford originally introduced JSON, there was explicit support for C-style comments. He later dropped support for them in the specification, but also declared that a JSON decoder that accepts comments should be considered a valid JSON decoder.

This means that comments are not prohibited outright in JSON; however, you cannot depend on an arbitrary JSON parser to ignore them, either. This is consistent with section 4 of RFC 4627, which states: "A JSON parser MAY accept non-JSON forms or extensions." Therefore, Google's open-source JSON parser, which added support for JavaScript comments, is a valid JSON parser according to RFC 4627.

Unfortunately, although I may use Google's JSON parser (which accepts comments) for many of my own personal projects, I have no control over what the native JSON.parse() function in the browser accepts. Specifically, the ES5 spec for the implementation of JSON.parse() intentionally does not include the "MAY accept non-JSON forms or extensions" language from the RFC. Why? Presumably for security reasons. It would be a serious problem if some user agents implemented JSON.parse() such that it could have side effects. Then it would no longer be safe to use.

Because the browser is the primary place where JSON is parsed, the omission of comment support in the RFC takes us from "your JSON parser can support comments, if you like" to "in practice, you cannot use comments in JSON, because then you cannot send it to the browser without preprocessing it first."
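To make the cost of that preprocessing concrete, here is a minimal sketch of a comment stripper that could run before the native JSON.parse(). The name stripJsonComments is hypothetical, and this is only one possible approach under the assumption that comments use the // and /* */ forms; it tracks string literals so that text such as "http://" inside a value is left alone.

```javascript
// Hypothetical helper: removes // and /* */ comments from JSON-like text so
// that the result can be handed to the native JSON.parse().
var stripJsonComments = function(text) {
  var out = '';
  var i = 0;
  var inString = false;
  while (i < text.length) {
    var c = text.charAt(i);
    var next = text.charAt(i + 1);
    if (inString) {
      out += c;
      if (c === '\\') {
        // Copy the escaped character verbatim so that \" does not end the
        // string literal.
        out += next;
        i += 2;
        continue;
      }
      if (c === '"') {
        inString = false;
      }
      i++;
    } else if (c === '"') {
      inString = true;
      out += c;
      i++;
    } else if (c === '/' && next === '/') {
      // Line comment: skip up to, but not including, the newline.
      while (i < text.length && text.charAt(i) !== '\n') {
        i++;
      }
    } else if (c === '/' && next === '*') {
      // Block comment: replace it with a single space so that neighboring
      // tokens stay separated, then skip past the closing */.
      var end = text.indexOf('*/', i + 2);
      i = (end === -1) ? text.length : end + 2;
      out += ' ';
    } else {
      out += c;
      i++;
    }
  }
  return out;
};

// Usage: the comments are removed, but the // inside the URL survives.
var config = JSON.parse(stripJsonComments(
    '{\n' +
    '  // Colors use the short hex form.\n' +
    '  "color": "#A00", /* was "#B00" */\n' +
    '  "url": "http://example.com/"\n' +
    '}'));
```

Every page that wants commented JSON would have to ship and run something like this before calling JSON.parse(), which is exactly the tax described above.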

Although I can understand how mandatory support for comments adds more work for those who create JSON parsers, the tax this passes on to all of the developers who work with JSON is far greater.

No Trailing Comma in Object or Array Literals

Most modern browsers allow for a trailing comma in array and object literals in JavaScript. Although support for the trailing comma was not mandated until ES5, browsers such as Chrome and Firefox have supported it for a long time:

// This map has a trailing comma at the end of its second entry.
// IE6 and IE7 cannot parse this JavaScript because they only recognize
// ES3, which did not mandate support for the trailing comma.
{
  "color": "#A00",
  "font-weight": "bold",
}

// This map does not have a trailing comma.
// It is valid in both ES3 and ES5.
{
  "margin": "2px",
  "padding": "3px"
}

Using the trailing comma is particularly convenient for developers who may modify the map in the course of development. As shown in the following example, commenting out the last entry in a map can inadvertently transform it into an object literal with a trailing comma:

// Commenting out the last line produces an object literal with a
// trailing comma.
{
  "margin": "2px",
  // "padding": "3px"
}

Obviously, the solution is to remove the comma in addition to commenting out the last entry, but developers often forget this step and are unpleasantly surprised when it brings Internet Explorer to a grinding halt. A similar error occurs (in both ES3 and ES5) when adding an entry to a map that does not have a trailing comma:

// This will yield a parse error in both ES3 and ES5 because the middle entry
// is missing a comma.
{
  "margin": "2px",
  "padding": "3px"
  "float": "left"
}

Again, the workaround is to update two lines instead of one when editing the set of entries in the map, but if developers could reliably include the trailing comma, modifying the map would be less error-prone.

Unfortunately, the JSON specification does not mandate support for the trailing comma. As demonstrated, denying the trailing comma makes it harder to update object and array literals by hand. Further, it would be easier to generate JSON programmatically if one could rely on the trailing comma, as loops that print out map or list members would always be able to print out the trailing comma without having to check whether the item was the last element in a collection. Again, we must ask: why doesn't JSON support this feature?
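The point about programmatic generation can be sketched with a small hand-rolled emitter. The function name writeEntries is hypothetical, and the sketch assumes a flat map whose values are strings; it is meant only to show where the "is this the last element?" check creeps in.

```javascript
// Hypothetical emitter that writes a flat map as JSON, one entry per line.
var writeEntries = function(map) {
  var keys = Object.keys(map);
  var lines = ['{'];
  for (var i = 0; i < keys.length; i++) {
    var isLast = (i === keys.length - 1);
    var entry = '  ' + JSON.stringify(keys[i]) + ': ' +
        JSON.stringify(map[keys[i]]);
    // Strict JSON forces this check. If the trailing comma were allowed,
    // every entry could simply end with ','.
    lines.push(entry + (isLast ? '' : ','));
  }
  lines.push('}');
  return lines.join('\n');
};

var css = writeEntries({'margin': '2px', 'padding': '3px'});
```

Here css holds a two-entry map with a comma after the first entry only. Building an array and calling join(',') hides the check, but a streaming writer that prints entries as they arrive cannot use that trick.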

I believe that the primary reason that the trailing comma is not allowed is that an ES3-compliant JSON was initially easier to promote. Specifically, one of the selling points of JSON was that writing a JSON parser in JavaScript was trivial:

var parseJson = function(jsonString) {
  return eval('(' + jsonString + ')');
};

If the trailing comma were allowed, then this code would not work in Internet Explorer 6 and 7, in which case this would no longer be a selling point of JSON.

I find this disingenuous because Crockford has long argued that JSON should not be parsed in this manner, as it introduces a security risk. Instead, a proper library, such as his own json2.js, should be used. My feelings are: if you are going to have to write a parser, write it for the input language you want, not the one you have. This blind adherence to ES3 creates an unnecessary burden on developers.
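The difference between the two parsing strategies can be seen in a short sketch. The names parseWithEval and nativeAccepts are hypothetical, and the sketch assumes an ES5 environment where the native JSON.parse() is available.

```javascript
// The eval-based approach from above, renamed for this comparison.
var parseWithEval = function(jsonString) {
  return eval('(' + jsonString + ')');
};

// Returns true if the native parser accepts the text, false if it throws.
var nativeAccepts = function(jsonString) {
  try {
    JSON.parse(jsonString);
    return true;
  } catch (e) {
    return false;
  }
};

// This input is not JSON: the value is an expression to be executed.
var notJson = '{"answer": 6 * 7}';
var evaluated = parseWithEval(notJson).answer;  // eval runs the multiplication
var acceptedExpression = nativeAccepts(notJson);

// A real parser is free to define its own grammar, and the native one
// chooses to reject the trailing comma.
var acceptedTrailingComma = nativeAccepts('{"margin": "2px",}');
var acceptedStrictJson = nativeAccepts('{"margin": "2px"}');
```

The eval version executes whatever expression it is given, whereas the native parser only accepts strict JSON and rejects the other two inputs. Once a dedicated parser is in the picture, what it accepts is a design decision rather than a consequence of ES3.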

Mandatory Quoting of Keys in Object Literals

In ES3 JavaScript, there are primarily two reasons why you would quote the key for an object literal:

The key being used is a JavaScript keyword. Incidentally, this is what caused a failure in the first JSON message, which was: {to:"session", do:"test", text:"Hello world"}. Because do is a JavaScript keyword, the object literal did not parse in an ES3 browser (which was all there was at the time).
The key contains spaces or characters that need to be escaped, such as a newline.

Often, JavaScript developers choose property names that do not need to be quoted because then they can reference them using the "dot" syntax:

var text = message.text;

As opposed to the slightly more verbose "bracket" syntax:

var text = message["text"];

Unfortunately, even though quoting is the minority case, JSON requires that all keys in maps be double-quoted, regardless of whether they would need to be in ordinary JavaScript. Presumably this was done because it was the simplest way to guarantee that JSON would be a strict subset of ES3. (Fortunately, ES5 has evolved to allow JavaScript keywords to serve as unquoted property names in object literals.)

Similar to the situation with trailing commas, if the design of JSON were not encumbered by the shortcomings of ES3, then I imagine that JSON keys would not have to be quoted. Perhaps things would have even gone one step further, limiting key names to the following regular expression, which would preclude the need for quoted keys altogether: /^[a-zA-Z_$][a-zA-Z0-9_$]*$/ (though that would exclude Unicode characters, which are allowed in JavaScript identifiers today). In either case, demoting quoting from a requirement to an option would save most developers two bytes per key, which would be a win for both humans and machines. (I also think that it would make JSON more readable, though that may be a personal preference.)
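As a rough illustration of optional quoting, a serializer could decide per key whether quotes are needed. The names UNQUOTED_KEY and formatKey are hypothetical. The pattern used here is an assumed variant of the one in the text: it is anchored at both ends and uses * so that one-character keys such as x also qualify.

```javascript
// Assumed variant of the pattern from the text: anchored, and using * so
// that one-character keys also match.
var UNQUOTED_KEY = /^[a-zA-Z_$][a-zA-Z0-9_$]*$/;

// Hypothetical helper: returns the key as it would appear in a map if
// quoting were optional.
var formatKey = function(key) {
  return UNQUOTED_KEY.test(key) ? key : JSON.stringify(key);
};

var plain = formatKey('margin');        // no quotes needed
var hyphenated = formatKey('font-weight');  // the hyphen forces quoting
var keyword = formatKey('do');          // unquoted: fine in ES5, not in ES3
```

Only the hyphenated key comes back wrapped in double quotes. Each key that passes the test saves the two bytes mentioned above.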

Conclusions

JSON is a vast improvement over XML for many data-interchange use cases. However, there are some aspects of its design that make it harder to use for data that are maintained by humans. I argue that these design choices were made due to a strong emphasis on making JSON a subset of ES3. Instead, it would have been better to look forward to what developers wanted JavaScript to be. Now that ES5 has been finalized, I think that it would be best to expand JSON to include the following features:

Support for C-style comments.
Optional trailing comma at the end of object and array literals.
Optional quoting of keys in object literals when the key matches /^[a-zA-Z_$][a-zA-Z0-9_$]*$/.

This would make JSON a proper subset of ES5, but not a proper subset of ES3. Admittedly, it would place an additional burden on those who create JSON parsers, but they are an incredibly small minority compared to those who must work with JSON. I believe that these changes would allow the use of JSON to extend beyond its current, more limited, set of use cases. Long live JSON!