
[DJSON-18] Fast way to print indented json Created: 15/Dec/14  Updated: 15/Dec/14

Status: Open
Project: data.json
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Enhancement Priority: Major
Reporter: Nikita Prokopov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File djson_18_fast_indent.patch    

 Description   

Hi!

Formatted JSON is very handy for human consumption, for example while debugging or exploring a JSON API. data.json offers formatting in the form of pprint-json. The problem is that pprint-json is dead slow, because it tries to fit everything within a line-width limit. In practice pprint-json takes 20-100 times longer than write-str, to the point where it simply cannot be used in production:

clojure.data.json=> (def data (read-string (slurp "sample.edn")))
#'clojure.data.json/data
clojure.data.json=> (count data)
4613
clojure.data.json=> (time (do (clojure.data.json/write-str data) nil))
"Elapsed time: 219.33 msecs"
clojure.data.json=> (time (do (with-out-str (clojure.data.json/pprint-json data)) nil))
"Elapsed time: 25271.549 msecs"

The proposed enhancement is very simple: indent new keys and array elements, but do not try to fit values within a line-width limit. JSON formatted this way is still easy for a human to read, and its structure is evident. The only downside is that some lines might become very long.

In the attached patch, I modified write-array and write-object and added a new :indent option to write. To print indented JSON, one can now write: (write-str data :indent true)

There's some performance penalty, of course, but it is relatively small:

clojure.data.json=> (time (do (clojure.data.json/write-str data :indent true) nil))
"Elapsed time: 250.18 msecs"

I also fixed a small bug: the (seq m) in write-object should be (seq x).
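
For readers without the patch at hand, the indentation scheme described above can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the code in djson_18_fast_indent.patch: indent-json is a hypothetical name, it builds a string instead of streaming to a writer, it assumes map keys are strings, keywords, or symbols, and it delegates scalar values to clojure.data.json/write-str.

```clojure
(require '[clojure.data.json :as json]
         '[clojure.string :as str])

(defn indent-json
  "Hypothetical helper (not part of data.json). Puts every object key and
  array element on its own indented line and never measures line width.
  Assumes map keys are strings, keywords, or symbols."
  ([x] (indent-json x 0))
  ([x depth]
   (let [pad   (fn [n] (apply str (repeat (* 2 n) \space)))
         ;; Wrap already-rendered items in open/close delimiters,
         ;; one item per line, indented one level deeper than `depth`.
         block (fn [open close items]
                 (if (empty? items)
                   (str open close)
                   (str open "\n"
                        (str/join ",\n" (map #(str (pad (inc depth)) %) items))
                        "\n" (pad depth) close)))]
     (cond
       (map? x)
       (block "{" "}"
              (for [[k v] x]
                (str (json/write-str (name k)) ": "
                     (indent-json v (inc depth)))))

       (sequential? x)
       (block "[" "]" (map #(indent-json % (inc depth)) x))

       ;; Scalars (strings, numbers, booleans, nil) go through data.json.
       :else (json/write-str x)))))
```

Because no line width is ever computed, the work stays proportional to the size of the data, which is the property the proposal relies on. Reading the result back with json/read-str should give the same data as reading the output of plain write-str.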






[DJSON-20] data.json reads and writes invalid JSON Created: 14/May/15  Updated: 17/May/15

Status: Open
Project: data.json
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Tim Visher Assignee: Stuart Sierra
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Clojure 1.6 (et al., I'm assuming)



 Description   

data.json silently reads and writes invalid JSON, which pushes error checking out to consumers.

http://www.json.org/ states that JSON must be an object or an array composed of values. http://jsonlint.com/ follows this pattern.

> (json/write-str 1)
"1"
> (json/write-str nil)
"null"
> (json/write-str "charnock")
"\"charnock\""

Putting any of the above into http://jsonlint.com/ results in an error, because jsonlint at least interprets the grammar as requiring top-level values to be objects or arrays.

Sadly, Chrome at least seems to do the same thing that data.json is doing here, which I've come to understand as 'non-strict JSON'. I think at the very least, it would be helpful to provide a 'strict' option, as apparently aeson (the Haskell JSON library) does, to pick your poison. I do think it would make sense to default to strict though, as that should be the most widely consumable format, and to then make the choice to drop to non-strict if you're cool with making that contract with all your consumers.

I'm really not sure whether this is a bug or not. Probably more of an enhancement, especially since Chrome at least seems willing to parse each of the above. I'm thinking though of other clients that were written like jsonlint was to assume that JSON is always an object or array at the top level.



 Comments   
Comment by Stuart Sierra [ 15/May/15 7:33 AM ]

What are the invalid syntaxes which data.json reads or writes?

Comment by Tim Visher [ 15/May/15 7:52 AM ]

I updated the description with more detail.

Comment by Stuart Sierra [ 17/May/15 5:04 AM ]

The documentation for the Haskell Aeson library states "the JSON standard requires that the top-level value be either an array or an object."

http://jsonlint.com/ reports JSON as "invalid" if it does not begin with '{' or '['.

However, I find no evidence for this assertion in JSON.org or ECMA-404. Neither specifies the valid top-level entry points for a JSON "document." Both define a JSON "value" as any one of string, number, object, array, true, false, or null.

RFC 4627 states "A JSON text is a serialized object or array," but it is superseded by RFC 7159.

RFC 7159 states: "A JSON text is a serialized value. Note that certain previous specifications of JSON constrained a JSON text to be an object or an array. Implementations that generate only objects or arrays where a JSON text is called for will be interoperable in the sense that all implementations will accept these as conforming JSON texts."

Since there may be applications which do consider strings and numbers to be valid JSON texts, it would be an unnecessary limitation on data.json to disallow them.

If you want your application to accept/produce only JSON objects or arrays, such an assertion is trivial to write. Adding a "strict" option to data.json offers little or no value in comparison.
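
As a rough sketch of the kind of assertion meant here, an application could wrap the data.json entry points itself. The names write-str-strict and read-str-strict are hypothetical and not part of data.json, and the check assumes Clojure collections (maps, vectors, lists, sets) are the only values that should count as a top-level object or array.

```clojure
(require '[clojure.data.json :as json])

(defn write-str-strict
  "Hypothetical wrapper: like json/write-str, but rejects any top-level
  value that is not a Clojure collection (written as an object or array)."
  [x & options]
  (when-not (coll? x)
    (throw (ex-info "Top-level JSON value must be an object or array"
                    {:value x})))
  (apply json/write-str x options))

(defn read-str-strict
  "Hypothetical wrapper: like json/read-str, but rejects a JSON text whose
  top-level value is a string, number, boolean, or null."
  [s & options]
  (let [v (apply json/read-str s options)]
    (when-not (coll? v)
      (throw (ex-info "Top-level JSON value must be an object or array"
                      {:value v})))
    v))
```

With these wrappers, (write-str-strict [1 2]) behaves like write-str, while (write-str-strict 1) and (read-str-strict "null") throw an ExceptionInfo.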





Generated at Mon May 25 08:55:16 CDT 2015 using JIRA 4.4#649-r158309.