<< Back to previous view

[DJSON-3] Error writing strings with characters outside the BMP Created: 16/Dec/11  Updated: 09/Mar/12  Resolved: 07/Mar/12

Status: Resolved
Project: data.json
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Matthew Phillips Assignee: Stuart Sierra
Resolution: Completed Votes: 1
Labels: None
Environment:

Clojure 1.2 and 1.3, Mac OS X, Java 1.6


Attachments: Text File djson-3-non-bmp-chars-patch.txt     Text File fix_unicode_non_bmp_2.patch     Text File fix_unicode_non_bmp_3.patch     Text File fix_unicode_non_bmp.patch    
Patch: Code and Test

 Description   

[This ticket filed as a follow on from pull request 2 at github]

One of my users decided to start sending Emoji happy faces over a network protocol encoded using data.json, which caused it to promptly roll over and die. It turned out that write-json-string, while aware of the code point vs. character issue, was still iterating over a character count not a code point count.

I have demonstrated the problem in a test, and fixed this: will file a patch when my contributor agreement is in place.



 Comments   
Comment by Matthew Phillips [ 05/Jan/12 5:57 PM ]

Patch to fix DJSON-3.

Comment by Andy Fingerhut [ 13/Jan/12 1:29 AM ]

I'm not an expert on JSON, so feel free to correct me if I am wrong, but it seems from reading RFC 4627 that strings in JSON, if they are encoded with the \uxxxx format for hex digits xxxx, should have their 16-bit UTF-16 code units encoded in that way. There should always be exactly 4 hex digits after the \u.

Here are a couple of test cases with the current code, without either of the patches fix_unicode_non_bmp.patch or fix_unicode_non_bmp_2.patch:

user=> (in-ns 'clojure.data.json)
#<Namespace clojure.data.json>

;; Incorrect.
clojure.data.json=> (map #(format "%x" (int %)) (json-str "abc\ud834\udd1e" :escape-unicode false))
("22" "61" "62" "63" "d834" "dd1e" "dd1e" "22")

;; Also incorrect.
clojure.data.json=> (json-str "abc\ud834\udd1e")
"\"abc\\u1d11e\\udd1e\""

Here is the behavior with patch fix_unicode_non_bmp.patch:

; This is correct
clojure.data.json=> (map #(format "%x" (int %)) (json-str "abc\ud834\udd1e" :escape-unicode false))
("22" "61" "62" "63" "d834" "dd1e" "22")

;; but this is not. JSON specifies that UTF-16 code units should be
;; included in string in form \uxxxx, for hex encoding xxxx of 16-bit
;; code units.
clojure.data.json=> (json-str "abc\ud834\udd1e")
"\"abc\\u1d11e\""

Here is the behavior with patch fix_unicode_non_bmp_2.patch:

;; correct
clojure.data.json=> (map #(format "%x" (int %)) (json-str "abc\ud834\udd1e" :escape-unicode false))
("22" "61" "62" "63" "d834" "dd1e" "22")

;; also correct.
clojure.data.json=> (json-str "abc\ud834\udd1e")
"\"abc\\ud834\\udd1e\""

Comment by Andy Fingerhut [ 13/Jan/12 3:18 PM ]

fix_unicode_non_bmp_3.patch is same as fix_unicode_non_bmp_2.patch, except it includes the unit test Matthew Phillips had in his patch, plus another one to test the :escape-unicode true case.

Comment by Andy Fingerhut [ 20/Feb/12 4:03 PM ]

Eventually I may even get the hang of this. djson-3-non-bmp-chars-patch.txt contains patch in proper format.

Comment by Stuart Sierra [ 07/Mar/12 7:58 AM ]

patch applied.

Comment by Matthew Phillips [ 08/Mar/12 6:39 PM ]

Hi Stuart, it looks like you've only applied Andy Fingerhut's last patch. My fix is in the first one: "fix_unicode_non_bmp.patch"

Comment by Andy Fingerhut [ 09/Mar/12 11:04 AM ]

Matthew, did you see my comment from Jan 12, 2012? I show a test case there that appears not to work with your patch, unless I am misunderstanding the correct JSON desired behavior. That same test does work with my latest patch, I think. Take a look and see what you think.

Comment by Matthew Phillips [ 09/Mar/12 7:41 PM ]

Oh I see - I misunderstood what your were doing there, thinking it only applied to the escape-unicode == true case.

And of course the Clojure string is already in UTF-16, so there's no reason to turn it into 32 bit code points and then expand it back to UTF-16 pairs.





[DJSON-5] data.json 0.2.0 must provide 0.1.x API compatibility layer Created: 24/Oct/12  Updated: 27/Oct/12  Resolved: 27/Oct/12

Status: Resolved
Project: data.json
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: Michael Klishin Assignee: Stuart Sierra
Resolution: Completed Votes: 0
Labels: None


 Description   

For libraries that rely on data.json the 0.2 release was a major unexpected breakage. For example, for
Monger to support 3 JSON serializers (Cheshire, c.d.j 0.1.x, c.d.j 0.2.x) we had to do this:
https://github.com/clojurewerkz/support/blob/master/src/clojure/clojurewerkz/support/json.clj

This is some of the craziest code I've seen. While writing it, I realized that several changes
in the c.d.j. API were easy to shim with a backwards-compatibility function. clojure.data.json/json-str
can be implemented on top of the new function easily, for example.

clojure.data.json demonstrates very questionable library maintenance practices and the 0.2 release
already cased a lot of confusion and pain to the community. There was 0 communication about the changes upfront, this shows lack of respect to the community and care about backwards compatibility.

So, bringing back a shim API layer for 0.1.x compatibility is a must. There are many codebases on clojure.data.json 0.1.x and their developers probably are not going to stop doing what they are doing
and deal with unnecessary c.d.j. API changes. Other library maintainers that extend or depend on c.d.j.
are caught between a rock and a hard place.



 Comments   
Comment by Michael Klishin [ 24/Oct/12 4:07 PM ]

According to clojuresphere.com, data.json is relied on by 340+ projects: http://www.clojuresphere.com/org.clojure/data.json

Comment by Tim McCormack [ 24/Oct/12 4:11 PM ]

Alternative suggestion: Mark the library as "under development/API subject to change" while < v1.0.

Comment by Michael Klishin [ 24/Oct/12 4:12 PM ]

@Tim McCormack: if a project is used by over 300 other projects and has been around for about a year, it is just lame to use excuses like that. It is too late, too many people already
use c.d.j.

Comment by Stuart Sierra [ 27/Oct/12 1:09 PM ]

Old APIs restored, documented as "DEPRECATED", in release 0.2.1.





[DJSON-10] data.json does not AOT Created: 29/Apr/13  Updated: 29/Apr/13  Resolved: 29/Apr/13

Status: Resolved
Project: data.json
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Major
Reporter: David Nolen Assignee: Unassigned
Resolution: Declined Votes: 0
Labels: None


 Description   

When trying to AOT data.json I get the following exception:

CompilerException java.lang.ClassNotFoundException: 
clojure.data.json.JSONWriter, compiling:(clojure/data/json.clj:279:1)

I encountered this problem while looking into the possibility of AOTing the ClojureScript compiler to avoid startup time issues. ClojureScript now depends on data.json to help with source maps - so this was the first AOT hurdle I encountered.

The exact steps I take to reproduce the bug:

  • fresh checkout of ClojureScript
  • ./script/bootstrap
  • mkdir classes
  • ./script/repl
  • (compile 'clojure.data.json)


 Comments   
Comment by David Nolen [ 29/Apr/13 12:27 PM ]

Not a bug





[DJSON-7] Write-str outputs invalid JSON when key/value pairs are removed Created: 08/Jan/13  Updated: 02/Apr/13  Resolved: 02/Apr/13

Status: Resolved
Project: data.json
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Fred Caton Assignee: Stuart Sierra
Resolution: Completed Votes: 1
Labels: None
Environment:

clojure.data.json 0.2.0


Attachments: Text File djson-7-dont-write-unnecessary-commas-v1.txt    
Patch: Code and Test

 Description   

When key/value pairs are removed by the function defined for :value-fn, commas are still output and this results in invalid JSON. To remove the errant commas, I've had to wrap write-str in (write-str (read-str ())).

Here is a simple example from the REPL:

main=> (defn remove-nils [k v]
#_=> (if (nil? v)
#_=> remove-nils
#_=> v))

main=> (def test
#_=> {:a nil,
#_=> :b nil,
#_=> :c 1,
#_=> :d nil,
#_=> :e 2
#_=> :f nil})

main=> (json/write-str test :value-fn remove-nils)
;=>"{,\"c\":1,,,\"e\":2,}"

main=> (json/write-str (json/read-str (json/write-str test :value-fn remove-nils)))
;=>"{\"c\":1,\"e\":2}"



 Comments   
Comment by Andy Fingerhut [ 22/Jan/13 12:29 AM ]

djson-7-dont-write-unnecessary-commas-v1.txt dated Jan 21 2013 modifies write-object so that it only writes out a comma if :value-fn does not cause the key/value pair to be omitted.

Comment by Stuart Sierra [ 02/Apr/13 7:05 PM ]

Patch applied.





[DJSON-8] write-json is documented to write to arg out, but instead writes to *out* Created: 27/Mar/13  Updated: 02/Apr/13  Resolved: 02/Apr/13

Status: Resolved
Project: data.json
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Defect Priority: Minor
Reporter: Andy Fingerhut Assignee: Stuart Sierra
Resolution: Completed Votes: 0
Labels: None

Attachments: Text File djson-8-correct-write-json-patch-v1.txt    

 Description   

The summary pretty much says it all.



 Comments   
Comment by Andy Fingerhut [ 27/Mar/13 6:50 PM ]

Patch djson-8-correct-write-json-patch-v1.txt dated Mar 27 2013 is a simple one-line fix.

Comment by Stuart Sierra [ 02/Apr/13 7:06 PM ]

Already fixed in commit d3cce5b200031cf22603c13a9a39e3939651d344





Generated at Wed May 22 21:39:43 CDT 2013 using JIRA 4.4#649-r158309.