[CLJ-1766] Serializing+deserializing lists breaks their hash Created: 21/Jun/15 Updated: 04/Aug/15
|Affects Version/s:||Release 1.6, Release 1.7|
|Fix Version/s:||Release 1.8|
|Reporter:||Chris Vermilion||Assignee:||Andrew Rosa|
Tested on OS X 10.10.3, Clojure 1.6.0, Java 1.8.0_45-b14, but don't think this is a factor.
|Attachments:||CLJ-1766.patch serialization_test_mod.diff serialize-test.clj|
clojure.lang.PersistentList implements Serializable but a serialization roundtrip results in a hash of 0. This appears to be caused by ASeq marking its _hash property as transient; other collection base classes don't do this. I don't know if there is a rational for the distinction, but the result is that serializing, then deserializing, a list results in its _hash property being 0 instead of either its previous, calculated value, or -1, which would trigger recalculation.
This means that any code that relies on a list's hash value can break in subtle ways.
Approach: Use 0 as the "uncomputed" hash value, not -1. The auto initialized value on serialization construction will then automatically trigger a re-computation.
Note: Also need to consider other classes that use -1 instead of 0 as the uncomputed hash value: APersistentMap, APersistentSet, APersistentVector, PersistentQueue - however I believe in those cases they are not transient and thus may avoid the issue. Perhaps they should be transient though.
|Comment by Chris Vermilion [ 21/Jun/15 1:10 PM ]|
Yikes, sorry about the formatting above; JIRA newbie and looks like I can't edit. Also, I should have noted that the above functions require an import: (import '[java.io ByteArrayInputStream ByteArrayOutputStream ObjectInputStream ObjectOutputStream]).
|Comment by Alex Miller [ 22/Jun/15 7:55 AM ]|
Yes, this is a bug. My preference would be to use 0 to indicate "not computed" and thus sidestep this issue. The downside of changing from -1 to 0 is that serialization/deserialization is then broken between versions of Clojure (although maybe if it was already broken, that's not an issue). -1 is used like this for lazily computed hashes in many classes in Clojure so the issue should really be fixed across the board.
|Comment by Alex Miller [ 22/Jun/15 8:22 AM ]|
There are some serialization round-trip tests in clojure.test-clojure.serialization - seems like those should be updated to compare the .hashCode and hash before/after, which should catch this. I attached a diff (not a proper patch) with that change just as a demonstration, which indeed produces a bunch of failures. That should be incorporated into any fix, possibly along with other failures.
|Comment by Alex Miller [ 31/Jul/15 9:04 AM ]|
Also, don't worry about crediting me on that test change, please just incorporate it into whatever gets made here.
|Comment by Andrew Rosa [ 31/Jul/15 9:31 AM ]|
Alex Miller I hope to work on this issue this weekend. There are some conflict with the alpha release schedule?
|Comment by Alex Miller [ 31/Jul/15 9:47 AM ]|
No conflict - when it's ready we'll look at it.
|Comment by Andrew Rosa [ 04/Aug/15 8:21 AM ]|
As Alex Miller recomended, I followed the change-default-to-zero path.
So, together with the patch I've made some manual tests between some versions. The script used is attached for further inspection.
Running on a 1.6 REPL has shown on the original described issue:
Running on 1.8 master. Despite of the = behavior looking ok, the
Running on 1.8 after patch the hashes are correctly computed on both cases:
Please note I've dumped each serialized data into different files, so I could test the interoperability between those versions. What I found:
Did I miss something or everything looks correct for you too? I'm open to do some more exhaustive testing if anyone could give me some directions about where to explore.