Clojure

SeqIterator can return incorrect results

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Blocker Blocker
  • Resolution: Completed
  • Affects Version/s: Release 1.7
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Labels:
  • Environment:
    clojure-1.7.0-alpha5
  • Patch:
    Code
  • Approval:
    Ok

Description

As of 1.7.0-alpha5, we are seeing SeqIterator return iterated results that do reflect the values of the underlying seq, in particular acting as if the seq contains a nil value when it does not. This problem is intermittent but has at times caused clojure master to fail in compilation (which is why this is marked as a blocker).

Two recent changes during 1.7 have created and exposed this problem:

1) This commit https://github.com/clojure/clojure/commit/43cc1854508d655e58e377f84836ba128971f90c changed the SeqIterator implementation to be lazier and to use "this" as a sentinel object in SeqIterator. (1.7.0-alpha2)
2) CLJ-1546 changed the implementation of vec such that PersistentHashMap and PersistentHashSet are now converted using iterator() rather than seq(). PHS/PHM use SeqIterator for their Iterator implementation. (1.7.0-alpha5)

Because of #2, we are now stressing #1 much more than before. In particular, things like defining defrecords rely heavily on vec (and set) of PHS and PHM.

Example stack trace: https://gist.github.com/puredanger/f56e3253f0668a515ec5 (seen compiling Clojure itself)

Cause: Setting seq==this; in the constructor of SeqIterator is allowing unsafe publication of the partially constructed "this" object, which can cause subtle problems in the hasNext() implementation. In particular, it seems that after inlining, on the first call, the seq==this condition when comparing the cached partially constructed instance in seq and the fully constructed version in this will return false, even though these have the same object identity. This causes the wrong path to be executed in hasNext().

Approach: Do not use this as a sentinel value.

Patch: 0001-CLJ-1636-don-t-use-this-as-a-sentinel-in-SeqIterator.patch

Screened by: Alex Miller

Activity

Hide
Colin Jones added a comment -

I was able to reproduce this (intermittently) earlier, but I've seen periods of many successful runs in a row (both with that patch reverted and with it in place), so it's been hard for me to trust what I'm seeing locally when it passes. I didn't see any evidence of AOT compilation happening (e.g. no classfiles under `target/`), so I'd have expected the new function `already-compiled?` in CLJ-1544 never to actually run.

It looks like the Cause section of the stacktrace is implicating an error in trying to `(resolve nil)`, where `nil` is an entry in an interfaces collection that should actually be empty. That's based on these two lines (along with the lines higher up in the cause):

...snip...
at clojure.core$set.invoke(core.clj:3944)
at clojure.core$emit_defrecord.invoke(core_deftype.clj:154)
...snip...

(https://github.com/clojure/clojure/blob/3e7cb1a5c840612ad41cf6e0be92480f798bc05d/src/clj/clojure/core_deftype.clj#L154)

The defrecord here in question looks like

(defrecord Foo [x y])

So the `opts+specs` var-arg argument to `defrecord` should be `nil` since there are no entries, which should mean the `interfaces` piece of the `parse-opts+specs` call should return an empty vector.

But that stacktrace confuses me, because it suggests that the `interfaces` vector, instead of being empty, contains a `nil` element. How can this be? Or what misstep have I made in tracing through this?

Show
Colin Jones added a comment - I was able to reproduce this (intermittently) earlier, but I've seen periods of many successful runs in a row (both with that patch reverted and with it in place), so it's been hard for me to trust what I'm seeing locally when it passes. I didn't see any evidence of AOT compilation happening (e.g. no classfiles under `target/`), so I'd have expected the new function `already-compiled?` in CLJ-1544 never to actually run. It looks like the Cause section of the stacktrace is implicating an error in trying to `(resolve nil)`, where `nil` is an entry in an interfaces collection that should actually be empty. That's based on these two lines (along with the lines higher up in the cause):
...snip...
at clojure.core$set.invoke(core.clj:3944)
at clojure.core$emit_defrecord.invoke(core_deftype.clj:154)
...snip...
(https://github.com/clojure/clojure/blob/3e7cb1a5c840612ad41cf6e0be92480f798bc05d/src/clj/clojure/core_deftype.clj#L154) The defrecord here in question looks like
(defrecord Foo [x y])
So the `opts+specs` var-arg argument to `defrecord` should be `nil` since there are no entries, which should mean the `interfaces` piece of the `parse-opts+specs` call should return an empty vector. But that stacktrace confuses me, because it suggests that the `interfaces` vector, instead of being empty, contains a `nil` element. How can this be? Or what misstep have I made in tracing through this?
Hide
Alex Miller added a comment -

If the error is intermittent, then my pegging of CLJ-1544 may be wrong. For me, it was repeatable as of clojure commit e5a104e894ed82f244d69513918d570cee5df67d (when CLJ-1544 was applied) and I have not reproduced it prior.

Show
Alex Miller added a comment - If the error is intermittent, then my pegging of CLJ-1544 may be wrong. For me, it was repeatable as of clojure commit e5a104e894ed82f244d69513918d570cee5df67d (when CLJ-1544 was applied) and I have not reproduced it prior.
Hide
Nicola Mometto added a comment - - edited

Alex, just to be sure – you were able to reproduce this bug with clojure at e5a104e894ed82f244d69513918d570cee5df67d (CLJ-1544) ? I'd like to have confirmation so 9f277c80258b3d2951128ce26a07c30ad0b47af0 (CLJ-979) can be excluded as the culprit

Show
Nicola Mometto added a comment - - edited Alex, just to be sure – you were able to reproduce this bug with clojure at e5a104e894ed82f244d69513918d570cee5df67d (CLJ-1544) ? I'd like to have confirmation so 9f277c80258b3d2951128ce26a07c30ad0b47af0 (CLJ-979) can be excluded as the culprit
Hide
Alex Miller added a comment -

Correct.

Show
Alex Miller added a comment - Correct.
Hide
Nicola Mometto added a comment -

Well this is weird then.
The only way I can think of that would produce that exception is if this returned nil: https://github.com/clojure/clojure/blob/3e7cb1a5c840612ad41cf6e0be92480f798bc05d/src/clj/clojure/core_deftype.clj#L57

This means a scenario like this:

user=> (defrecord x [] +)
NullPointerException   clojure.lang.Compiler.maybeResolveIn (Compiler.java:7015)
user=> (.printStackTrace *e)
java.lang.NullPointerException, compiling:(NO_SOURCE_FILE:3:1)
	at clojure.lang.Compiler.macroexpand1(Compiler.java:6620)
	at clojure.lang.Compiler.macroexpand(Compiler.java:6678)
	at clojure.lang.Compiler.eval(Compiler.java:6752)
	at clojure.lang.Compiler.eval(Compiler.java:6731)
	at clojure.core$eval.invoke(core.clj:3076)
	at clojure.main$repl$read_eval_print__7044$fn__7047.invoke(main.clj:239)
	at clojure.main$repl$read_eval_print__7044.invoke(main.clj:239)
	at clojure.main$repl$fn__7053.invoke(main.clj:257)
	at clojure.main$repl.doInvoke(main.clj:257)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.main$repl_opt.invoke(main.clj:323)
	at clojure.main$main.doInvoke(main.clj:421)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at clojure.lang.Var.invoke(Var.java:375)
	at clojure.lang.AFn.applyToHelper(AFn.java:152)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.lang.NullPointerException
	at clojure.lang.Compiler.maybeResolveIn(Compiler.java:7015)
	at clojure.core$ns_resolve.invoke(core.clj:4200)
	at clojure.core$ns_resolve.invoke(core.clj:4197)
	at clojure.core$resolve.invoke(core.clj:4206)
	at clojure.core$map$fn__4523.invoke(core.clj:2612)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:485)
	at clojure.core$seq__4103.invoke(core.clj:135)
	at clojure.core$reduce1.invoke(core.clj:899)
	at clojure.core$set.invoke(core.clj:3944)
	at clojure.core$emit_defrecord.invoke(core_deftype.clj:154)
	at clojure.core$defrecord.doInvoke(core_deftype.clj:374)
	at clojure.lang.RestFn.invoke(RestFn.java:497)
	at clojure.lang.Var.invoke(Var.java:401)
	at clojure.lang.AFn.applyToHelper(AFn.java:171)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.lang.Compiler.macroexpand1(Compiler.java:6607)
	... 16 more
nil

where a var is used as a protocol but no interface is present.

Show
Nicola Mometto added a comment - Well this is weird then. The only way I can think of that would produce that exception is if this returned nil: https://github.com/clojure/clojure/blob/3e7cb1a5c840612ad41cf6e0be92480f798bc05d/src/clj/clojure/core_deftype.clj#L57 This means a scenario like this:
user=> (defrecord x [] +)
NullPointerException   clojure.lang.Compiler.maybeResolveIn (Compiler.java:7015)
user=> (.printStackTrace *e)
java.lang.NullPointerException, compiling:(NO_SOURCE_FILE:3:1)
	at clojure.lang.Compiler.macroexpand1(Compiler.java:6620)
	at clojure.lang.Compiler.macroexpand(Compiler.java:6678)
	at clojure.lang.Compiler.eval(Compiler.java:6752)
	at clojure.lang.Compiler.eval(Compiler.java:6731)
	at clojure.core$eval.invoke(core.clj:3076)
	at clojure.main$repl$read_eval_print__7044$fn__7047.invoke(main.clj:239)
	at clojure.main$repl$read_eval_print__7044.invoke(main.clj:239)
	at clojure.main$repl$fn__7053.invoke(main.clj:257)
	at clojure.main$repl.doInvoke(main.clj:257)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.main$repl_opt.invoke(main.clj:323)
	at clojure.main$main.doInvoke(main.clj:421)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at clojure.lang.Var.invoke(Var.java:375)
	at clojure.lang.AFn.applyToHelper(AFn.java:152)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.lang.NullPointerException
	at clojure.lang.Compiler.maybeResolveIn(Compiler.java:7015)
	at clojure.core$ns_resolve.invoke(core.clj:4200)
	at clojure.core$ns_resolve.invoke(core.clj:4197)
	at clojure.core$resolve.invoke(core.clj:4206)
	at clojure.core$map$fn__4523.invoke(core.clj:2612)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:485)
	at clojure.core$seq__4103.invoke(core.clj:135)
	at clojure.core$reduce1.invoke(core.clj:899)
	at clojure.core$set.invoke(core.clj:3944)
	at clojure.core$emit_defrecord.invoke(core_deftype.clj:154)
	at clojure.core$defrecord.doInvoke(core_deftype.clj:374)
	at clojure.lang.RestFn.invoke(RestFn.java:497)
	at clojure.lang.Var.invoke(Var.java:401)
	at clojure.lang.AFn.applyToHelper(AFn.java:171)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.lang.Compiler.macroexpand1(Compiler.java:6607)
	... 16 more
nil
where a var is used as a protocol but no interface is present.
Hide
Nicola Mometto added a comment - - edited

Alex, since I cannot reproduce, can you try getting the exception with a patched version of clojure that replaces https://github.com/clojure/clojure/blob/3e7cb1a5c840612ad41cf6e0be92480f798bc05d/src/clj/clojure/core_deftype.clj#L57
with something like

(or (:on (deref (resolve %))) 
    (println % @(resolve %)))

so we can get an idea of what's going on?

Show
Nicola Mometto added a comment - - edited Alex, since I cannot reproduce, can you try getting the exception with a patched version of clojure that replaces https://github.com/clojure/clojure/blob/3e7cb1a5c840612ad41cf6e0be92480f798bc05d/src/clj/clojure/core_deftype.clj#L57 with something like
(or (:on (deref (resolve %))) 
    (println % @(resolve %)))
so we can get an idea of what's going on?
Hide
Nicola Mometto added a comment -

I was just able to reproduce this issue using clojure at commit 4afd4a7c14c48b5baf3c03196053066483cb4223

This means that CLJ-1544 is not responsable for this bug.

I can also confirm that this bug is intermittent, which makes figuring out what's going on really hard.

Show
Nicola Mometto added a comment - I was just able to reproduce this issue using clojure at commit 4afd4a7c14c48b5baf3c03196053066483cb4223 This means that CLJ-1544 is not responsable for this bug. I can also confirm that this bug is intermittent, which makes figuring out what's going on really hard.
Hide
Nicola Mometto added a comment -

I still have absolutely no idea how this can happen but adding a bunch of printlns it turned out that for some reason in this binding of the deftype macro:

[interfaces methods opts] (parse-opts+specs opts+specs)

when opts+specs is nil, interfaces is sometimes [nil] as opposed to [].

This makes me think that there's some concurrency bug in the recent changes around the handling of vec, but this is just a guess.

Show
Nicola Mometto added a comment - I still have absolutely no idea how this can happen but adding a bunch of printlns it turned out that for some reason in this binding of the deftype macro:
[interfaces methods opts] (parse-opts+specs opts+specs)
when opts+specs is nil, interfaces is sometimes [nil] as opposed to []. This makes me think that there's some concurrency bug in the recent changes around the handling of vec, but this is just a guess.
Hide
Nicola Mometto added a comment - - edited

I've restricted it down a bit and it looks like this part of opts+spec can bind interfaces to [nil] when impls is {}

interfaces (→ (map #(if (var? (resolve %))
                      (:on (deref (resolve %)))
                      %)
                   (keys impls))
             set
             (disj 'Object 'java.lang.Object)
             vec)
Show
Nicola Mometto added a comment - - edited I've restricted it down a bit and it looks like this part of opts+spec can bind interfaces to [nil] when impls is {}
interfaces (→ (map #(if (var? (resolve %))
                      (:on (deref (resolve %)))
                      %)
                   (keys impls))
             set
             (disj 'Object 'java.lang.Object)
             vec)
Hide
Nicola Mometto added a comment -

Here's an example output from my debugging tests, with the following patch applied:

diff --git a/src/clj/clojure/core_deftype.clj b/src/clj/clojure/core_deftype.clj
index 97e14cc..8f521eb 100644
--- a/src/clj/clojure/core_deftype.clj
+++ b/src/clj/clojure/core_deftype.clj
@@ -60,6 +60,8 @@ (defn- parse-opts+specs [opts+specs]
                        set
                        (disj 'Object 'java.lang.Object)
                        vec)
+        _ (when (nil? opts+specs)
+            (println impls interfaces))
         methods (map (fn [[name params & body]]
                        (cons name (maybe-destructured params body)))
                      (apply concat (vals impls)))]
{} [nil]
Exception in thread "main" java.lang.NullPointerException, compiling:(schema/utils.clj:68:1)
	at clojure.lang.Compiler.analyzeSeq(Compiler.java:6716)
	at clojure.lang.Compiler.analyze(Compiler.java:6500)
	at clojure.lang.Compiler.analyze(Compiler.java:6461)
	at clojure.lang.Compiler$BodyExpr$Parser.parse(Compiler.java:5837)
	at clojure.lang.Compiler$LetExpr$Parser.parse(Compiler.java:6155)
	at clojure.lang.Compiler.analyzeSeq(Compiler.java:6709)
	at clojure.lang.Compiler.analyze(Compiler.java:6500)
	at clojure.lang.Compiler.analyze(Compiler.java:6461)
	at clojure.lang.Compiler$BodyExpr$Parser.parse(Compiler.java:5837)
	at clojure.lang.Compiler$FnMethod.parse(Compiler.java:5272)
	at clojure.lang.Compiler$FnExpr.parse(Compiler.java:3901)
	at clojure.lang.Compiler.analyzeSeq(Compiler.java:6707)
	at clojure.lang.Compiler.analyze(Compiler.java:6500)
	at clojure.lang.Compiler.eval(Compiler.java:6765)
	at clojure.lang.Compiler.load(Compiler.java:7195)
	at clojure.lang.RT.loadResourceScript(RT.java:370)
	at clojure.lang.RT.loadResourceScript(RT.java:361)
	at clojure.lang.RT.load(RT.java:440)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5424.invoke(core.clj:5848)
	at clojure.core$load.doInvoke(core.clj:5847)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5653)
	at clojure.core$load_lib$fn__5373.invoke(core.clj:5693)
	at clojure.core$load_lib.doInvoke(core.clj:5692)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5731)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5814)
	at clojure.lang.RestFn.invoke(RestFn.java:457)
	at plumbing.core$eval13998$loading__5316__auto____13999.invoke(core.clj:1)
	at plumbing.core$eval13998.invoke(core.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6768)
	at clojure.lang.Compiler.eval(Compiler.java:6757)
	at clojure.lang.Compiler.load(Compiler.java:7195)
	at clojure.lang.RT.loadResourceScript(RT.java:370)
	at clojure.lang.RT.loadResourceScript(RT.java:361)
	at clojure.lang.RT.load(RT.java:440)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5424.invoke(core.clj:5848)
	at clojure.core$load.doInvoke(core.clj:5847)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5653)
	at clojure.core$load_lib$fn__5373.invoke(core.clj:5693)
	at clojure.core$load_lib.doInvoke(core.clj:5692)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5731)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5814)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at user.fus_threading$eval13994.invoke(fus_threading.clj:6)
	at clojure.lang.Compiler.eval(Compiler.java:6768)
	at clojure.lang.Compiler.load(Compiler.java:7195)
	at clojure.lang.RT.loadResourceScript(RT.java:370)
	at clojure.lang.RT.loadResourceScript(RT.java:361)
	at clojure.lang.RT.load(RT.java:440)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5424.invoke(core.clj:5848)
	at clojure.core$load.doInvoke(core.clj:5847)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5653)
	at clojure.core$load_lib$fn__5373.invoke(core.clj:5693)
	at clojure.core$load_lib.doInvoke(core.clj:5692)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5731)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5814)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at midje.repl$load_facts$fn__6148.invoke(repl.clj:206)
	at midje.repl$load_facts.doInvoke(repl.clj:192)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at user$eval6211.invoke(form-init7109545842773565024.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6768)
	at clojure.lang.Compiler.eval(Compiler.java:6758)
	at clojure.lang.Compiler.load(Compiler.java:7195)
	at clojure.lang.Compiler.loadFile(Compiler.java:7151)
	at clojure.main$load_script.invoke(main.clj:274)
	at clojure.main$init_opt.invoke(main.clj:279)
	at clojure.main$initialize.invoke(main.clj:307)
	at clojure.main$null_opt.invoke(main.clj:342)
	at clojure.main$main.doInvoke(main.clj:420)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:383)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.lang.NullPointerException
	at clojure.lang.Compiler.resolveIn(Compiler.java:6971)
	at clojure.lang.Compiler.resolve(Compiler.java:6949)
	at clojure.lang.Compiler$NewInstanceExpr.build(Compiler.java:7565)
	at clojure.lang.Compiler$NewInstanceExpr$DeftypeParser.parse(Compiler.java:7490)
	at clojure.lang.Compiler.analyzeSeq(Compiler.java:6709)
	... 91 more
Show
Nicola Mometto added a comment - Here's an example output from my debugging tests, with the following patch applied:
diff --git a/src/clj/clojure/core_deftype.clj b/src/clj/clojure/core_deftype.clj
index 97e14cc..8f521eb 100644
--- a/src/clj/clojure/core_deftype.clj
+++ b/src/clj/clojure/core_deftype.clj
@@ -60,6 +60,8 @@ (defn- parse-opts+specs [opts+specs]
                        set
                        (disj 'Object 'java.lang.Object)
                        vec)
+        _ (when (nil? opts+specs)
+            (println impls interfaces))
         methods (map (fn [[name params & body]]
                        (cons name (maybe-destructured params body)))
                      (apply concat (vals impls)))]
{} [nil]
Exception in thread "main" java.lang.NullPointerException, compiling:(schema/utils.clj:68:1)
	at clojure.lang.Compiler.analyzeSeq(Compiler.java:6716)
	at clojure.lang.Compiler.analyze(Compiler.java:6500)
	at clojure.lang.Compiler.analyze(Compiler.java:6461)
	at clojure.lang.Compiler$BodyExpr$Parser.parse(Compiler.java:5837)
	at clojure.lang.Compiler$LetExpr$Parser.parse(Compiler.java:6155)
	at clojure.lang.Compiler.analyzeSeq(Compiler.java:6709)
	at clojure.lang.Compiler.analyze(Compiler.java:6500)
	at clojure.lang.Compiler.analyze(Compiler.java:6461)
	at clojure.lang.Compiler$BodyExpr$Parser.parse(Compiler.java:5837)
	at clojure.lang.Compiler$FnMethod.parse(Compiler.java:5272)
	at clojure.lang.Compiler$FnExpr.parse(Compiler.java:3901)
	at clojure.lang.Compiler.analyzeSeq(Compiler.java:6707)
	at clojure.lang.Compiler.analyze(Compiler.java:6500)
	at clojure.lang.Compiler.eval(Compiler.java:6765)
	at clojure.lang.Compiler.load(Compiler.java:7195)
	at clojure.lang.RT.loadResourceScript(RT.java:370)
	at clojure.lang.RT.loadResourceScript(RT.java:361)
	at clojure.lang.RT.load(RT.java:440)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5424.invoke(core.clj:5848)
	at clojure.core$load.doInvoke(core.clj:5847)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5653)
	at clojure.core$load_lib$fn__5373.invoke(core.clj:5693)
	at clojure.core$load_lib.doInvoke(core.clj:5692)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5731)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5814)
	at clojure.lang.RestFn.invoke(RestFn.java:457)
	at plumbing.core$eval13998$loading__5316__auto____13999.invoke(core.clj:1)
	at plumbing.core$eval13998.invoke(core.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6768)
	at clojure.lang.Compiler.eval(Compiler.java:6757)
	at clojure.lang.Compiler.load(Compiler.java:7195)
	at clojure.lang.RT.loadResourceScript(RT.java:370)
	at clojure.lang.RT.loadResourceScript(RT.java:361)
	at clojure.lang.RT.load(RT.java:440)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5424.invoke(core.clj:5848)
	at clojure.core$load.doInvoke(core.clj:5847)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5653)
	at clojure.core$load_lib$fn__5373.invoke(core.clj:5693)
	at clojure.core$load_lib.doInvoke(core.clj:5692)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5731)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5814)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at user.fus_threading$eval13994.invoke(fus_threading.clj:6)
	at clojure.lang.Compiler.eval(Compiler.java:6768)
	at clojure.lang.Compiler.load(Compiler.java:7195)
	at clojure.lang.RT.loadResourceScript(RT.java:370)
	at clojure.lang.RT.loadResourceScript(RT.java:361)
	at clojure.lang.RT.load(RT.java:440)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5424.invoke(core.clj:5848)
	at clojure.core$load.doInvoke(core.clj:5847)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5653)
	at clojure.core$load_lib$fn__5373.invoke(core.clj:5693)
	at clojure.core$load_lib.doInvoke(core.clj:5692)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5731)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5814)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at midje.repl$load_facts$fn__6148.invoke(repl.clj:206)
	at midje.repl$load_facts.doInvoke(repl.clj:192)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at user$eval6211.invoke(form-init7109545842773565024.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6768)
	at clojure.lang.Compiler.eval(Compiler.java:6758)
	at clojure.lang.Compiler.load(Compiler.java:7195)
	at clojure.lang.Compiler.loadFile(Compiler.java:7151)
	at clojure.main$load_script.invoke(main.clj:274)
	at clojure.main$init_opt.invoke(main.clj:279)
	at clojure.main$initialize.invoke(main.clj:307)
	at clojure.main$null_opt.invoke(main.clj:342)
	at clojure.main$main.doInvoke(main.clj:420)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:383)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.lang.NullPointerException
	at clojure.lang.Compiler.resolveIn(Compiler.java:6971)
	at clojure.lang.Compiler.resolve(Compiler.java:6949)
	at clojure.lang.Compiler$NewInstanceExpr.build(Compiler.java:7565)
	at clojure.lang.Compiler$NewInstanceExpr$DeftypeParser.parse(Compiler.java:7490)
	at clojure.lang.Compiler.analyzeSeq(Compiler.java:6709)
	... 91 more
Hide
Nicola Mometto added a comment - - edited

Further debugging is convincing me further that some of the recent changes around `vec` are causing this bug.

With the following patch:

diff --git a/src/clj/clojure/core_deftype.clj b/src/clj/clojure/core_deftype.clj
index 97e14cc..9478b04 100644
--- a/src/clj/clojure/core_deftype.clj
+++ b/src/clj/clojure/core_deftype.clj
@@ -53,13 +53,16 @@ (defn- parse-impls [specs]
 (defn- parse-opts+specs [opts+specs]
   (let [[opts specs] (parse-opts opts+specs)
         impls (parse-impls specs)
-        interfaces (-> (map #(if (var? (resolve %)) 
-                               (:on (deref (resolve %)))
-                               %)
-                            (keys impls))
-                       set
-                       (disj 'Object 'java.lang.Object)
-                       vec)
+        ks (keys impls)
+        interfaces' (map #(if (var? (resolve %))
+                            (:on (deref (resolve %)))
+                            %)
+                         ks)
+        interfaces'' (set interfaces')
+        interfaces''' (disj interfaces'' 'Object 'java.lang.Object)
+        interfaces (vec interfaces''')
+        _ (when (nil? opts+specs)
+            (println impls ks interfaces' interfaces'' interfaces''' interfaces))
         methods (map (fn [[name params & body]]
                        (cons name (maybe-destructured params body)))
                      (apply concat (vals impls)))]

I get this println when the NPE occurs:

{} nil () #{} #{} [nil]

Meaning that for some reson, `vec` of `#{}` returns `[nil]` rather than `[]`

Show
Nicola Mometto added a comment - - edited Further debugging is convincing me further that some of the recent changes around `vec` are causing this bug. With the following patch:
diff --git a/src/clj/clojure/core_deftype.clj b/src/clj/clojure/core_deftype.clj
index 97e14cc..9478b04 100644
--- a/src/clj/clojure/core_deftype.clj
+++ b/src/clj/clojure/core_deftype.clj
@@ -53,13 +53,16 @@ (defn- parse-impls [specs]
 (defn- parse-opts+specs [opts+specs]
   (let [[opts specs] (parse-opts opts+specs)
         impls (parse-impls specs)
-        interfaces (-> (map #(if (var? (resolve %)) 
-                               (:on (deref (resolve %)))
-                               %)
-                            (keys impls))
-                       set
-                       (disj 'Object 'java.lang.Object)
-                       vec)
+        ks (keys impls)
+        interfaces' (map #(if (var? (resolve %))
+                            (:on (deref (resolve %)))
+                            %)
+                         ks)
+        interfaces'' (set interfaces')
+        interfaces''' (disj interfaces'' 'Object 'java.lang.Object)
+        interfaces (vec interfaces''')
+        _ (when (nil? opts+specs)
+            (println impls ks interfaces' interfaces'' interfaces''' interfaces))
         methods (map (fn [[name params & body]]
                        (cons name (maybe-destructured params body)))
                      (apply concat (vals impls)))]
I get this println when the NPE occurs:
{} nil () #{} #{} [nil]
Meaning that for some reson, `vec` of `#{}` returns `[nil]` rather than `[]`
Hide
Nicola Mometto added a comment - - edited

I confirmed that the bug is in the vec function.
With the following patch, when the NPE occurs, the debug println is triggered:

diff --git a/src/jvm/clojure/lang/PersistentVector.java b/src/jvm/clojure/lang/PersistentVector.java
index 9804a0b..a460b6f 100644
--- a/src/jvm/clojure/lang/PersistentVector.java
+++ b/src/jvm/clojure/lang/PersistentVector.java
@@ -96,19 +96,22 @@ static public PersistentVector create(ArrayList list){
 static public PersistentVector create(Iterable items){
     // optimize common case
     if(items instanceof ArrayList)
         return create((ArrayList)items);

     Iterator iter = items.iterator();
     TransientVector ret = EMPTY.asTransient();
     while(iter.hasNext())
         ret = ret.conj(iter.next());
-    return ret.persistent();
+    PersistentVector r = ret.persistent();
+    if (RT.seq(r) != null && RT.seq(items) == null)
+        System.out.println("bug");
+    return r;
 }

 static public PersistentVector create(Object... items){
        TransientVector ret = EMPTY.asTransient();
        for(Object item : items)
                ret = ret.conj(item);
        return ret.persistent();
 }
Show
Nicola Mometto added a comment - - edited I confirmed that the bug is in the vec function. With the following patch, when the NPE occurs, the debug println is triggered:
diff --git a/src/jvm/clojure/lang/PersistentVector.java b/src/jvm/clojure/lang/PersistentVector.java
index 9804a0b..a460b6f 100644
--- a/src/jvm/clojure/lang/PersistentVector.java
+++ b/src/jvm/clojure/lang/PersistentVector.java
@@ -96,19 +96,22 @@ static public PersistentVector create(ArrayList list){
 static public PersistentVector create(Iterable items){
     // optimize common case
     if(items instanceof ArrayList)
         return create((ArrayList)items);

     Iterator iter = items.iterator();
     TransientVector ret = EMPTY.asTransient();
     while(iter.hasNext())
         ret = ret.conj(iter.next());
-    return ret.persistent();
+    PersistentVector r = ret.persistent();
+    if (RT.seq(r) != null && RT.seq(items) == null)
+        System.out.println("bug");
+    return r;
 }

 static public PersistentVector create(Object... items){
        TransientVector ret = EMPTY.asTransient();
        for(Object item : items)
                ret = ret.conj(item);
        return ret.persistent();
 }
Hide
Alex Miller added a comment -

And items is a PersistentSet? I've actually been looking at some weirdness on set iterators in the context of CLJ-1499 in consistency between seq and iterators.

Show
Alex Miller added a comment - And items is a PersistentSet? I've actually been looking at some weirdness on set iterators in the context of CLJ-1499 in consistency between seq and iterators.
Hide
Nicola Mometto added a comment - - edited

I believe I have identified the bug, but I cannot make any sense out of it.

The bug apperas to be in SeqIterator.hasNext(), when the NPE occurs, after applying the following patch:

diff --git a/src/jvm/clojure/lang/SeqIterator.java b/src/jvm/clojure/lang/SeqIterator.java
index e6ad481..031fbc8 100644
--- a/src/jvm/clojure/lang/SeqIterator.java
+++ b/src/jvm/clojure/lang/SeqIterator.java
@@ -35,14 +35,18 @@ public SeqIterator(ISeq o){
 public boolean hasNext(){
        if(seq == this){
                seq = START;
                next = RT.seq(next);
                }
        else if(seq == next)
                next = RT.next(seq);
+    else if (RT.seq(next) == null)
+        System.out.println("this shouldn't happen: " + (this == seq));
+    if (RT.seq(next) == null && next != null)
+        System.out.println("bug: " + next);
        return next != null;
 }

 public Object next() throws NoSuchElementException {
        if(!hasNext())
                throw new NoSuchElementException();
        seq = next;

I get the following output:

this shouldn't happen: true
bug: #{}

I have absolutely no idea how it is possible that the last branch gets executed since it is true that seq == this thus the first branch should have been executed.

Show
Nicola Mometto added a comment - - edited I believe I have identified the bug, but I cannot make any sense out of it. The bug apperas to be in SeqIterator.hasNext(), when the NPE occurs, after applying the following patch:
diff --git a/src/jvm/clojure/lang/SeqIterator.java b/src/jvm/clojure/lang/SeqIterator.java
index e6ad481..031fbc8 100644
--- a/src/jvm/clojure/lang/SeqIterator.java
+++ b/src/jvm/clojure/lang/SeqIterator.java
@@ -35,14 +35,18 @@ public SeqIterator(ISeq o){
 public boolean hasNext(){
        if(seq == this){
                seq = START;
                next = RT.seq(next);
                }
        else if(seq == next)
                next = RT.next(seq);
+    else if (RT.seq(next) == null)
+        System.out.println("this shouldn't happen: " + (this == seq));
+    if (RT.seq(next) == null && next != null)
+        System.out.println("bug: " + next);
        return next != null;
 }

 public Object next() throws NoSuchElementException {
        if(!hasNext())
                throw new NoSuchElementException();
        seq = next;
I get the following output:
this shouldn't happen: true
bug: #{}
I have absolutely no idea how it is possible that the last branch gets executed since it is true that seq == this thus the first branch should have been executed.
Hide
Nicola Mometto added a comment - - edited

I don't know why, but with the attached patch the bug seems to go away.
This is probably just by accident though as I have no idea what changes between the code pre patch and the code post patch.

Show
Nicola Mometto added a comment - - edited I don't know why, but with the attached patch the bug seems to go away. This is probably just by accident though as I have no idea what changes between the code pre patch and the code post patch.
Hide
Alex Miller added a comment -

Without looking at the patch Id say that non deterministic bug plus impossible state smells like a concurrency / race condition problem.

Show
Alex Miller added a comment - Without looking at the patch Id say that non deterministic bug plus impossible state smells like a concurrency / race condition problem.
Hide
Michael Blume added a comment -

This isn't the bug, per se, the thing I'm describing should not break anything, but why is the PersistentVector(Iterable) constructor being called on a PersistentHashSet? It looks like we could very easily do

--- i/src/jvm/clojure/lang/LazilyPersistentVector.java
+++ w/src/jvm/clojure/lang/LazilyPersistentVector.java
@@ -26,7 +26,7 @@ static public IPersistentVector createOwning(Object... items){
 static public IPersistentVector create(Object obj){
    if(obj instanceof IReduceInit)
        return PersistentVector.create((IReduceInit) obj);
-   else if(obj instanceof ISeq)
+   else if(obj instanceof Seqable)
        return PersistentVector.create(RT.seq(obj));
    else if(obj instanceof Iterable)
        return PersistentVector.create((Iterable)obj);

and treat the set directly as a seq. Is there some way that would be slower?

Show
Michael Blume added a comment - This isn't the bug, per se, the thing I'm describing should not break anything, but why is the PersistentVector(Iterable) constructor being called on a PersistentHashSet? It looks like we could very easily do
--- i/src/jvm/clojure/lang/LazilyPersistentVector.java
+++ w/src/jvm/clojure/lang/LazilyPersistentVector.java
@@ -26,7 +26,7 @@ static public IPersistentVector createOwning(Object... items){
 static public IPersistentVector create(Object obj){
    if(obj instanceof IReduceInit)
        return PersistentVector.create((IReduceInit) obj);
-   else if(obj instanceof ISeq)
+   else if(obj instanceof Seqable)
        return PersistentVector.create(RT.seq(obj));
    else if(obj instanceof Iterable)
        return PersistentVector.create((Iterable)obj);
and treat the set directly as a seq. Is there some way that would be slower?
Hide
Michael Blume added a comment - - edited

It may be that I've just tried it an insufficient number of times, but simply adding 'synchronized' to SeqIterator.hasNext appears to solve the problem. Again, this doesn't really tell us what the problem is.

ETA: Nope, fails sometimes even with synchronized.

Show
Michael Blume added a comment - - edited It may be that I've just tried it an insufficient number of times, but simply adding 'synchronized' to SeqIterator.hasNext appears to solve the problem. Again, this doesn't really tell us what the problem is. ETA: Nope, fails sometimes even with synchronized.
Hide
Alex Miller added a comment -

The use of ISeq and not Seqable in LazilyPersistentVector is quite intentional. The idea here is that if something is already a seq (effectively a linked list), the best we're going to do is walk that chain. However, many things are Seqable that may have more efficient Iterable implementations which can (statefully) walk a data structure without creating all the intermediate objects required by seq. In particular, via CLJ-1499, map and set will soon be gaining direct Iterable implementations that walk the persistent tree without instantiating a seq object for every element. However, at the moment set and map will create a SeqIterator wrapped around the seq.

CLJ-1546 changed this path - it was walking the seq but is now walking it via SeqIterator. My working theory is that that switch has uncovered a latent race condition in SeqIterator that was never noticed before as the path wasn't exercised.

Note that because CLJ-1499 removes the reliance on SeqIterator, it would have avoided the bug in a different way! However, I have been seeing a number of weird things while doing dev on CLJ-1499 specifically around iterating over sets - the OO around iterator() and seq() in the APersistentSet/PersistentHashSet/PersistentTreeSet has some weird interactions.

I'm going to look a little closer at the suggested patch.

Show
Alex Miller added a comment - The use of ISeq and not Seqable in LazilyPersistentVector is quite intentional. The idea here is that if something is already a seq (effectively a linked list), the best we're going to do is walk that chain. However, many things are Seqable that may have more efficient Iterable implementations which can (statefully) walk a data structure without creating all the intermediate objects required by seq. In particular, via CLJ-1499, map and set will soon be gaining direct Iterable implementations that walk the persistent tree without instantiating a seq object for every element. However, at the moment set and map will create a SeqIterator wrapped around the seq. CLJ-1546 changed this path - it was walking the seq but is now walking it via SeqIterator. My working theory is that that switch has uncovered a latent race condition in SeqIterator that was never noticed before as the path wasn't exercised. Note that because CLJ-1499 removes the reliance on SeqIterator, it would have avoided the bug in a different way! However, I have been seeing a number of weird things while doing dev on CLJ-1499 specifically around iterating over sets - the OO around iterator() and seq() in the APersistentSet/PersistentHashSet/PersistentTreeSet has some weird interactions. I'm going to look a little closer at the suggested patch.
Hide
Nicola Mometto added a comment - - edited

Alex, before you waste your time on my patch I want to clarify that I don't think that patch fixes the issue in any way, it's just a random change I made that happens to make the symptoms disappear on my system, I just attached it for debugging purposed.

Maybe there's a reason why the patch solves the bug or maybe it's just masking it, I still can't figure out why this apparent race condition happens.

Show
Nicola Mometto added a comment - - edited Alex, before you waste your time on my patch I want to clarify that I don't think that patch fixes the issue in any way, it's just a random change I made that happens to make the symptoms disappear on my system, I just attached it for debugging purposed. Maybe there's a reason why the patch solves the bug or maybe it's just masking it, I still can't figure out why this apparent race condition happens.
Hide
Alex Miller added a comment -

Gotcha, thx.

Show
Alex Miller added a comment - Gotcha, thx.
Hide
Michael Blume added a comment -

Aha, makes sense, thanks =)

Show
Michael Blume added a comment - Aha, makes sense, thanks =)
Hide
Andy Fingerhut added a comment -

I haven't dug into this, and don't have a solution, but SeqIterator's fields are not final, so there is no guarantee that the values assigned to a new instance's fields in its constructors will be visible to other threads, yes? And I believe that if those writes to the fields do eventually become visible, they need not become visible in the order that the assignments occur in the source code, but can become visible in any order.

Show
Andy Fingerhut added a comment - I haven't dug into this, and don't have a solution, but SeqIterator's fields are not final, so there is no guarantee that the values assigned to a new instance's fields in its constructors will be visible to other threads, yes? And I believe that if those writes to the fields do eventually become visible, they need not become visible in the order that the assignments occur in the source code, but can become visible in any order.
Hide
Nicola Mometto added a comment -

By the way, I've been able to reproduce this bug using jdk 1.8 so it's not just with 1.7

Show
Nicola Mometto added a comment - By the way, I've been able to reproduce this bug using jdk 1.8 so it's not just with 1.7
Hide
Michael Blume added a comment -

SeqIterator seems to use 'this' as a sentinel value. If I replace it with an explicit 'new Object' sentinel, the problem appears to go away (~40 compilations without an NPE).

Making seq and next volatile doesn't help.

Interestingly, when I synchronize the entire SeqIterator class (both hasNext and next synchronized on this), the problem doesn't go away, so if this is a race condition, it's kind of a weird one.

I can insert a call to seq before the call to set here https://github.com/clojure/clojure/blob/clojure-1.7.0-alpha5/src/clj/clojure/core_deftype.clj#L60 and the problem doesn't go away. I can then print the result of seq before it's passed to set, and, of course, it's a nil.

So somehow, we're basically evaluating (-> nil set (disj 'Object 'java.lang.Object) vec) and getting [nil] instead of []

But when I actually evaluate that expression in a REPL (from 20 threads at once, 1M times in a row) it evaluates to empty vector every time.

So I'm confused.

Debug patch here if anyone wants to check my work: https://gist.githubusercontent.com/MichaelBlume/735c8f601210cfa1ecaf/raw/814f21e5e4abb2ca9d1d5330d0b4cc2b3a4424e6/gistfile1.txt

Show
Michael Blume added a comment - SeqIterator seems to use 'this' as a sentinel value. If I replace it with an explicit 'new Object' sentinel, the problem appears to go away (~40 compilations without an NPE). Making seq and next volatile doesn't help. Interestingly, when I synchronize the entire SeqIterator class (both hasNext and next synchronized on this), the problem doesn't go away, so if this is a race condition, it's kind of a weird one. I can insert a call to seq before the call to set here https://github.com/clojure/clojure/blob/clojure-1.7.0-alpha5/src/clj/clojure/core_deftype.clj#L60 and the problem doesn't go away. I can then print the result of seq before it's passed to set, and, of course, it's a nil. So somehow, we're basically evaluating (-> nil set (disj 'Object 'java.lang.Object) vec) and getting [nil] instead of [] But when I actually evaluate that expression in a REPL (from 20 threads at once, 1M times in a row) it evaluates to empty vector every time. So I'm confused. Debug patch here if anyone wants to check my work: https://gist.githubusercontent.com/MichaelBlume/735c8f601210cfa1ecaf/raw/814f21e5e4abb2ca9d1d5330d0b4cc2b3a4424e6/gistfile1.txt
Hide
Alex Miller added a comment -

I'm with you. I'm starting to suspect that this is involved:

https://github.com/clojure/clojure/commit/43cc1854508d655e58e377f84836ba128971f90c

Show
Alex Miller added a comment - I'm with you. I'm starting to suspect that this is involved: https://github.com/clojure/clojure/commit/43cc1854508d655e58e377f84836ba128971f90c
Hide
Andy Fingerhut added a comment -

Out of curiosity, I tried adding a field to the SeqIterator class that remembers the thread that constructed each instance, and then checks in the call to hasNext if the calling thread is the same as the creator thread, printing a message if they are ever different. I never saw that while building Clojure, nor running the command on the Midje project. That seems to rule out the possibility of the SeqIterator getting passed from one thread to another.

If that is always true, then I'm with Nicola: I would love an explanation of what is going on here to cause the debug print's he mentions in a comment above to print what they do when a failure occurs. It looks really, really wrong, as in wrong single-threaded program behavior.

Show
Andy Fingerhut added a comment - Out of curiosity, I tried adding a field to the SeqIterator class that remembers the thread that constructed each instance, and then checks in the call to hasNext if the calling thread is the same as the creator thread, printing a message if they are ever different. I never saw that while building Clojure, nor running the command on the Midje project. That seems to rule out the possibility of the SeqIterator getting passed from one thread to another. If that is always true, then I'm with Nicola: I would love an explanation of what is going on here to cause the debug print's he mentions in a comment above to print what they do when a failure occurs. It looks really, really wrong, as in wrong single-threaded program behavior.
Hide
Nicola Mometto added a comment -

Michael's and Andy's findings agree with what I observed. There's no multithreading involved yet somehow there's what appears to be a data race condition in SeqIterator.

Also, as Michael observed in his own testings, changing the sentinel from this to an Object instance (START) makes the issue go away, this is exactly what the patch I attached does.

I am also unable to reproduce this issue by repeatedly invoking vec on an empty set.

Show
Nicola Mometto added a comment - Michael's and Andy's findings agree with what I observed. There's no multithreading involved yet somehow there's what appears to be a data race condition in SeqIterator. Also, as Michael observed in his own testings, changing the sentinel from this to an Object instance (START) makes the issue go away, this is exactly what the patch I attached does. I am also unable to reproduce this issue by repeatedly invoking vec on an empty set.
Hide
Nicola Mometto added a comment - - edited

This just happened:

public boolean hasNext(){
    Object a = seq;
    Object b = this;
	if(seq == this){
		seq = START;
		next = RT.seq(next);
		}
	else {
        if (RT.seq(next) == null) {
            System.out.println (a);
            System.out.println (b);
        }
        if(seq == next)
            next = RT.next(seq);
    }
	return next != null;
}
clojure.lang.SeqIterator@3ddc6873
clojure.lang.SeqIterator@3ddc6873
Exception in thread "main" java.util.NoSuchElementException, compiling:(utils.clj:68:1)
Show
Nicola Mometto added a comment - - edited This just happened:
public boolean hasNext(){
    Object a = seq;
    Object b = this;
	if(seq == this){
		seq = START;
		next = RT.seq(next);
		}
	else {
        if (RT.seq(next) == null) {
            System.out.println (a);
            System.out.println (b);
        }
        if(seq == next)
            next = RT.next(seq);
    }
	return next != null;
}
clojure.lang.SeqIterator@3ddc6873
clojure.lang.SeqIterator@3ddc6873
Exception in thread "main" java.util.NoSuchElementException, compiling:(utils.clj:68:1)
Hide
Nicola Mometto added a comment - - edited

I'm taking a shot in the dark but I tried reproducing this bug using -Xint (using the jvm interpreter) and I can't seem to be able to reproduce it after many runs.
As absurd as this sounds, I'm starting to think that some hotspot optimization is responsible for this nonsensical behaviour – this would explain the nondeterministic behaviour in the absence of multithreading.

Show
Nicola Mometto added a comment - - edited I'm taking a shot in the dark but I tried reproducing this bug using -Xint (using the jvm interpreter) and I can't seem to be able to reproduce it after many runs. As absurd as this sounds, I'm starting to think that some hotspot optimization is responsible for this nonsensical behaviour – this would explain the nondeterministic behaviour in the absence of multithreading.
Hide
Alex Miller added a comment - - edited

I did some experiments myself yesterday and found many of the same things listed here - single-threaded usage with seemingly impossible results.

I am currently thinking that setting seq = this; in the constructor is unsafe publication of this. "this" is not valid until after the constructor completes, yet we have saved away a reference to it in a field.

Thus seq==this may possibly return false in hasNext() because it is comparing a partially constructed object with the fully constructed object (same object identity!). It may be that this doesn't happen until after hot spot/inlining. By turning on the inline diagnostics, I do see these SeqIterator methods as a hot spot that gets inlined at some point soon before I see the error manifest.

This is my best explanation of the results we're seeing and seems sufficient justification to change the implementation of SeqIterator to me.

Show
Alex Miller added a comment - - edited I did some experiments myself yesterday and found many of the same things listed here - single-threaded usage with seemingly impossible results. I am currently thinking that setting seq = this; in the constructor is unsafe publication of this. "this" is not valid until after the constructor completes, yet we have saved away a reference to it in a field. Thus seq==this may possibly return false in hasNext() because it is comparing a partially constructed object with the fully constructed object (same object identity!). It may be that this doesn't happen until after hot spot/inlining. By turning on the inline diagnostics, I do see these SeqIterator methods as a hot spot that gets inlined at some point soon before I see the error manifest. This is my best explanation of the results we're seeing and seems sufficient justification to change the implementation of SeqIterator to me.
Hide
Nicola Mometto added a comment -

From the Java Language Spec, section 15.8.3:
"The keyword this may be used only in the body of an instance method or default method, or in the body of a constructor of a class, or in an instance initializer of a class, or in the initializer of an instance variable of a class"

So I don't believe that the current usage of this is incorrect for this reason, however I agree that it's likely caused by some interaction of doing identity check with this and hot spot/inlining issue so I'm more confident that the patch I posted is a right fix for this bug.

Show
Nicola Mometto added a comment - From the Java Language Spec, section 15.8.3: "The keyword this may be used only in the body of an instance method or default method, or in the body of a constructor of a class, or in an instance initializer of a class, or in the initializer of an instance variable of a class" So I don't believe that the current usage of this is incorrect for this reason, however I agree that it's likely caused by some interaction of doing identity check with this and hot spot/inlining issue so I'm more confident that the patch I posted is a right fix for this bug.
Hide
Andy Fingerhut added a comment -

The only other scenario of multiple calls to next / hasNext that I could think of with a single thread is if some method call inside of hasNext causes next / hasNext to be called, nested, on the same SeqIterator instance. I did some instrumentation to check for nested calls, unlikely as that would seem from the source code, and saw a failure with no nested call to hasNext.

Show
Andy Fingerhut added a comment - The only other scenario of multiple calls to next / hasNext that I could think of with a single thread is if some method call inside of hasNext causes next / hasNext to be called, nested, on the same SeqIterator instance. I did some instrumentation to check for nested calls, unlikely as that would seem from the source code, and saw a failure with no nested call to hasNext.
Hide
Alex Miller added a comment -

Midje stack trace (removed from original ticket, left here for safe-keeping). To reproduce:

// set your JAVA_HOME and PATH to JDK 1.7
git clone git@github.com:marick/Midje.git
cd Midje
git co e98cf87
lein with-profile 1.7 midje

Error:

Exception in thread "main" java.lang.NullPointerException, compiling:(fim_collection_diffs.clj:7:1)
	at clojure.lang.Compiler.macroexpand1(Compiler.java:6619)
	at clojure.lang.Compiler.macroexpand(Compiler.java:6677)
	at clojure.lang.Compiler.eval(Compiler.java:6751)
	at clojure.lang.Compiler.load(Compiler.java:7194)
	at clojure.lang.RT.loadResourceScript(RT.java:370)
	at clojure.lang.RT.loadResourceScript(RT.java:361)
	at clojure.lang.RT.load(RT.java:440)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5436.invoke(core.clj:5863)
	at clojure.core$load.doInvoke(core.clj:5862)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5653)
	at clojure.core$load_lib$fn__5383.invoke(core.clj:5708)
	at clojure.core$load_lib.doInvoke(core.clj:5707)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5746)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5829)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at midje.repl$load_facts$fn__6148.invoke(repl.clj:206)
	at midje.repl$load_facts.doInvoke(repl.clj:192)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at user$eval6211.invoke(form-init3965655274254111851.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6767)
	at clojure.lang.Compiler.eval(Compiler.java:6757)
	at clojure.lang.Compiler.load(Compiler.java:7194)
	at clojure.lang.Compiler.loadFile(Compiler.java:7150)
	at clojure.main$load_script.invoke(main.clj:274)
	at clojure.main$init_opt.invoke(main.clj:279)
	at clojure.main$initialize.invoke(main.clj:307)
	at clojure.main$null_opt.invoke(main.clj:342)
	at clojure.main$main.doInvoke(main.clj:420)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:383)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.lang.NullPointerException
	at clojure.lang.Compiler.maybeResolveIn(Compiler.java:7014)
	at clojure.core$ns_resolve.invoke(core.clj:4200)
	at clojure.core$ns_resolve.invoke(core.clj:4197)
	at clojure.core$resolve.invoke(core.clj:4206)
	at clojure.core$map$fn__4529.invoke(core.clj:2612)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:485)
	at clojure.core$seq__4109.invoke(core.clj:135)
	at clojure.core$reduce1.invoke(core.clj:899)
	at clojure.core$set.invoke(core.clj:3944)
	at clojure.core$emit_defrecord.invoke(core_deftype.clj:154)
	at clojure.core$defrecord.doInvoke(core_deftype.clj:374)
	at clojure.lang.RestFn.invoke(RestFn.java:470)
	at clojure.lang.Var.invoke(Var.java:394)
	at clojure.lang.AFn.applyToHelper(AFn.java:165)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.lang.Compiler.macroexpand1(Compiler.java:6606)
Show
Alex Miller added a comment - Midje stack trace (removed from original ticket, left here for safe-keeping). To reproduce:
// set your JAVA_HOME and PATH to JDK 1.7
git clone git@github.com:marick/Midje.git
cd Midje
git co e98cf87
lein with-profile 1.7 midje
Error:
Exception in thread "main" java.lang.NullPointerException, compiling:(fim_collection_diffs.clj:7:1)
	at clojure.lang.Compiler.macroexpand1(Compiler.java:6619)
	at clojure.lang.Compiler.macroexpand(Compiler.java:6677)
	at clojure.lang.Compiler.eval(Compiler.java:6751)
	at clojure.lang.Compiler.load(Compiler.java:7194)
	at clojure.lang.RT.loadResourceScript(RT.java:370)
	at clojure.lang.RT.loadResourceScript(RT.java:361)
	at clojure.lang.RT.load(RT.java:440)
	at clojure.lang.RT.load(RT.java:411)
	at clojure.core$load$fn__5436.invoke(core.clj:5863)
	at clojure.core$load.doInvoke(core.clj:5862)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at clojure.core$load_one.invoke(core.clj:5653)
	at clojure.core$load_lib$fn__5383.invoke(core.clj:5708)
	at clojure.core$load_lib.doInvoke(core.clj:5707)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$load_libs.doInvoke(core.clj:5746)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at clojure.core$apply.invoke(core.clj:628)
	at clojure.core$require.doInvoke(core.clj:5829)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at midje.repl$load_facts$fn__6148.invoke(repl.clj:206)
	at midje.repl$load_facts.doInvoke(repl.clj:192)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at user$eval6211.invoke(form-init3965655274254111851.clj:1)
	at clojure.lang.Compiler.eval(Compiler.java:6767)
	at clojure.lang.Compiler.eval(Compiler.java:6757)
	at clojure.lang.Compiler.load(Compiler.java:7194)
	at clojure.lang.Compiler.loadFile(Compiler.java:7150)
	at clojure.main$load_script.invoke(main.clj:274)
	at clojure.main$init_opt.invoke(main.clj:279)
	at clojure.main$initialize.invoke(main.clj:307)
	at clojure.main$null_opt.invoke(main.clj:342)
	at clojure.main$main.doInvoke(main.clj:420)
	at clojure.lang.RestFn.invoke(RestFn.java:421)
	at clojure.lang.Var.invoke(Var.java:383)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.main.main(main.java:37)
Caused by: java.lang.NullPointerException
	at clojure.lang.Compiler.maybeResolveIn(Compiler.java:7014)
	at clojure.core$ns_resolve.invoke(core.clj:4200)
	at clojure.core$ns_resolve.invoke(core.clj:4197)
	at clojure.core$resolve.invoke(core.clj:4206)
	at clojure.core$map$fn__4529.invoke(core.clj:2612)
	at clojure.lang.LazySeq.sval(LazySeq.java:40)
	at clojure.lang.LazySeq.seq(LazySeq.java:49)
	at clojure.lang.RT.seq(RT.java:485)
	at clojure.core$seq__4109.invoke(core.clj:135)
	at clojure.core$reduce1.invoke(core.clj:899)
	at clojure.core$set.invoke(core.clj:3944)
	at clojure.core$emit_defrecord.invoke(core_deftype.clj:154)
	at clojure.core$defrecord.doInvoke(core_deftype.clj:374)
	at clojure.lang.RestFn.invoke(RestFn.java:470)
	at clojure.lang.Var.invoke(Var.java:394)
	at clojure.lang.AFn.applyToHelper(AFn.java:165)
	at clojure.lang.Var.applyTo(Var.java:700)
	at clojure.lang.Compiler.macroexpand1(Compiler.java:6606)
Hide
Ghadi Shayban added a comment -

Not sure I fully understand the sad path that causes this bug, but START can safely be marked static final in the patch.

Show
Ghadi Shayban added a comment - Not sure I fully understand the sad path that causes this bug, but START can safely be marked static final in the patch.
Hide
Andy Fingerhut added a comment -

Does this seem like it may be a bug in JIT compilation to anyone? I ask because as far as we can tell, this bug occurs completely within a single thread, and as far as I've read, the Java memory model should guarantee that operations in a single thread appear to occur in the order they happen in the source code.

Independent question: It seems that the assumption is that a SeqIterator is only ever accessed from 1 thread. Is it considered an internal implementation detail, and thus on documentation is needed about this assumption?

Show
Andy Fingerhut added a comment - Does this seem like it may be a bug in JIT compilation to anyone? I ask because as far as we can tell, this bug occurs completely within a single thread, and as far as I've read, the Java memory model should guarantee that operations in a single thread appear to occur in the order they happen in the source code. Independent question: It seems that the assumption is that a SeqIterator is only ever accessed from 1 thread. Is it considered an internal implementation detail, and thus on documentation is needed about this assumption?
Hide
Alex Miller added a comment -

Re JIT - I wouldn't rule that out, but if it is, that doesn't help us make Clojure work again for everyone with existing JVMs. JCiP section 3.2.1 says "If the this reference escapes during construction, the object is considered not properly constructed." which sounds like what we're doing.

Re threading - I think that the use of iterators inside Clojure itself has (until recently) been unusual. Virtually everything is written to leverage the seq model and iterators were largely provided for Java compliance. With the creation of LazyTransformer and extension of reduce to iterators, this orientation has changed somewhat. However, I'd say that iterators are dirty stateful things and they should be consumed in localized contexts by no more than one thread at a time. If they are used in a thread-confined way and safely published between threads, SeqIterator seems ok.

Show
Alex Miller added a comment - Re JIT - I wouldn't rule that out, but if it is, that doesn't help us make Clojure work again for everyone with existing JVMs. JCiP section 3.2.1 says "If the this reference escapes during construction, the object is considered not properly constructed." which sounds like what we're doing. Re threading - I think that the use of iterators inside Clojure itself has (until recently) been unusual. Virtually everything is written to leverage the seq model and iterators were largely provided for Java compliance. With the creation of LazyTransformer and extension of reduce to iterators, this orientation has changed somewhat. However, I'd say that iterators are dirty stateful things and they should be consumed in localized contexts by no more than one thread at a time. If they are used in a thread-confined way and safely published between threads, SeqIterator seems ok.
Hide
Nicola Mometto added a comment - - edited

0001-CLJ-1636-don-t-use-this-as-a-sentinel-in-SeqIterator.patch is the same as 0001-fix-for-CLJ-1636.patch except it makes START final as suggested by Ghadi

Show
Nicola Mometto added a comment - - edited 0001-CLJ-1636-don-t-use-this-as-a-sentinel-in-SeqIterator.patch is the same as 0001-fix-for-CLJ-1636.patch except it makes START final as suggested by Ghadi
Hide
Ghadi Shayban added a comment -

Alex has screened this – which probably implies that the patch fixes the issue. Just to confirm does the patch clear up the issue for everyone else?

Show
Ghadi Shayban added a comment - Alex has screened this – which probably implies that the patch fixes the issue. Just to confirm does the patch clear up the issue for everyone else?
Hide
Michael Blume added a comment -

Does for me, yes =)

Show
Michael Blume added a comment - Does for me, yes =)
Hide
Andy Fingerhut added a comment -

If you want to collect test results, seems like it would be good for people to respond with the OS version and JVM version they tested on, and whether it was the Midje test in the description of this ticket that they tried, or some other test.

Show
Andy Fingerhut added a comment - If you want to collect test results, seems like it would be good for people to respond with the OS version and JVM version they tested on, and whether it was the Midje test in the description of this ticket that they tried, or some other test.
Hide
Andy Fingerhut added a comment -

Out of curiosity, does anyone have a smaller test case that causes this incorrect return value from the SeqIterator, without running the 'lein with-profile 1.7 midje' command on Midje? e.g. running a page or less worth of Clojure code?

Show
Andy Fingerhut added a comment - Out of curiosity, does anyone have a smaller test case that causes this incorrect return value from the SeqIterator, without running the 'lein with-profile 1.7 midje' command on Midje? e.g. running a page or less worth of Clojure code?
Hide
Alex Miller added a comment -

While we've applied the patch, I would still love to understand wtf is happening here and would love to see that too. For me, I can quite reliably reproduce it building Clojure itself. To support the theory of it happening during an inlining transition, it's unlikely to reproduce outside the context of other code however. I see it get embedded inside a big wad of calling code when I watching inlining.

Show
Alex Miller added a comment - While we've applied the patch, I would still love to understand wtf is happening here and would love to see that too. For me, I can quite reliably reproduce it building Clojure itself. To support the theory of it happening during an inlining transition, it's unlikely to reproduce outside the context of other code however. I see it get embedded inside a big wad of calling code when I watching inlining.
Hide
Nicola Mometto added a comment -

I've spent some time reading through both the jvm and the java specs and I can't find a reasonable explaination for what was happening, I can only think this is a bug in some hotspot inlining optimization.

Show
Nicola Mometto added a comment - I've spent some time reading through both the jvm and the java specs and I can't find a reasonable explaination for what was happening, I can only think this is a bug in some hotspot inlining optimization.

People

Vote (0)
Watch (5)

Dates

  • Created:
    Updated:
    Resolved: