<< Back to previous view

[CLJ-1914] Range realization has a race during concurrent execution Created: 14/Apr/16  Updated: 19/Aug/16  Resolved: 19/Aug/16

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7, Release 1.8
Fix Version/s: Release 1.9

Type: Defect Priority: Critical
Reporter: Ghadi Shayban Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: range

Attachments: Text File clj-1914-2.patch     Text File clj-1914-3.patch     Text File CLJ-1914.patch    
Patch: Code and Test
Approval: Ok

 Description   

When a range instance is enumerated via next concurrently, some threads may not see the entire range sequence. When this happens next will return nil prematurely.

(defn enumerate [r]
 (loop [rr r
        c []]
  (let [f (first rr)]
   (if f
    (recur (next rr) (conj c f))
    c))))

(defn demo [size threads]
 (let [r (range size)
       futures (doall (repeatedly threads #(future (enumerate r))))
       res (doall (map deref futures))]
  (if (apply = res)
   (println "Passed")
   (println "Failed, counts=" (map count res)))))

(demo 300 4)
Failed, counts= (300 64 300 64)

The demo above will reliably produce a failure every few executions like the one above.

The vast majority of the time, range is used either single-threaded or in a non-competing way where this scenario will never happen. This failure only occurs when two or more threads are enumerating rapidly through the same range instance.

Cause:

Each LongRange instance acts in several capacities - as a seq, a chunked seq, and a reducible, all of which represent independent enumeration strategies (multiple of which may be used by the user). LongRange holds 2 pieces of (volatile, non-synchronized) state related to chunking: _chunk and _chunkNext. That state is only updated in forceChunk(). forceChunk() uses the "racy single-check idiom" to tolerate multiple threads racing to create the chunk. That is, multiple threads may detect that the chunk has not been set (based on null _chunk) and BOTH threads will create the next chunk and write it. But both threads have good local values, compute the same next value, and set the same next values in the fields, so the order they finish is unimportant.

The problem here is that there are actually two fields, and they are set in the order _chunk then _chunkNext. Because the guard is based on _chunk, it's possible for a thread to think the chunk values have been set but _chunkNext hasn't yet been set.

Approach:

Moving the set for _chunkNext before the set for _chunk removes that narrow window of race opportunity.

Patch: clj-1914-3.patch

Thanks to Kyle Kingsbury for the initial reproducing case https://gist.github.com/aphyr/8746181beeac6a728a3aa018804d56f6



 Comments   
Comment by Alex Miller [ 14/Apr/16 11:13 PM ]

It's not necessary to synchronize here - just swapping these two lines should address the race I think: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LongRange.java#L131-L132

Comment by Ghadi Shayban [ 14/Apr/16 11:22 PM ]

Good call. That's super subtle.

It appears all mutable assignment occurs in forceChunk() except for https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LongRange.java#L145 which should be OK (famous last words).

Comment by Stuart Halloway [ 27/Jul/16 3:27 PM ]

This ticket would benefit from an explanation of the failure mode caused by the race, plus an explanation of the subtle reasoning behind the fix.

Comment by Alex Miller [ 27/Jul/16 4:27 PM ]

Updated per request.





[CLJ-1018] range's behavior is inconsistent Created: 29/Jun/12  Updated: 24/May/13  Resolved: 24/May/13

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.1, Release 1.2, Release 1.3, Release 1.4, Release 1.5
Fix Version/s: Release 1.5, Release 1.6

Type: Defect Priority: Minor
Reporter: Devin Walters Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: patch, range

Attachments: File inconsistent_range_fix.diff    
Patch: Code and Test
Approval: Ok

 Description   

Problem statement: The current behavior of range is inconsistent. (range 0 9 0) has always produced (). (range 0 9 -1) has always produced (). (range 9 0 1) has always produced (). However, (range 9 0 0) produces (9 9 9 9 ...), and (range 0 0 0) produces '(0 0 0 0 ...)

Proposal: Make the behavior of range consistent when using a step of 0 to make it produce an empty list.

Please see attached code and patch.



 Comments   
Comment by Mike Anderson [ 01/Jul/12 4:08 AM ]

Agree it is good to fix the inconsistency, but I think an infinite sequence of zeros is the correct result, as implied by the current docstring definition.

It's also mathematically cleanest: range should produce an arithmetic progression until the end value is equalled or exceeded.

Empty lists only seem to make sense as a return value when start >= end.

Comment by Devin Walters [ 01/Jul/12 12:36 PM ]

Hi Mike,

Could you explain how you think the docstring definition implies this behavior? Maybe I'm missing something, but I was surprised. For instance, in the case of (range 3 9 0), if start were truly inclusive as the docstring says, the result would be (3), not ().

You mentioned that you think the infinite sequence of 0's is consistent and in keeping with the definition of range. I'm not sure I agree. (0,0] is an empty set of length zero, and [0,0) is an empty set of length zero.

You state that empty list only makes sense for start >= end, except this is not the current behavior. Could you help me understand what you believe the appropriate behavior would be in each of the following three cases? (range 0 10 0), (range 10 0 0), and (range 0 0 0)?

A few options to consider:
1) Fix the docstring to reflect the actual behavior of range.
2) Handle the case of (range 9 3 0) => (9 9 9 ...) to make it consistent with the behavior of (range 3 9 0) => ()
3) Stop allowing a step of zero altogether.

Editing to Add (Note: I don't think the way someone else did something is always the right way, just wanted to do some digging on how other people have handled this in the past):
http://docs.python.org/library/functions.html#range (0 step returns ValueError)
http://www2.tcl.tk/10797 (range returns empty list for a zero step)
http://www.scala-lang.org/api/2.7.7/scala/Range.html (zero step is not allowed)

Comment by Dimitrios Piliouras [ 05/Jul/12 4:13 PM ]

It does make sense NOT to allow a step of zero (at least to me)... I wasn't going to say anything about this but if other popular languages do not allow 0, then I guess it makes more sense than I originally gave myself credit for... However, if we want to be mathematically correct then the answer would be to return an infinte seq with the start value like below. After all, what is a step of 0? does it make any difference how many steps you take if with every step you cover 0 distance? obviously not...you start from x and you stay at x forever...we do have repeat for this sort of thing though...

(take 5 (range 3 9 0)) => (3 3 3 3 3)

+1 for not allowing 0 step

Comment by Mike Anderson [ 08/Jul/12 8:49 AM ]

@Devin quite simple: a lazy sequence of nums starting from x with step 0.0 until it reaches an end value of y (where y > x) is an infinite sequence of x.

Consider the case where start is 0 and end is infinity (the default), you would expect sequences to go as follows:

step +2 : 0 2 4 6 8....
step +1 : 0 1 2 3 4....
step +0 : 0 0 0 0 0....

It would be inconsistent to make 0 a special case, all of these are valid arithmetic progressions. And all of these are consistent with the current docstring.

If you make 0 a special case, then people will need to write special case code to handle it. Consider the code to create a multiplication table for example:

(for [x (range 10)]
(take 10 (range 0 Long/MAX_VALUE x)))

This works fine if you produce an infinite sequence of zeros for step 0, but fails if you create an empty list as a special case for step 0.

As a related issue, I'd personally also favour negative step sizes also producing an infinite sequence. If we don't want to allow this though, then at least the docstring should be changed to say "a lazy seq of non-decreasing nums" and a negative step should throw an error.

Comment by Devin Walters [ 09/Jul/12 7:09 PM ]

Carrying over a message from the clojure-dev list by Stuart Sierra:

  • I called the ticket a defect. Does that seem reasonable?

yes

  • Does anyone actually use the 0 step behavior in their programs?

not if they have any sense

  • Has anyone been bitten by this in the past?

not me

  • Is this behavior intentional for some historical reason I don't know about or understand?

I doubt it.

  • Has this been brought up before? I couldn't find any reference to it via Google.

Not that I know of.

  • Are there performance implications to my patch?

I doubt it. Lazy seqs are not ideal for high-performance code anyway.

  • Am I addressing a symptom rather than the problem?

I think the problem is that the result of `range` with a step of 0 was never specified. Don't assume that the tests are specifications. Many of the tests in Clojure were written by over-eager volunteers who defined the tests based on whatever the behavior happened to be. The only specification from the language designer is the docstring. By my reading, that means that `range` with a step of 0 should return an infinite seq of the start value, unless the start and end values are equal.

-S

Comment by Devin Walters [ 09/Jul/12 7:10 PM ]

Carrying over a message by Michal Marczyk:

With a negative step, the comparison is reversed, so that e.g.

(range 3 0 -1)

evaluates to

(3 2 1)

I think this is the useful behaviour; the docstring should perhaps be
adjusted to match it.

Agreed on zero step.

Cheers,
MichaƂ

Comment by Devin Walters [ 20/Jul/12 5:10 PM ]

Adding a new patch after hearing comments. This patch makes (range 9 3 0) => (9 9 9 9 ...), (range 3 9 0) => (3 3 3 3 ...), and () will be returned when (= start end). Also updated the docstring.

Comment by Timothy Baldridge [ 27/Nov/12 3:01 PM ]

Marking as vetted

Comment by Timothy Baldridge [ 27/Nov/12 3:04 PM ]

Patch applies cleanly. We've discussed this issue to death (for as simple as it is). I think it's time to mark it as screened.

Comment by Timothy Baldridge [ 27/Nov/12 3:06 PM ]

For some reason I'm not allowed to edit the attachments list. When you apply the patch, please apply inconsistent_range_fix.diff as that is the most recent version of the fix.

Comment by Rich Hickey [ 09/Dec/12 6:44 AM ]

As someone who has to read these tickets, I'd really appreciate it if you could keep the description up to date and accurate, with examples of the current and intended behavior and the strategy to be used in fixing. I can't follow every thread to see what the latest thinking is, especially on a patch like this where the original mission was changed.

Thanks

Comment by Devin Walters [ 10/Dec/12 3:53 PM ]

@Tim: I've removed the other attachments.

@Rich: Understood. I will update the description this evening.





Generated at Sun Dec 11 04:08:33 CST 2016 using JIRA 4.4#649-r158309.