Clojure

Reduce broken on some primitive vectors

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: Release 1.3, Release 1.4, Release 1.5, Release 1.6
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Labels:
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

In some cases, reduce over a sequence from a primitive vector created with vector-of will return incorrect answers:

user=> (into [] (drop 32 (into [] (range 33))))
[32]
user=> (into [] (drop 32 (into (vector-of :int) (range 33))))
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32]

Second call should return [32] just like the first one.

Cause: VecSeq (seq on primitive Vec obtained with vector-of) maintains two flags: i is the total number of elements prior to the current node in this seq. offset is the offset in the current anode. When using internal-reduce on a VecSeq, the starting index for the reduce was using offset and ignoring i.

Solution: Use (+ i offset) as the starting index.

Patch: clj-1362-v1.patch

Screened by:

Activity

Alex Miller made changes -
Field Original Value New Value
Labels bug collections
Alex Miller made changes -
Summary vector-of, split-at, and into Primitive vectors and split-at leads to incorrect result
Alex Miller made changes -
Description I'm seeing some really weird behavior under certain conditions when using vector-of vectors when they are split and then the second half is put into another vector.

The following test illustrates the problem:

{code}
(ns vector-of-test.core-test
  (:use clojure.test
        vector-of-test.core))

(defn verify [expected actual]
  (doseq [i (range (count expected))]
    (testing (str "i=" i)
      (is (= (apply concat (split-at i expected))
             (apply concat (split-at i actual))))
      (is (= (second (split-at i expected))
             (second (split-at i actual))))
      (is (= (into [] (second (split-at i expected)))
             (into [] (second (split-at i actual))))))))

(deftest test-vector-of
  (doseq [j (range 35)]
    (testing (str "j=" j)
      (let [test-seq (range j)]
        (verify (into [] test-seq)
                (into (vector-of :int) test-seq))))))
{code}

It appears that for 31 < i < j, the test will fail to pass verify's third check. Basically, the result derived from the (vector-of :int) sequence contains the entire original sequence instead of just the last few items. Oddly, the first two checks succeed -- it's only when you try to put it into something that the problem manifests.. Also, it doesn't seem to matter what kind of vector you put the results in (I'm using a PersistentVector here, but I see the same results with a (vector-of :int), for example).
I'm seeing some really weird behavior under certain conditions when using vector-of vectors when they are split and then the second half is put into another vector.

Simpler test than original report below:

{code}
user=> (into [] (drop 32 (into [] (range 33))))
[32]
user=> (into [] (drop 32 (into (vector-of :int) (range 33))))
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32]
{code}

Second call should return [32] as well.

The following test illustrates the problem:

{code}
(ns vector-of-test.core-test
  (:use clojure.test
        vector-of-test.core))

(defn verify [expected actual]
  (doseq [i (range (count expected))]
    (testing (str "i=" i)
      (is (= (apply concat (split-at i expected))
             (apply concat (split-at i actual))))
      (is (= (second (split-at i expected))
             (second (split-at i actual))))
      (is (= (into [] (second (split-at i expected)))
             (into [] (second (split-at i actual))))))))

(deftest test-vector-of
  (doseq [j (range 35)]
    (testing (str "j=" j)
      (let [test-seq (range j)]
        (verify (into [] test-seq)
                (into (vector-of :int) test-seq))))))
{code}

It appears that for 31 < i < j, the test will fail to pass verify's third check. Basically, the result derived from the (vector-of :int) sequence contains the entire original sequence instead of just the last few items. Oddly, the first two checks succeed -- it's only when you try to put it into something that the problem manifests.. Also, it doesn't seem to matter what kind of vector you put the results in (I'm using a PersistentVector here, but I see the same results with a (vector-of :int), for example).
Hide
Alex Miller added a comment -

We did some debugging on this at the St. Louis Clojure Meetup tonight and suspect the problem is happening when drop walks through the chunked seq over the vector. Specifically, in the VecSeq's implementation of IChunkedSeq.chunkedNext() at https://github.com/clojure/clojure/blob/master/src/clj/clojure/gvec.clj#L116 particularly the offset 0 at the end.

Show
Alex Miller added a comment - We did some debugging on this at the St. Louis Clojure Meetup tonight and suspect the problem is happening when drop walks through the chunked seq over the vector. Specifically, in the VecSeq's implementation of IChunkedSeq.chunkedNext() at https://github.com/clojure/clojure/blob/master/src/clj/clojure/gvec.clj#L116 particularly the offset 0 at the end.
Alex Miller made changes -
Summary Primitive vectors and split-at leads to incorrect result Reduce broken on some primitive vectors
Hide
Alex Miller added a comment -

Upon further review, the VecSeq seems to be created properly during chunking. The real issue is in internal-reduce where the starting index is improperly computed.

Show
Alex Miller added a comment - Upon further review, the VecSeq seems to be created properly during chunking. The real issue is in internal-reduce where the starting index is improperly computed.
Alex Miller made changes -
Patch Code and Test [ 10002 ]
Approval Triaged [ 10120 ]
Description I'm seeing some really weird behavior under certain conditions when using vector-of vectors when they are split and then the second half is put into another vector.

Simpler test than original report below:

{code}
user=> (into [] (drop 32 (into [] (range 33))))
[32]
user=> (into [] (drop 32 (into (vector-of :int) (range 33))))
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32]
{code}

Second call should return [32] as well.

The following test illustrates the problem:

{code}
(ns vector-of-test.core-test
  (:use clojure.test
        vector-of-test.core))

(defn verify [expected actual]
  (doseq [i (range (count expected))]
    (testing (str "i=" i)
      (is (= (apply concat (split-at i expected))
             (apply concat (split-at i actual))))
      (is (= (second (split-at i expected))
             (second (split-at i actual))))
      (is (= (into [] (second (split-at i expected)))
             (into [] (second (split-at i actual))))))))

(deftest test-vector-of
  (doseq [j (range 35)]
    (testing (str "j=" j)
      (let [test-seq (range j)]
        (verify (into [] test-seq)
                (into (vector-of :int) test-seq))))))
{code}

It appears that for 31 < i < j, the test will fail to pass verify's third check. Basically, the result derived from the (vector-of :int) sequence contains the entire original sequence instead of just the last few items. Oddly, the first two checks succeed -- it's only when you try to put it into something that the problem manifests.. Also, it doesn't seem to matter what kind of vector you put the results in (I'm using a PersistentVector here, but I see the same results with a (vector-of :int), for example).
In some cases, reduce over a sequence from a primitive vector created with vector-of will return incorrect answers:

{code}
user=> (into [] (drop 32 (into [] (range 33))))
[32]
user=> (into [] (drop 32 (into (vector-of :int) (range 33))))
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32]
{code}

Second call should return [32] just like the first one.

*Cause:* VecSeq (seq on primitive Vec obtained with vector-of) maintains two flags: {{i}} is the total number of elements prior to the current node in this seq. {{offset}} is the offset in the current anode. When using internal-reduce on a VecSeq, the starting index for the reduce was using offset and ignoring i.

*Solution:* Use (+ i offset) as the starting index.

*Patch:* clj-1362-v1.patch

*Screened by:*

The following test illustrates the problem:

{code}
(ns vector-of-test.core-test
  (:use clojure.test
        vector-of-test.core))

(defn verify [expected actual]
  (doseq [i (range (count expected))]
    (testing (str "i=" i)
      (is (= (apply concat (split-at i expected))
             (apply concat (split-at i actual))))
      (is (= (second (split-at i expected))
             (second (split-at i actual))))
      (is (= (into [] (second (split-at i expected)))
             (into [] (second (split-at i actual))))))))

(deftest test-vector-of
  (doseq [j (range 35)]
    (testing (str "j=" j)
      (let [test-seq (range j)]
        (verify (into [] test-seq)
                (into (vector-of :int) test-seq))))))
{code}

It appears that for 31 < i < j, the test will fail to pass verify's third check. Basically, the result derived from the (vector-of :int) sequence contains the entire original sequence instead of just the last few items. Oddly, the first two checks succeed -- it's only when you try to put it into something that the problem manifests.. Also, it doesn't seem to matter what kind of vector you put the results in (I'm using a PersistentVector here, but I see the same results with a (vector-of :int), for example).
Attachment clj-1362-v1.patch [ 12829 ]
Alex Miller made changes -
Description In some cases, reduce over a sequence from a primitive vector created with vector-of will return incorrect answers:

{code}
user=> (into [] (drop 32 (into [] (range 33))))
[32]
user=> (into [] (drop 32 (into (vector-of :int) (range 33))))
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32]
{code}

Second call should return [32] just like the first one.

*Cause:* VecSeq (seq on primitive Vec obtained with vector-of) maintains two flags: {{i}} is the total number of elements prior to the current node in this seq. {{offset}} is the offset in the current anode. When using internal-reduce on a VecSeq, the starting index for the reduce was using offset and ignoring i.

*Solution:* Use (+ i offset) as the starting index.

*Patch:* clj-1362-v1.patch

*Screened by:*

The following test illustrates the problem:

{code}
(ns vector-of-test.core-test
  (:use clojure.test
        vector-of-test.core))

(defn verify [expected actual]
  (doseq [i (range (count expected))]
    (testing (str "i=" i)
      (is (= (apply concat (split-at i expected))
             (apply concat (split-at i actual))))
      (is (= (second (split-at i expected))
             (second (split-at i actual))))
      (is (= (into [] (second (split-at i expected)))
             (into [] (second (split-at i actual))))))))

(deftest test-vector-of
  (doseq [j (range 35)]
    (testing (str "j=" j)
      (let [test-seq (range j)]
        (verify (into [] test-seq)
                (into (vector-of :int) test-seq))))))
{code}

It appears that for 31 < i < j, the test will fail to pass verify's third check. Basically, the result derived from the (vector-of :int) sequence contains the entire original sequence instead of just the last few items. Oddly, the first two checks succeed -- it's only when you try to put it into something that the problem manifests.. Also, it doesn't seem to matter what kind of vector you put the results in (I'm using a PersistentVector here, but I see the same results with a (vector-of :int), for example).
In some cases, reduce over a sequence from a primitive vector created with vector-of will return incorrect answers:

{code}
user=> (into [] (drop 32 (into [] (range 33))))
[32]
user=> (into [] (drop 32 (into (vector-of :int) (range 33))))
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32]
{code}

Second call should return [32] just like the first one.

*Cause:* VecSeq (seq on primitive Vec obtained with vector-of) maintains two flags: {{i}} is the total number of elements prior to the current node in this seq. {{offset}} is the offset in the current anode. When using internal-reduce on a VecSeq, the starting index for the reduce was using offset and ignoring i.

*Solution:* Use (+ i offset) as the starting index.

*Patch:* clj-1362-v1.patch

*Screened by:*
Rich Hickey made changes -
Fix Version/s Release 1.7 [ 10250 ]
Approval Triaged [ 10120 ] Vetted [ 10003 ]
Hide
Stuart Sierra added a comment -

Screened.

Show
Stuart Sierra added a comment - Screened.
Stuart Sierra made changes -
Approval Vetted [ 10003 ] Screened [ 10004 ]
Rich Hickey made changes -
Approval Screened [ 10004 ] Ok [ 10007 ]
Stuart Halloway made changes -
Resolution Completed [ 1 ]
Status Open [ 1 ] Closed [ 6 ]

People

Vote (0)
Watch (2)

Dates

  • Created:
    Updated:
    Resolved: