Clojure

Subvecs of primitive vectors cannot be reduced

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Completed
  • Affects Version/s: None
  • Fix Version/s: Release 1.6
  • Component/s: None
  • Labels:
  • Environment:
    1.7.0-08, OS X 10.8
  • Patch:
    Code and Test
  • Approval:
    Ok

Description

Reduce doesn't work on subvecs of primitive vectors.

(let [prim-vec (into (vector-of :long) (range 10000))]
  (reduce + (subvec prim-vec 1 500)))

->> ClassCastException clojure.core.Vec cannot be cast to clojure.lang.PersistentVector  clojure.lang.APersistentVector$SubVector.iterator (APersistentVector.java:523)

If reduce on a subvec doesn't work then neither will nice ops like fold.

Cause: RT.subvec() creates an APersistentVector$SubVector. SubVector.iterator() assumes the source vector is a PersistentVector, however a primitive vector is a Vec (which is not a PersistentVector). This causes a class cast exception as observed on any attempt to iterate over the subvector.

Approach:
1. Provide a generic ranged iterator for APersistentVector, that can be used by SubVector
2. Make the iterator() method for APersistentVector$SubVector use this new iterator where possible (i.e. whenever the source vector is an APersistentVector). If not, the generic super.iterator() method is used (which is slower, but safe for any source vector that implements IPersistentVector)

Patch: clj-1082-patch-v3.diff

Screened by: Alex Miller

  1. clj-1082.patch
    20/Jan/13 6:42 PM
    4 kB
    Mike Anderson
  2. clj-1082-patch-v2.diff
    22/Oct/13 9:08 AM
    4 kB
    Alex Miller
  3. clj-1082-patch-v2.txt
    05/Sep/13 6:12 PM
    4 kB
    Andy Fingerhut
  4. clj-1082-patch-v3.diff
    24/Nov/13 2:15 PM
    3 kB
    Alex Miller

Activity

Hide
Alex Miller added a comment -

Added new v3 patch that a) combines the previous commits into a single patch and b) removes endIndex and uses end instead. As far as I know this + the description change address all of Rich's questions. Marking re-screened.

Show
Alex Miller added a comment - Added new v3 patch that a) combines the previous commits into a single patch and b) removes endIndex and uses end instead. As far as I know this + the description change address all of Rich's questions. Marking re-screened.
Hide
Alex Miller added a comment -

Updated description per Mike's comment.

Show
Alex Miller added a comment - Updated description per Mike's comment.
Hide
Mike Anderson added a comment -

I don't seem to have the ability to edit the description. Here's what I think it should say:

Cause: RT.subvec() creates an APersistentVector$SubVector. SubVector.iterator() assumes the source vector is a PersistentVector, however a primitive vector is a Vec (which is not a PersistentVector). This causes a class cast exception as observed on any attempt to iterate over the subvector.

Approach:
1. Provide a generic ranged iterator for APersistentVector, that can be used by SubVector
2. Make the iterator() method for APersistentVector$SubVector use this new iterator where possible (i.e. whenever the source vector is an APersistentVector). If not, the generic super.iterator() method is used (which is slower, but safe for any source vector that implements IPersistentVector)

Show
Mike Anderson added a comment - I don't seem to have the ability to edit the description. Here's what I think it should say: Cause: RT.subvec() creates an APersistentVector$SubVector. SubVector.iterator() assumes the source vector is a PersistentVector, however a primitive vector is a Vec (which is not a PersistentVector). This causes a class cast exception as observed on any attempt to iterate over the subvector. Approach: 1. Provide a generic ranged iterator for APersistentVector, that can be used by SubVector 2. Make the iterator() method for APersistentVector$SubVector use this new iterator where possible (i.e. whenever the source vector is an APersistentVector). If not, the generic super.iterator() method is used (which is slower, but safe for any source vector that implements IPersistentVector)
Hide
Mike Anderson added a comment -

Hi Rich,
1. As per comments, Andy made a small change to the original patch. v2 supersedes the original patch.
2. endIndex is part of the iterator implementation: I believe this must be mutable to provide the required Java Iterator behaviour
3. I think the approach is misworded (it was added long after the patch), I shall try to improve this.

Show
Mike Anderson added a comment - Hi Rich, 1. As per comments, Andy made a small change to the original patch. v2 supersedes the original patch. 2. endIndex is part of the iterator implementation: I believe this must be mutable to provide the required Java Iterator behaviour 3. I think the approach is misworded (it was added long after the patch), I shall try to improve this.
Hide
Rich Hickey added a comment -

This diff remains inscrutable. It seems to have two patches in it, one a first idea and another a modification to that? Patches should be direct enhancements to the trunk code. Also, what is endIndex for and why is it mutable? Why not just use end? And, the code doesn't agree with the plan, which says "Check the vector type and if it is an APersistentVector, use the existing logic. Otherwise, fallback to a new rangedIterator() implementation in APersistentVector that iterates using nth." while the code seems to do the opposite:

+ if (v instanceof APersistentVector) { + return ((APersistentVector)v).rangedIterator(start,end); + }
+ return super.iterator();

Show
Rich Hickey added a comment - This diff remains inscrutable. It seems to have two patches in it, one a first idea and another a modification to that? Patches should be direct enhancements to the trunk code. Also, what is endIndex for and why is it mutable? Why not just use end? And, the code doesn't agree with the plan, which says "Check the vector type and if it is an APersistentVector, use the existing logic. Otherwise, fallback to a new rangedIterator() implementation in APersistentVector that iterates using nth." while the code seems to do the opposite: + if (v instanceof APersistentVector) { + return ((APersistentVector)v).rangedIterator(start,end); + } + return super.iterator();
Hide
Andy Fingerhut added a comment -

clj-1082-patch-v2.txt is identical to Mike Anderson's clj-1082.patch, preserving his authorship, except it eliminates a carriage return added at the end of one line, which also causes git to issue a warning when applying the patch.

Show
Andy Fingerhut added a comment - clj-1082-patch-v2.txt is identical to Mike Anderson's clj-1082.patch, preserving his authorship, except it eliminates a carriage return added at the end of one line, which also causes git to issue a warning when applying the patch.
Hide
Alex Miller added a comment -

Moving back to Triaged as Rich has not vetted.

Show
Alex Miller added a comment - Moving back to Triaged as Rich has not vetted.
Hide
Andy Fingerhut added a comment -

Mike, your patch clj-1082.patch applies cleanly to latest master for me, so looks like you found one way to do it.

Another would be as follows, and closer to the directions on the JIRA workflow page: http://dev.clojure.org/display/design/JIRA+workflow (but not identical). Note that these commands would work on Mac OS X or Linux. I'm not sure what the correct corresponding command would be on Windows for the "git am" step below, unless that just happens to work because Windows and/or git implement the input redirection with "<" somehow.

  1. Check out a fresh repo
    $ git clone git://github.com/clojure/clojure.git
    $ cd clojure
  1. Apply the patch for CLJ-1076 to the master branch. This step isn't on the JIRA workflow page.
    $ git am --keep-cr -s < clj-1076-fix-tests-on-windows-patch-v1.txt
  1. Create a branch for yourself
    $ git checkout -b fix-clj-1082
  1. After editing to make your changes, commit them to the current fix-clj-1082 branch
    $ git commit -a -m "fixed annoying bug, refs #42"

From there on down it is the same as the instructions on the JIRA workflow page. The "git format-patch master --stdout > file.patch" will create a patch for the changes you have made in the current branch fix-clj-1082 starting from the master branch, which has the CLJ-1076 fix because of the 'git am' command above.

Show
Andy Fingerhut added a comment - Mike, your patch clj-1082.patch applies cleanly to latest master for me, so looks like you found one way to do it. Another would be as follows, and closer to the directions on the JIRA workflow page: http://dev.clojure.org/display/design/JIRA+workflow (but not identical). Note that these commands would work on Mac OS X or Linux. I'm not sure what the correct corresponding command would be on Windows for the "git am" step below, unless that just happens to work because Windows and/or git implement the input redirection with "<" somehow.
  1. Check out a fresh repo $ git clone git://github.com/clojure/clojure.git $ cd clojure
  1. Apply the patch for CLJ-1076 to the master branch. This step isn't on the JIRA workflow page. $ git am --keep-cr -s < clj-1076-fix-tests-on-windows-patch-v1.txt
  1. Create a branch for yourself $ git checkout -b fix-clj-1082
  1. After editing to make your changes, commit them to the current fix-clj-1082 branch $ git commit -a -m "fixed annoying bug, refs #42"
From there on down it is the same as the instructions on the JIRA workflow page. The "git format-patch master --stdout > file.patch" will create a patch for the changes you have made in the current branch fix-clj-1082 starting from the master branch, which has the CLJ-1076 fix because of the 'git am' command above.
Hide
Mike Anderson added a comment -

Patch for CLJ-1082, containing 3 commits

Show
Mike Anderson added a comment - Patch for CLJ-1082, containing 3 commits
Hide
Mike Anderson added a comment -

Attached a patch that I created with:

git format-patch winfix --stdout HEAD~3..HEAD > clj-1082.patch

Does this do the trick? I had to use the HEAD~3..HEAD to just get the most recent 3 commits and exclude the pprint changes that I needed in order to build on Windows.

Show
Mike Anderson added a comment - Attached a patch that I created with: git format-patch winfix --stdout HEAD~3..HEAD > clj-1082.patch Does this do the trick? I had to use the HEAD~3..HEAD to just get the most recent 3 commits and exclude the pprint changes that I needed in order to build on Windows.
Hide
Mike Anderson added a comment -

I have a fix for this and a regression test in the tree below:

https://github.com/mikera/clojure/tree/winfix

Not sure how best to make this into a patch though - I also have the pprint fix in here (CLJ-1076)

Show
Mike Anderson added a comment - I have a fix for this and a regression test in the tree below: https://github.com/mikera/clojure/tree/winfix Not sure how best to make this into a patch though - I also have the pprint fix in here (CLJ-1076)
Hide
Andy Fingerhut added a comment -

There is no code or ticket for this yet, but I wanted to mention that I've begun working on an implementation of RRB-Tree (Relaxed Radix Balanced Tree) vectors for Clojure (see discussion at https://groups.google.com/forum/?fromgroups=#!topic/clojure-dev/xnbtzTVEK9A). Assuming it is no big deal to get reducers to work on such vectors, subvec could be implemented in O(log n) time in such a way that the result was the same concrete type of vector as you started with, and thus reducers would work on them, too.

It would mean O(log n) time for subvec instead of today's O(1), but this would likely fix other limitations that exist today with subvec's, e.g. CLJ-761.

Show
Andy Fingerhut added a comment - There is no code or ticket for this yet, but I wanted to mention that I've begun working on an implementation of RRB-Tree (Relaxed Radix Balanced Tree) vectors for Clojure (see discussion at https://groups.google.com/forum/?fromgroups=#!topic/clojure-dev/xnbtzTVEK9A). Assuming it is no big deal to get reducers to work on such vectors, subvec could be implemented in O(log n) time in such a way that the result was the same concrete type of vector as you started with, and thus reducers would work on them, too. It would mean O(log n) time for subvec instead of today's O(1), but this would likely fix other limitations that exist today with subvec's, e.g. CLJ-761.
Hide
Timothy Baldridge added a comment -

Confirmed to be broken on master. Vetting. Not sure what it's going to take to fix this, however. If this is of intrest for you, you might want to help push it along by providing a patch.

Show
Timothy Baldridge added a comment - Confirmed to be broken on master. Vetting. Not sure what it's going to take to fix this, however. If this is of intrest for you, you might want to help push it along by providing a patch.

People

Vote (0)
Watch (3)

Dates

  • Created:
    Updated:
    Resolved: