<< Back to previous view

[CLJ-1499] Replace seq-based iterators with direct iterators for all non-seq collections that use SeqIterator Created: 08/Aug/14  Updated: 23/Jan/15

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Enhancement Priority: Major
Reporter: Rich Hickey Assignee: Fogus
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File clj-1499-all.diff     Text File clj-1499-v10.patch     Text File clj-1499-v11.patch     File clj-1499-v2.diff     File clj-1499-v3.diff     File clj-1499-v6.diff     File clj-1499-v7.diff     File clj-1499-v8.diff     Text File clj-1499-v9.patch     Text File defrecord-iterator.patch     File defrecord-iterator-v2.diff    
Patch: Code and Test
Approval: Vetted

 Description   

Add support for direct iterators instead of seq-based iterators for non-seq collections that use SeqIterator.

Patch adds support for direct iterators on the following (removing use of SeqIterator):

  • PersistentHashMap - new internal iterator (~20% faster)
  • APersistentSet - use internal map impl iterator (~5% faster)
  • PersistentQueue
  • PersistentStructMap
  • records (in core_deftype.clj)

Patch does not change use of SeqIterator in:

  • LazyTransformer$MultiStepper (not sure if this could be changed)
  • ASeq
  • LazySeq

In preparation for use in CLJ-1602, there is also a new IMapIterable interface that allows a map to publish direct Iterators of keys and values (without creating and pulling apart MapEntry objs). PersistentHashMap and PersistentArrayMap implement it right now and PersistentHashSet will check for and use the direct key iterator if possible.

Performance:

I re-ran the set of tests from CLJ-1546 for vec improvements which use these new iterators for PersistentArrayMap, PersistentHashMap, and PersistentHashSet. These test the time to run vec on a PHS or PHM of the given size. All timings in nanoseconds.

coll size 1.7.0-alpha4 1.7.0-alpha5 alpha5 + patch alpha5 + patch / alpha5
PHS 4 236 406 204 0.86
PHS 16 857 896 382 0.45
PHS 64 3637 3832 1723 0.47
PHS 256 14235 14672 6525 0.46
PHS 1024 67743 68857 44654 0.66
PHM 4 112 232 232 2.07
PHM 16 550 832 442 0.80
PHM 64 3014 3155 2014 0.67
PHM 256 11661 12082 7522 0.65
PHM 1024 57388 59787 44240 0.77

I also ran a set of tests just on reduce with reduce-kv for comparison (reduce-kv uses a different unaffected reduction path so it's a good baseline for comparison).

expr a-set / a-map size 1.6.0 1.7.0-alpha5 (ns) 1.7.0-alpha5 + patch (ns) patch / alpha5 patch / 1.6
(reduce + 0 a-set) 4 171 241 114 0.47 0.67
(reduce + 0 a-set) 16 663 826 395 0.48 0.60
(reduce + 0 a-set) 64 3637 4456 2361 0.53 0.65
(reduce + 0 a-set) 256 14772 16158 8872 0.55 0.60
(reduce + 0 a-set) 1024 72548 79010 49626 0.63 0.68
(reduce + 0 a-set) 16384 1197962 1234286 729335 0.59 0.61
(reduce #(+ %1 (val %2)) 0 a-map) 4 69 103 104 1.01 1.51
(reduce #(+ %1 (val %2)) 0 a-map) 16 640 752 506 0.67 0.79
(reduce #(+ %1 (val %2)) 0 a-map) 64 2861 3238 2388 0.74 0.83
(reduce #(+ %1 (val %2)) 0 a-map) 256 11998 12848 9526 0.74 0.79
(reduce #(+ %1 (val %2)) 0 a-map) 1024 59650 61913 54717 0.88 0.92
(reduce #(+ %1 (val %2)) 0 a-map) 16384 982317 969168 837842 0.86 0.85
(reducekv #(%1 %3) 0 a-map) 4 58 62 60 0.97 1.03
(reducekv #(%1 %3) 0 a-map) 16 233 250 248 0.99 1.06
(reducekv #(%1 %3) 0 a-map) 64 1477 1525 1442 0.95 0.98
(reducekv #(%1 %3) 0 a-map) 256 5470 5258 5201 0.99 0.95
(reducekv #(%1 %3) 0 a-map) 1024 27344 27442 27410 1.00 1.00
(reducekv #(%1 %3) 0 a-map) 16384 399591 397090 396267 1.00 0.99

Patch: clj-1499-v11.patch

Screened by:



 Comments   
Comment by Alex Miller [ 13/Aug/14 1:57 PM ]

The list of non-seqs that uses SeqIterator are:

  • records (in core_deftype.clj)
  • APersistentSet - fallback, maybe is ok?
  • PersistentHashMap
  • PersistentQueue
  • PersistentStructMap

Seqs (that do not need to be changed) are:

  • ASeq
  • LazySeq.java

LazyTransformer$MultiStepper - not sure

Comment by Ghadi Shayban [ 27/Sep/14 2:16 PM ]

attached iterator impl for defrecords. ready to leverage iteration for extmap when PHM iteration lands.

Comment by Alex Miller [ 29/Sep/14 12:52 PM ]

PHM patch

Comment by Alex Miller [ 29/Sep/14 10:26 PM ]

New patch that fixes bugs with PHMs with null keys (and added tests to expose those issues), added support for PHS.

Comment by Ghadi Shayban [ 29/Sep/14 10:45 PM ]

Alex, the defrecord patch already uses the iterator for extmap. It's just made better by the PHM patch.

Comment by Alex Miller [ 29/Sep/14 10:47 PM ]

Comment by Ghadi Shayban [ 30/Sep/14 4:17 PM ]

Heh. Skate to where the puck is going to be – Gretzky

Re: defrecord iterator: As is, it propagates exceptions from reaching the end of the ExtMap's iterator. As noted in CLJ-1453, PersistentArrayMap's iterator improperly returns an ArrayIndexOutOfBoundsException, rather than NoSuchElementException.

Comment by Alex Miller [ 30/Sep/14 6:41 PM ]

Hey Ghadi, rather than rebuilding the case map to pass to the RecordIterator, why don't you just pass the fields in iteration order to it and leverage the case map via .valAt like everything else?

Comment by Ghadi Shayban [ 30/Sep/14 7:30 PM ]

defrecord-iterator-v2.diff reuses valAt and minimizes macrology.

Comment by Alex Miller [ 07/Oct/14 2:04 PM ]

Comments from Stu (found under the couch):

"1. some of the impls (e.g. queue manually concatenate two iters. Would implementing iter-cat and calling that be simpler and more robust?
2. I found this tweak to the generative testing more useful in reporting failure, non-dependent on clojure.test, and capable of expecting failures. Waddya think?

(defn seq-iter-match
  [seqable iterable]
  (let [i (.iterator iterable)]
    (loop [s (seq seqable)
           n 0]
      (if (seq s)
        (do
          (when-not (.hasNext i)
            (throw (ex-info "Iterator exhausted before seq"
                            {:pos n :seqable seqable :iterable iterable})))
          (when-not (= (.next i) (first s))
            (throw (ex-info "Iterator and seq did not match"
                            {:pos n :seqable seqable :iterable iterable})))
          (recur (rest s) (inc n)))
        (when (.hasNext i)
          (throw (ex-info "Seq exhausted before iterator"
                          {:pos n :seqable seqable :iterable iterable})))))))

		(defspec seq-and-iter-match-for-maps
  identity
  [^{:tag clojure.test-clojure.data-structures/gen-map} m]
  (seq-iter-match m m))

3. similar generative approach would be good for the other types (looks like we just do maps)"

Comment by Alex Miller [ 02/Dec/14 3:03 PM ]

Latest patch (clj-1499-v6.diff) makes Stu's suggested change #2 above and adds tests recommended in #3. I looked at #1 but decided in the end that it wasn't going to make anything easier.

Comment by Alex Miller [ 09/Jan/15 11:11 AM ]

v7 patch just updates to current master

Comment by Alex Miller [ 14/Jan/15 4:08 PM ]

found a bug in last impl in maps with null values...fixing

Comment by Stuart Halloway [ 16/Jan/15 1:40 PM ]

v11 updates v10 to apply cleanly on master





[CLJ-976] Implement reader literal and print support for PersistentQueue data structure Created: 27/Apr/12  Updated: 28/Dec/14

Status: Open
Project: Clojure
Component/s: None
Affects Version/s: Release 1.5
Fix Version/s: None

Type: Enhancement Priority: Minor
Reporter: Fogus Assignee: Fogus
Resolution: Unresolved Votes: 4
Labels: data-structures, queue, reader, tagged-literals

Attachments: File CLJ-976-queue-literal-eval-and-synquote.diff     Text File clj-976-queue-literal-eval-and-synquote-patch-v3.txt     File CLJ-976-queue-literal-eval.diff    
Patch: Code and Test

 Description   

Clojure's PersistentQueue structure has been in the language for quite some time now and has found its way into a fair share of codebases. However, the creation of queues is a two step operation often of the form:

(conj clojure.lang.PersistentQueue/EMPTY :a :b :c)

;=> #<PersistentQueue clojure.lang.PersistentQueue@78d5f6bc>

A better experience might be the following:

#queue [:a :b :c]

;=> #queue [:a :b :c]

(pop #queue [:a :b :c])

;=> #queue [:b :c]

This syntax is proposed and discussed in the Clojure-dev group at https://groups.google.com/forum/?fromgroups#!topic/clojure-dev/GQqus5Wycno

Open question: Should the queue literal's arguments eval? The implications of this are illustrated below:

;; non-eval case
#queue [1 2 (+ 1 2)]

;=> #queue [1 2 (+ 1 2)]


;; eval case
#queue [1 2 (+ 1 2)]

;=> #queue [1 2 3]

The answer to this open question will determine the implementation.



 Comments   
Comment by Steve Miner [ 27/Apr/12 10:18 AM ]

I think the non-eval behavior would be consistent with the other reader literals in Clojure 1.4. It's definitely better for interop where some other language implementation could be expected to handle a few literal representations, but not the evaluation of Clojure expressions. Use a regular function if the args need evaluation.

Comment by Chas Emerick [ 27/Apr/12 10:19 AM ]

The precedent of records seems relevant:

=> (defrecord A [b])
user.A
=> #user.A[(+ 4 5)]
#user.A{:b (+ 4 5)}
=> #user.A{:b (+ 4 5)}
#user.A{:b (+ 4 5)}

This continues to make sense, as otherwise queues would need to print with an extra (quote …) form around lists — which records neatly avoid:

=> (A. '(+ 4 5))
#user.A{:b (+ 4 5)}

Does this mean that a queue fn (analogous to vector, maybe) will also make an appearance? It'd be handy for HOF usage.

Comment by Fogus [ 27/Apr/12 11:00 AM ]

Added a patch for the tagged literal support ONLY. This is only one part of the total solution. This provides the read-string and printing capability. I'd like more discussion around the eval side before I get dive into the compiler.

Comment by Paul Michael Bauer [ 27/Apr/12 6:45 PM ]

In addition to Chas' observations on consistency with record literals, would eval in queue literals open up the same security hole as #=, needing to respect *read-eval*?
When needing eval inside a queue literal, embedding a #= seems more apropos.

Comment by Fogus [ 04/May/12 1:14 PM ]

Evalable queue literal support.

Comment by Andy Fingerhut [ 10/May/12 5:54 PM ]

Neither of the patches CLJ-976-queue-literal-tagged-parse-support-only.diff dated Apr 27, 2012 nor CLJ-976-queue-literal-eval.diff dated May 4, 2012 apply cleanly to latest master as of May 10, 2012.

Comment by Fogus [ 11/May/12 10:15 AM ]

Updated patch file to merge with latest master.

Comment by Fogus [ 20/Jul/12 1:14 PM ]

New patch with support fixed for syntax-quote.

Comment by Stuart Sierra [ 17/Aug/12 12:41 PM ]

Patch does not apply as of commit f5f4faf95051f794c9bfa0315e4457b600c84cef

Comment by Fogus [ 17/Aug/12 3:06 PM ]

Weird. I was able to download the CLJ-976-queue-literal-eval-and-synquote.diff patch and apply it to HEAD as of just now (f5f4faf95051f794c9bfa0315e4457b600c84cef). There were whitespace warnings, but the patch applied, compiles and passes all tests.

Comment by Andy Fingerhut [ 17/Aug/12 7:29 PM ]

With latest head I was able to successfully apply patch CLJ-976-queue-literal-eval-and-synquote.diff with this command:

git am --keep-cr -s < CLJ-976-queue-literal-eval-and-synquote.diff

with some warnings, but successfully applied. If I try it without the --keep-cr option, the patch fails to apply. I believe this is often a sign that either one of the files being patched, or the patch itself, contains CR/LF line endings, which some of the Clojure source files definitely do.

The command above (with --keep-cr) is currently the one recommended for applying patches on page http://dev.clojure.org/display/design/JIRA+workflow in section "Screening Tickets". I added the suggested --keep-cr option after running across another patch that applied with the option, but not without it.

Comment by Andy Fingerhut [ 28/Aug/12 5:45 PM ]

Presumptuously changing Approval from Incomplete back to Test, since the latest patch does apply cleanly if --keep-cr option is used.

Comment by Rich Hickey [ 08/Sep/12 6:48 AM ]

this needs more time

Comment by Fogus [ 18/Sep/12 8:15 AM ]

Rich,

Do you mind providing a little more detail? I would be happy to make any changes if needed. However, if it's just a matter of its relationship to EDN and/or waiting until the next release then I am happy to wait. In either case, I'd like to complete this or push it to the back of my mind. Thanks.

Comment by Andy Fingerhut [ 05/Oct/12 7:49 AM ]

clj-976-queue-literal-eval-and-synquote-patch-v2.txt dated Oct 5 2012 is identical to Fogus's patch CLJ-976-queue-literal-eval-and-synquote.diff dated Jul 20 2012. It simply removes one line addition to clojure.iml that Rich has since added in a different commit, so that this patch now applies cleanly to latest master.

Comment by Andy Fingerhut [ 16/Oct/12 12:20 PM ]

clj-976-queue-literal-eval-and-synquote-patch-v3.txt dated oct 16 2012 is identical to Fogus's patch CLJ-976-queue-literal-eval-and-synquote.diff dated Jul 20 2012. It simply removes one line addition to clojure.iml that Rich has since added in a different commit, so that this patch now applies cleanly to latest master.

Comment by Andy Fingerhut [ 20/Oct/12 12:26 PM ]

Fogus, with the recent commit of a patch for CLJ-1070, my touched-up patch clj-976-queue-literal-eval-and-synquote-patch-v3.txt dated Oct 16 2012 doesn't apply cleanly. In this case it isn't simply a few lines of context that have changed, it is the interfaces that PersistentQueue implements have been changed. It might be best if you take a look at the latest code and the patch and consider how it should be updated.

Comment by Steve Miner [ 06/Apr/13 8:07 AM ]

Related to CLJ-1078.

Comment by Alex Miller [ 22/Aug/13 10:38 PM ]

Moving back to Triaged as Rich has not vetted.





Generated at Fri Jan 30 09:35:19 CST 2015 using JIRA 4.4#649-r158309.