<< Back to previous view

[CLJ-1511] stack overflow when comparing sequence results Created: 24/Aug/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Resolved
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Critical
Reporter: Chhi'mèd Künzang Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: transducers
Environment:

OS X 10.9.4


Attachments: Text File 0001-provide-working-implementations-for-LazyTransform-eq.patch    
Patch: Code and Test
Approval: Ok

 Description   

Comparing sequences created with sequence causes a stack overflow when used as first argument to =.

Consider this transducer:

user=> (def map-inc (map inc))
#'user/map-inc

When creating a sequence and comparing with expected results, it works fine as the second argument to the comparison:

user=> (= (range 1 11) (sequence map-inc (range 10)))
true

But a stack overflow occurs when the order of arguments is reversed:

user=> (= (sequence map-inc (range 10)) (range 1 11))

StackOverflowError   clojure.lang.LazyTransformer.equals (LazyTransformer.java:202)
user=> (clojure.stacktrace/print-stack-trace *e 10)
java.lang.StackOverflowError: null
 at clojure.lang.LazyTransformer.equiv (LazyTransformer.java:185)
    clojure.lang.LazyTransformer.equals (LazyTransformer.java:202)
    clojure.lang.LazyTransformer.equiv (LazyTransformer.java:185)
    clojure.lang.LazyTransformer.equals (LazyTransformer.java:202)
    clojure.lang.LazyTransformer.equiv (LazyTransformer.java:185)
    clojure.lang.LazyTransformer.equals (LazyTransformer.java:202)
    clojure.lang.LazyTransformer.equiv (LazyTransformer.java:185)
    clojure.lang.LazyTransformer.equals (LazyTransformer.java:202)
    clojure.lang.LazyTransformer.equiv (LazyTransformer.java:185)
    clojure.lang.LazyTransformer.equals (LazyTransformer.java:202)
nil

The error persists, even if the sequence is forced with doall:

user=> (= (doall (sequence map-inc (range 10))) (doall (range 1 11)))

StackOverflowError   clojure.lang.LazyTransformer.equiv (LazyTransformer.java:185)

It does work as expected, however, if the sequence is converted to a vector:

user=> (= (vec (sequence map-inc (range 10))) (range 1 11))
true


 Comments   
Comment by Nicola Mometto [ 25/Aug/14 4:31 AM ]

Patch provides equiv/equals implementations for LazyTransform based on ASeq equiv/equals





[CLJ-1497] sequence with transducers realizes n+2 elements Created: 08/Aug/14  Updated: 08/Aug/14  Resolved: 08/Aug/14

Status: Closed
Project: Clojure
Component/s: None
Affects Version/s: Release 1.7
Fix Version/s: Release 1.7

Type: Defect Priority: Major
Reporter: Nicola Mometto Assignee: Unassigned
Resolution: Completed Votes: 0
Labels: None

Attachments: File clj-1497.diff     File clj-1497v2.diff    
Approval: Ok

 Description   

The first element is realized at creation time:

user=> (def a (sequence (take 2) (map #(do (println (str "~" %)) %) (iterate inc 1))))
~1
#'user/a

Fully realizing the sequence realizes the other n-1 elements, and 2 more:

user=> a
(~2
~3
1 ~4
2)

Compare with version using seq operations:

user=> (def a (sequence (take 2 (map #(do (println (str "~" %)) %) (iterate inc 1)))))
#'user/a
user=> a
(~1
~2
1 2)

Transduce also doesn't seem to exhibit this issue:

user=> (def a (transduce (take 2) conj [] (map #(do (println (str "~" %)) %) (iterate inc 1))))
~1
~2
#'user/a
user=> a
[1 2]


 Comments   
Comment by Alex Miller [ 08/Aug/14 10:02 AM ]

Patch attached that improves the issue - will now only realize n+1 elements.

Comment by Nicola Mometto [ 08/Aug/14 10:16 AM ]

Nice, I added a commit on top of yours to delay the realization of the first element of the lazyseq to the first .next call instead of on SeqIteration creation

Comment by Alex Miller [ 08/Aug/14 11:12 AM ]

Fixed by Rich directly, not by patch.





[CLJ-1439] Reduce keyword cache lookup cost Created: 05/Jun/14  Updated: 01/Aug/14  Resolved: 01/Aug/14

Status: Resolved
Project: Clojure
Component/s: None
Affects Version/s: Release 1.6
Fix Version/s: Release 1.7

Type: Enhancement Priority: Minor
Reporter: Kyle Kingsbury Assignee: Unassigned
Resolution: Completed Votes: 1
Labels: keywords, performance

Attachments: Text File 0001-Improve-Keyword.intern-performance.patch    
Patch: Code
Approval: Ok

 Description   

Background: Symbol is composed of name and namespace strings. Symbol construction interns both of these strings - this reduces memory usage and allows for string == checks inside Symbol. Keywords wrap a Symbol and have an additional cache to reuse Keyword instances.

Problem: Certain applications make heavy use of keywords (in particular the case of parsing or transforming JSON, XML, or other data into Clojure maps with keyword keys). Constructing the same keyword from a string over and over again will cause the string to be interned, a symbol constructed, and the lookup to occur in the keyword cache. In the case where the keyword already exists, this is more work than is necessary, making this path slower than it can be.

Reproduce: The following test simulates rounds of creating many keywords - the unique? flag indicates whether to use new or the same set of keywords each rep. unique?=false should be more similar to parsing a similar JSON record format over and over.

(set! *unchecked-math* true)

(defn kw-new [n unique?]
  (let [base (if unique? (str (rand)) "abcdef")]
    (loop [i 0
           kws (transient [])]
      (if (< i n)
        (recur (inc i) (conj! kws (keyword (str base i))))
        (persistent! kws)))))

(defn bench-kw [reps n unique?]
  (dotimes [_ reps]
    (let [begin (System/nanoTime)]
        (kw-new n unique?)
        (let [end (System/nanoTime)
              elapsed (/ (- end begin) 1000000.0)]
          (println elapsed "ms")))))

(bench-kw 50 10000 false)  ;; expected similar to JSON use case
(bench-kw 50 10000 true)   ;; for comparison

On 1.6, we see about 5.5 ms for repeated and 134 ms for unique after warmup.
With the patch, we see about 2.2 ms for repeated and 120 ms for unique after warmup.

Cause: Keyword construction based on a string involves:

  • Interning string(s) in new kw
  • Constructing Symbol with interned strings
  • Clearing Keywords from the Keyword cache if GC has reclaimed them
  • Constructing a new Keyword
  • Wrapping the Keyword in a WeakReference
  • CHM putIfAbsent on the cache
  • If new, return. If exists, get the old one and return.
  • In the event the Keyword is reclaimed by GC between the last 2 steps, retry.

This process involves a fair amount of speculative interning and object creation if the keyword already exist.

Proposal: Streamline the keyword construction process by reworking the cache implementation and the Keyword.intern() process. The patch changes the cache to key by string name instead of symbol, deferring interning and symbol creation on lookup to when we know the keyword construction is needed. The various Keyword.intern() methods are also reworked to take advantage if called with an existing Symbol to avoid re-creating it.

Patch: 0001-Improve-Keyword.intern-performance.patch

Related: CLJ-1415



 Comments   
Comment by Alex Miller [ 01/Aug/14 11:48 AM ]

Alternate changes were committed today to improve both symbol and keyword creation times.

https://github.com/clojure/clojure/commit/c9e70649d2652baf13b498c4c3ebb070118c4573





Generated at Wed Aug 27 16:03:31 CDT 2014 using JIRA 4.4#649-r158309.