Clojure

s/coll-of and s/every gen is very slow if :kind specified without :into

Details

  • Type: Defect Defect
  • Status: Closed Closed
  • Priority: Critical Critical
  • Resolution: Completed
  • Affects Version/s: Release 1.9
  • Fix Version/s: Release 1.9
  • Component/s: None
  • Labels:
  • Environment:
    alpha14
  • Patch:
    Code
  • Approval:
    Ok

Description

An s/coll-of with :kind but without :into takes a very long time to generate. You will mostly notice this when using stest/check but here is an example that replicates the important bits:

(require '[clojure.spec.alpha :as s] '[clojure.spec.gen.alpha :as gen])

;; Basic coll-of - fast
(time (dorun (gen/sample (s/gen (s/coll-of int?)) 1000)))
;; "Elapsed time: 49.10318 msecs"

;; coll-of with :into - fast
(time (dorun (gen/sample (s/gen (s/coll-of int? :into ())) 1000)))
;; "Elapsed time: 50.169762 msecs"

;; coll-of with :kind - SLOW
(time (dorun (gen/sample (s/gen (s/coll-of int? :kind list?)) 1000)))
;; "Elapsed time: 14922.123514 msecs"

;; coll-of with :kind and :into - fast
(time (dorun (gen/sample (s/gen (s/coll-of int? :kind list? :into ())) 1000)))
"Elapsed time: 54.188577 msecs"

Cause: If :into is not provided, s/every and s/coll-of has to generate a collection (via gen of list?) to call empty on it, to then fill it up. This is quite clearly documented in the docstring of `s/every`:

:kind - a pred/spec that the collection type must satisfy, e.g. vector?
         (default nil) Note that if :kind is specified and :into is
         not, this pred must generate in order for every to generate.

Assumedly the list? generates quite large vectors at a certain point which significantly slows down the generation. The responsible code is in gen* of every-impl.

Proposed: Use standard empty coll for persistent collection :kind preds (list? vector? set? map?).

In the patch:

  • Create lookup table `empty-coll` for kind-form (resolved pred) to empty coll.
  • When possible (if :into coll is specified or kind-form is one of these special cases), determine the empty gen-into collection once at the beginning.
  • If empty gen-into collection is available, start with that in the generator. Otherwise, fall back to prior behavior for kind.

Performance-wise, the times after the patch are comparable to times using :into.

Patch: clj-2103.patch

Activity

Hide
Leon Grapenthin added a comment -

I see two approaches to improve this behavior:

1. The gen uses the gen of kind to generate one value with the smallest size, calls empty on it to determine :into. This would lead to a surprise when your :kind is e. g. (s/or :a-vec vector? :a-list list?) (which currently throws, anyway)
2. We use an internal lookup table to assume :into. {clojure.core/vector? [], clojure.core/set? #{} ...}

Show
Leon Grapenthin added a comment - I see two approaches to improve this behavior: 1. The gen uses the gen of kind to generate one value with the smallest size, calls empty on it to determine :into. This would lead to a surprise when your :kind is e. g. (s/or :a-vec vector? :a-list list?) (which currently throws, anyway) 2. We use an internal lookup table to assume :into. {clojure.core/vector? [], clojure.core/set? #{} ...}

People

Vote (2)
Watch (3)

Dates

  • Created:
    Updated:
    Resolved: