clojure.core/set should use transients

Description

CLJ-1384-p2 uses transients for both create and createWithCheck. This is consistent with the current implementation for map.

clojure.core/vec calls (more or less) PersistentVector.create(...), which uses a transient vector to build up the result.

clojure.core/set on the other hand, calls PersistentHashSet.create(...), which repeatedly calls .cons on a PersistentHashSet, with all the associated speed/GC issues.

Operation

count

now

w/transients

set

5

1.771924 µs

1.295637 µs

into

5

1.407925 µs

1.402995 µs

set

1000000

2.499264 s

1.196653 s

into

1000000

0.977555 s

1.006951 s

Patch: CLJ-1384-p2.patch
Screened by: Stu

Environment

None

Attachments

3
  • 27 Jun 2014, 06:57 PM
  • 11 Apr 2014, 05:15 PM
  • 16 Mar 2014, 04:46 AM

Activity

Show:

gfredericks April 11, 2014 at 5:41 PM

Oh that's a good point about reduce. The difference should only apply to chunked seqs, right? It's worth noting that the benchmarks above used range which creates chunked seqs, so that might be why into looks faster on the large collections?

So this change only makes set act like vec; I think whether either/both of them should use reduce is a different question.

Ghadi Shayban April 11, 2014 at 5:35 PM

CLJ-1192 is related to this, but and Andy seems to be indicating the use of reduce as the means to better performance there.

Ambrose Bonnaire-Sergeant April 11, 2014 at 5:15 PM

Added benchmark suite (set-bench.tar).

FWIW results are similar to gfrederick's on my machine:

Clojure 1.6

Small collections (5 elements)

set averages 1.220601 µs
into averages 1.597991 µs

Large collections (1,000,000 elements)

set averages 2.429066 sec
into averages 1.006249 sec

After transients

Small collections (5 elements)

set averages 999.248325 ns
into averages 1.162889 µs

Large collections (1,000,000 elements)

set averages 1.003792 sec
into averages 889.993185 ms

Add full output to the tar.

gfredericks March 16, 2014 at 4:46 AM

Attached CLJ-1384-p1.patch, which updates the three non-check create methods.

I also added benchmarks. It's about 2-3 times faster for large collections.

gfredericks March 16, 2014 at 4:23 AM

Thanks Andy, I'll submit a patch that changes the three non-checked methods.

Completed

Details

Assignee

Reporter

Approval

Ok

Patch

Code

Priority

Affects versions

Fix versions

Created March 16, 2014 at 1:36 AM
Updated August 29, 2014 at 4:50 PM
Resolved August 29, 2014 at 4:50 PM