Clojure

clojure.core/set should use transients

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Major Major
  • Resolution: Unresolved
  • Affects Version/s: Release 1.5, Release 1.6
  • Fix Version/s: Release 1.7
  • Component/s: None
  • Labels:
  • Patch:
    Code
  • Approval:
    Screened

Description

CLJ-1384-p2 uses transients for both create and createWithCheck. This is consistent with the current implementation for map.

clojure.core/vec calls (more or less) PersistentVector.create(...), which uses a transient vector to build up the result.

clojure.core/set on the other hand, calls PersistentHashSet.create(...), which repeatedly calls .cons on a PersistentHashSet, with all the associated speed/GC issues.

Operation count now w/transients
set 5 1.771924 µs 1.295637 µs
into 5 1.407925 µs 1.402995 µs
set 1000000 2.499264 s 1.196653 s
into 1000000 0.977555 s 1.006951 s

Patch: CLJ-1384-p2.patch
Screened by: Stu

  1. CLJ-1384-p1.patch
    15/Mar/14 10:46 PM
    2 kB
    Gary Fredericks
  2. CLJ-1384-p2.patch
    27/Jun/14 12:57 PM
    4 kB
    Stuart Halloway
  3. set-bench.tar
    11/Apr/14 11:15 AM
    6 kB
    Ambrose Bonnaire-Sergeant

Activity

Hide
Gary Fredericks added a comment -

Oh that's a good point about reduce. The difference should only apply to chunked seqs, right? It's worth noting that the benchmarks above used range which creates chunked seqs, so that might be why into looks faster on the large collections?

So this change only makes set act like vec; I think whether either/both of them should use reduce is a different question.

Show
Gary Fredericks added a comment - Oh that's a good point about reduce. The difference should only apply to chunked seqs, right? It's worth noting that the benchmarks above used range which creates chunked seqs, so that might be why into looks faster on the large collections? So this change only makes set act like vec; I think whether either/both of them should use reduce is a different question.
Hide
Ghadi Shayban added a comment -

CLJ-1192 is related to this, but and Andy seems to be indicating the use of reduce as the means to better performance there.

Show
Ghadi Shayban added a comment - CLJ-1192 is related to this, but and Andy seems to be indicating the use of reduce as the means to better performance there.
Hide
Ambrose Bonnaire-Sergeant added a comment - - edited

Added benchmark suite (set-bench.tar).

FWIW results are similar to gfrederick's on my machine:

Clojure 1.6

Small collections (5 elements)

set averages 1.220601 µs
into averages 1.597991 µs

Large collections (1,000,000 elements)

set averages 2.429066 sec
into averages 1.006249 sec

After transients

Small collections (5 elements)

set averages 999.248325 ns
into averages 1.162889 µs

Large collections (1,000,000 elements)

set averages 1.003792 sec
into averages 889.993185 ms

Add full output to the tar.

Show
Ambrose Bonnaire-Sergeant added a comment - - edited Added benchmark suite (set-bench.tar). FWIW results are similar to gfrederick's on my machine:

Clojure 1.6

Small collections (5 elements)

set averages 1.220601 µs into averages 1.597991 µs

Large collections (1,000,000 elements)

set averages 2.429066 sec into averages 1.006249 sec

After transients

Small collections (5 elements)

set averages 999.248325 ns into averages 1.162889 µs

Large collections (1,000,000 elements)

set averages 1.003792 sec into averages 889.993185 ms Add full output to the tar.
Hide
Gary Fredericks added a comment - - edited

Attached CLJ-1384-p1.patch, which updates the three non-check create methods.

I also added benchmarks. It's about 2-3 times faster for large collections.

Show
Gary Fredericks added a comment - - edited Attached CLJ-1384-p1.patch, which updates the three non-check create methods. I also added benchmarks. It's about 2-3 times faster for large collections.
Hide
Gary Fredericks added a comment -

Thanks Andy, I'll submit a patch that changes the three non-checked methods.

Show
Gary Fredericks added a comment - Thanks Andy, I'll submit a patch that changes the three non-checked methods.
Hide
Andy Fingerhut added a comment -

I believe that the 'with check' versions are only intended to be used when reading set literals in Clojure source code, and give an error if there are duplicate elements. If you find examples where those set creation functions are called in other situations, I would be interested to learn about them to find out where my misunderstanding lies, or whether that is a problem with the current code.

If the belief above is correct, I would suggest not changing the 'with check' versions, since their speed isn't as critical.

Show
Andy Fingerhut added a comment - I believe that the 'with check' versions are only intended to be used when reading set literals in Clojure source code, and give an error if there are duplicate elements. If you find examples where those set creation functions are called in other situations, I would be interested to learn about them to find out where my misunderstanding lies, or whether that is a problem with the current code. If the belief above is correct, I would suggest not changing the 'with check' versions, since their speed isn't as critical.
Hide
Gary Fredericks added a comment -

PersistentHashSet has six methods for creating sets – one for each combination of {with check, without check} and {array (varargs), List, ISeq}. Each of them does not use transients but could.

I believe clojure.core/set only depends on the (without check, ISeq) version.

Should all of these be changed? Three of them? One of them?

Show
Gary Fredericks added a comment - PersistentHashSet has six methods for creating sets – one for each combination of {with check, without check} and {array (varargs), List, ISeq}. Each of them does not use transients but could. I believe clojure.core/set only depends on the (without check, ISeq) version. Should all of these be changed? Three of them? One of them?

People

Vote (4)
Watch (3)

Dates

  • Created:
    Updated: