Clojure

Enhance the performance of map merges

Details

  • Type: Enhancement Enhancement
  • Status: Open Open
  • Priority: Critical Critical
  • Resolution: Unresolved
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
  • Patch:
    Code and Test
  • Approval:
    Triaged

Description

It would be nice if merge used transients.

Patch

  • clj-1458-7.patch

Approach
Migrate c.c/merge later in core after transients & reduce. Leave older version as merge1 for use in cases the precede the newer definition. Make APersistentMap/conj & ATransientMap/cons aware of IKVReduce.

The attached patch preserves two existing behaviors of merge

  • metadata propagation
  • the right hand side of the merges can be a Map.Entry, an IPersistentVector where size=2, and regular maps.

Screened by:

  1. 0001-very-simple-test-of-the-merge-function.patch
    29/Dec/14 11:17 PM
    1.0 kB
    Michael Blume
  2. clj-1458-4.patch
    22/Jun/15 10:11 AM
    9 kB
    Alex Miller
  3. CLJ-1458-5.patch
    09/Jan/16 5:09 PM
    14 kB
    Ghadi Shayban
  4. CLJ-1458-6.patch
    09/Jan/16 9:50 PM
    20 kB
    Ghadi Shayban
  5. CLJ-1458-7.patch
    28/Sep/16 1:55 PM
    11 kB
    Michael Blume
  6. CLJ-1458-transient-merge.patch
    28/Dec/14 10:07 PM
    8 kB
    Ghadi Shayban
  7. CLJ-1458-transient-merge2.patch
    29/Dec/14 10:37 PM
    9 kB
    Ghadi Shayban
  8. CLJ-1458-transient-merge3.patch
    29/Dec/14 10:48 PM
    9 kB
    Ghadi Shayban
  9. merge-test-2.patch
    29/Dec/14 11:22 PM
    1 kB
    Michael Blume
  10. transient-merge.diff
    13/Sep/14 5:42 PM
    4 kB
    Jason Wolfe

Activity

Hide
Jason Wolfe added a comment -

I will take a crack at a patch today.

Show
Jason Wolfe added a comment - I will take a crack at a patch today.
Hide
Jason Wolfe added a comment - - edited

This patch (transient-merge.diff) makes merge, merge-with, and zipmap (since it was right there and could obviously benefit from transients as well) use transients.

Three potential issues:

  • I had to move the functions, since they depend on transient and friends. I assume this is preferable to a forward declaration. This was the best place I could find, but happy to move them elsewhere.
  • I added multiple arities, to avoid potential performance cost of transient-ing a single argument. Happy to undo this if desired.
  • I had to slightly alter the logic in merge-with, since transient maps don't support contains? (or find).
Show
Jason Wolfe added a comment - - edited This patch (transient-merge.diff) makes merge, merge-with, and zipmap (since it was right there and could obviously benefit from transients as well) use transients. Three potential issues:
  • I had to move the functions, since they depend on transient and friends. I assume this is preferable to a forward declaration. This was the best place I could find, but happy to move them elsewhere.
  • I added multiple arities, to avoid potential performance cost of transient-ing a single argument. Happy to undo this if desired.
  • I had to slightly alter the logic in merge-with, since transient maps don't support contains? (or find).
Hide
Michał Marczyk added a comment - - edited

I posted a separate ticket for zipmap, with patch, on 30/May/12: CLJ-1005.

Show
Michał Marczyk added a comment - - edited I posted a separate ticket for zipmap, with patch, on 30/May/12: CLJ-1005.
Hide
Jason Wolfe added a comment -

Ah, sorry if I overstepped then. Happy to remove that change from this patch then if that will simplify things – just let me know.

Show
Jason Wolfe added a comment - Ah, sorry if I overstepped then. Happy to remove that change from this patch then if that will simplify things – just let me know.
Hide
Ghadi Shayban added a comment -

alternate approach attached delaying merge until after protocols load, and then using transducers.

Show
Ghadi Shayban added a comment - alternate approach attached delaying merge until after protocols load, and then using transducers.
Hide
Michael Blume added a comment -

Looks like you're doing (get m k) twice – shouldn't that be thrown in a local?

Show
Michael Blume added a comment - Looks like you're doing (get m k) twice – shouldn't that be thrown in a local?
Hide
Michael Blume added a comment -

um, put, in a local, I mean, 'throw' was a bad choice of word.

Show
Michael Blume added a comment - um, put, in a local, I mean, 'throw' was a bad choice of word.
Hide
Ghadi Shayban added a comment -

Yeah there's that – won't be using get anyways after CLJ-700 gets committed.

We should add performance tests too. merging two maps, three, many maps, also varying the sizes of the maps, and for merge-with, varying the % of collisions.

Need to go back to the (some identity) logic, otherwise metadata is propagated from maps other than the first provided. I'll fix later.

Show
Ghadi Shayban added a comment - Yeah there's that – won't be using get anyways after CLJ-700 gets committed. We should add performance tests too. merging two maps, three, many maps, also varying the sizes of the maps, and for merge-with, varying the % of collisions. Need to go back to the (some identity) logic, otherwise metadata is propagated from maps other than the first provided. I'll fix later.
Hide
Michael Blume added a comment -

I don't know if this is supposed to be allowed, but this breaks

(merge {} [:foo 'bar])

which is used in the wild by compojure-api

Show
Michael Blume added a comment - I don't know if this is supposed to be allowed, but this breaks (merge {} [:foo 'bar]) which is used in the wild by compojure-api
Hide
Michael Blume added a comment -

Ghadi, contains? uses get under the covers, so it's still two gets, right? It seems like it'd be more performant to stick with the ::none trick.

Show
Michael Blume added a comment - Ghadi, contains? uses get under the covers, so it's still two gets, right? It seems like it'd be more performant to stick with the ::none trick.
Hide
Nicola Mometto added a comment -

This calls for if-let + find.

Show
Nicola Mometto added a comment - This calls for if-let + find.
Hide
Ghadi Shayban added a comment -

new patch addressing concerns so far

Show
Ghadi Shayban added a comment - new patch addressing concerns so far
Hide
Ghadi Shayban added a comment -

CLJ-1458-transient-merge3.patch removes silly inlining macro, uses singleton fns instead.

Show
Ghadi Shayban added a comment - CLJ-1458-transient-merge3.patch removes silly inlining macro, uses singleton fns instead.
Hide
Michael Blume added a comment -

Nice =)

This should come with tests. If we want to preserve the ability to merge with a MapEntry, we should test it. This isn't so much a weakness of the patch as of the existing tests. I see merge and merge-with being used a few times in the test suite, but I see no test whose purpose is to test their behavior.

Show
Michael Blume added a comment - Nice =) This should come with tests. If we want to preserve the ability to merge with a MapEntry, we should test it. This isn't so much a weakness of the patch as of the existing tests. I see merge and merge-with being used a few times in the test suite, but I see no test whose purpose is to test their behavior.
Hide
Michael Blume added a comment -

Extremely simple merge test, we need more than this, but this is a start

Show
Michael Blume added a comment - Extremely simple merge test, we need more than this, but this is a start
Hide
Alex Miller added a comment -

clj-1458-4.patch refreshed to apply to master, no changes.

Show
Alex Miller added a comment - clj-1458-4.patch refreshed to apply to master, no changes.
Hide
Ghadi Shayban added a comment -

I'd like to reevaluate the scope of this ticket. Can we address 'merge' only and defer 'merge-with'? It's by far the more common function. I've attached a new simplified patch.

Show
Ghadi Shayban added a comment - I'd like to reevaluate the scope of this ticket. Can we address 'merge' only and defer 'merge-with'? It's by far the more common function. I've attached a new simplified patch.
Hide
Ghadi Shayban added a comment -

CLJ-1458-6.patch is yet another cleaner approach

Show
Ghadi Shayban added a comment - CLJ-1458-6.patch is yet another cleaner approach
Hide
Alex Miller added a comment -

Can you update the ticket approach section to discuss the APersistentMap.cons / ASSOC changes. Also, can you add a before / after perf test for one or more common cases?

Show
Alex Miller added a comment - Can you update the ticket approach section to discuss the APersistentMap.cons / ASSOC changes. Also, can you add a before / after perf test for one or more common cases?
Hide
Michael Blume added a comment -

Updated patch to handle use of merge in core_print before it's defined in core

Show
Michael Blume added a comment - Updated patch to handle use of merge in core_print before it's defined in core
Hide
Ghadi Shayban added a comment -

If anyone wants to take stewardship of this, go ahead. I had trouble getting consistent performance improvements on this. Obviously this needs fresh benchmarks.

Show
Ghadi Shayban added a comment - If anyone wants to take stewardship of this, go ahead. I had trouble getting consistent performance improvements on this. Obviously this needs fresh benchmarks.
Hide
Alex Miller added a comment -

Yes, this needs a benchmark showing demonstrable improvement. The whole goal here is improved perf - if we can't prove it's consistently faster, then there is no point in even reviewing it.

Show
Alex Miller added a comment - Yes, this needs a benchmark showing demonstrable improvement. The whole goal here is improved perf - if we can't prove it's consistently faster, then there is no point in even reviewing it.

People

Vote (8)
Watch (5)

Dates

  • Created:
    Updated: